
Content-Based Automatic Video Genre Identification

THESIS RESEARCH PROPOSAL

CONTENT-BASED
AUTOMATIC
VIDEO GENRE IDENTIFICATION
By

Faryal Shamsi

A research thesis proposal submitted to the Department of Computer Science

Sukkur Institute of Business Administration, Sukkur

in partial fulfillment of the requirements for the degree of

Master of Science (Computer Science)

Thesis Supervisor: Dr. Sher Muhammad Doudpota

Department of Computer Science

Sukkur Institute of Business Administration, Sukkur



TABLE OF CONTENTS

Abstract

Introduction

Problem Statement

Research Objective

Proposed Organization of Thesis

Review of Literature

Methodology

References


ABSTRACT

Video genre identification is crucial for efficient indexing and retrieval. The contents of a video are diverse, which makes genre identification a difficult task. Shot boundary detection and scene detection serve as major building blocks for video content analysis using visual features. Conventional methods of shot boundary detection have limitations because they rely on a threshold: a threshold value learned from one video genre may not produce accurate results for another. For instance, shot changes may be far more frequent in an animation than in a talk show; likewise, the similarity between shots may be much higher in a sports program than in a movie.

This study will propose an alternative approach that classifies a video into five categories (i.e. Movie, Talk Show, Lecture, Sports or Others). Shots and scenes will contribute the features for genre identification, and these shots and scenes will be detected using K-Means clustering rather than a threshold value.

Key Words:

Video Genre Identification – Content-Based Video Processing – Shot Boundary Detection


INTRODUCTION

With the evolution of the Internet and social networking websites, content sharing has become a popular trend [1]. The ease of sharing that such websites provide to users increases information overload, and organizing the content is becoming a challenging and hectic task [2]. The most popular form of content on social media is video [3]. The nature of video content is diverse, as it combines all other types of media such as text, audio and images [4]. Top-ranking social networking sites like Facebook and YouTube allow users to explore billions of videos per day [5]. Proper organization of such videos is therefore necessary to ensure efficient indexing and searching.

VIDEO GENRE IDENTIFICATION

Genre is defined as a socially agreed category of content [6]. The term content-based automatic video genre identification therefore means recognizing the category of a video on the basis of its contents. The heterogeneous nature of video content makes genre identification a challenging job.

For automatic video classification, various features can be used, such as text-based, audio-based and visual-based features [4]. Shot boundary detection and scene detection are the major building blocks of video classification using visual features.


SHOT BOUNDARY DETECTION & SCENE DETECTION

A frame is a single still image; a sequence of frames collectively depicts a shot, and a collection of shots generates a scene. A video is nothing but a collection of such scenes. A video may contain scenes captured from different cameras, as shown in section (a) of the following diagram.


Each camera generates a different sequence of frames, as shown in section (b). A shot can be defined as a sequence of frames recorded in a contiguous time slice from a single camera [8]. The aim of shot boundary detection is to automatically detect the transition from one shot to another for temporal analysis and segmentation of a video [9].

A scene is a fragment of a video in which the shots are repetitive, as illustrated in section (c). A scene can also be defined as a continuous action within an event of a video. Scene detection is the process of automatically detecting this repetitive pattern within a video [7].


PROBLEM STATEMENT

In spite of all the progress in the fields of multimedia mining and content-based filtering, there is still a need for a system which can automatically understand the contents of videos. A reliable automatic video genre identification system which can categorize any type of video is yet to be proposed.

The existing video indexing and searching mechanisms are fully at the mercy of the information provided by the uploader. On the other hand, an uploader enjoys full autonomy while generating and sharing any type of content. The uploader is not bound to provide the information about the content that indexing would require, and there is no check to ensure the integrity of the information the uploader does provide. For example, an uploader has complete freedom to give the video any title, no matter how irrelevant it is to the actual contents of the video. An uploader might give his or her own name to a movie, or caption a talk show as a movie; in such cases a user may not be able to find these videos if the search string does not match the information attached to the video. There must therefore be a video genre identification system which considers the contents of a video rather than the textual information provided by the uploader.


Shot detection is one of the major building blocks used for video content analysis. Currently, shot detection is performed by comparing the difference between two adjacent frames against a threshold [10]. Consider,

Fi ∈ S if d(Fi, Fi−1) > t

where F is the set of frames within a video, S is the set of frames representing a shot transition, d(Fi, Fi−1) is the difference between frame Fi and its predecessor, and t is the threshold value. If the difference between two consecutive frames is greater than the threshold value, then the later frame marks a shot transition [10].

The major challenge here is to set an appropriate threshold value that is effective for all video genres. In a talk show or a lecture, the environment and the positions of the actors rarely change, so the threshold learned by the system might be very small. In a movie or drama, the actor might be running, walking or dancing, which may require a higher threshold to detect a transition between two shots. In such cases, a threshold-based shot detection approach for video genre identification may not produce accurate results, so an alternative approach is needed.
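The threshold-based scheme above can be sketched in a few lines of Java. This is a minimal illustration, not the proposal's implementation: the frames are assumed to be already reduced to feature vectors (e.g. color histograms), and the difference measure, method names and threshold value are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of threshold-based shot boundary detection, assuming each
// frame is already reduced to a feature vector (e.g. a color histogram).
public class ThresholdShotDetector {

    // Sum of absolute differences between two histograms of equal length.
    static double difference(double[] prev, double[] curr) {
        double d = 0.0;
        for (int i = 0; i < prev.length; i++) {
            d += Math.abs(curr[i] - prev[i]);
        }
        return d;
    }

    // Returns the indices of frames where a shot transition is declared,
    // i.e. where the difference to the previous frame exceeds threshold t.
    static List<Integer> detectTransitions(double[][] frames, double t) {
        List<Integer> transitions = new ArrayList<>();
        for (int i = 1; i < frames.length; i++) {
            if (difference(frames[i - 1], frames[i]) > t) {
                transitions.add(i);
            }
        }
        return transitions;
    }

    public static void main(String[] args) {
        // Four toy "histograms": frames 0-1 are similar, frame 2 changes abruptly.
        double[][] frames = {
            {0.9, 0.1, 0.0},
            {0.8, 0.2, 0.0},
            {0.1, 0.1, 0.8},  // abrupt change -> transition at index 2
            {0.1, 0.2, 0.7}
        };
        System.out.println(detectTransitions(frames, 0.5)); // prints [2]
    }
}
```

Note that the whole result hinges on the fixed value of t, which is exactly the genre-sensitivity problem this proposal aims to remove.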


RESEARCH OBJECTIVES

This research work, as the title suggests, aims to propose a procedure to automatically identify the genre of a video by analyzing its visual contents. Shot boundary detection and scene detection are used as dominant features to recognize the contents of a video. Traditional shot boundary and scene detection techniques are often threshold-based and may give imperfect results. This research work therefore also aims to resolve the issues of threshold-based shot detection by applying the well-known machine learning approach of K-Means clustering to shot detection.

The major research objectives of this thesis are:

• To propose a unique approach for shot boundary and transition detection, using K-Means clustering.

• To classify a video into one of five genres:

o Movie

o Talk Show

o Sports

o Lecture

o Others


Proposed Organization of the Thesis

The thesis will be organized into the following chapters:

Chapter 1: Introduction –

This chapter will describe the research domain (i.e. video classification) and the key terms used throughout the thesis document. Furthermore, it will address the contribution of the research and the problems with contemporary systems.

Chapter 2: Literature Review –

This chapter will discuss the contributions of other researchers in the field, the significance of their work for this research, and how this work differs from or improves upon the prior studies.

Chapter 3: Methodology –

The data mining models and machine learning techniques used will be presented in this chapter, along with the implementation and experimental details.

Chapter 4: Results –

The overall findings of the research work, along with evaluation measures such as accuracy, precision and recall, will be reported in this chapter.

Chapter 5: Conclusion –

This chapter will give insight into the contributions made by the research work, along with its limitations. Recommended future work to extend the study will also be elucidated.


LITERATURE REVIEW

For successful video genre identification, recognizing the contents of a video is the core activity. Video comprehension and classification approaches can be divided into four categories: text-based, audio-based, visual-based and hybrid approaches [4].

Text-based approaches use the viewable text within a video to understand its contents. A movie generally carries textual information, such as cast lists and subtitles, which can give some insight into the video contents. Techniques that extract this text and apply TF-IDF (Term Frequency – Inverse Document Frequency) weighting and OCR are discussed in [11, 12].

Beyond text, the audio content of a video can also be used for video classification and genre identification. Various techniques proposed in the literature use audio information such as dialogue [13], music [14, 15], or even silence, via the zero-crossing rate (ZCR) [16], for video content analysis.

Among all these features, visual features are the most dominant within video content. Color [17], shape [19, 20], luminance [18] and motion [21] are some of the commonly used visual features for analyzing the contents of a video, as proposed in the literature. Color and shape information can be statistically analyzed by generating color and shape histograms of the images extracted from the frames of a video [22]. MPEG motion vectors [23, 24] are used as features to examine the motion information available in a video.


The most familiar genre identification models are the HMM (Hidden Markov Model) [25], which combines various features (e.g. color, shape and audio), and the Gaussian Mixture Model [26], which models the probability distribution of the features as a weighted sum of Gaussian components:

p(x) = ∑ᵏ πk N(x | µk, Σk)

where πk are the mixture weights and N(x | µk, Σk) is a Gaussian density with mean µk and covariance Σk.

Video classification and genre identification can also be performed using hybrid approaches such as [27], which uses a three-step process for genre identification and analyzes text, audio and visual features at each step.

Shot detection, as proposed by [28], is another approach used for video classification. Shot detection is performed in two steps: scoring and decision. In the first step, a score is assigned to each frame on the basis of its similarity or dissimilarity to the previous frame. The similarity score may be computed from the color, edge and luminance information available in the video contents, as proposed by [17, 19]. This similarity check can be performed pixel by pixel, which is very expensive [19], or over the whole frame at once, which may lead to inaccurate results [29, 30, 31]. A moderate approach is therefore to divide each frame into rectangular blocks and compare the two frames block by block [32, 33]. Two popular similarity measures used for shot detection are:

1. Linear form

2. Chi-square
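The two similarity measures named above can be sketched as follows. This is an illustrative Java fragment, not taken from any of the cited works; the histograms are assumed to be normalized, and bins where both histograms are empty are skipped in the chi-square case to avoid division by zero.

```java
public class HistogramSimilarity {
    // Chi-square distance between two normalized histograms; smaller means
    // more similar. Bins where both histograms are empty are skipped.
    static double chiSquare(double[] h1, double[] h2) {
        double d = 0.0;
        for (int i = 0; i < h1.length; i++) {
            double sum = h1[i] + h2[i];
            if (sum > 0) {
                double diff = h1[i] - h2[i];
                d += (diff * diff) / sum;
            }
        }
        return d;
    }

    // Linear (sum of absolute differences) distance, for comparison.
    static double linear(double[] h1, double[] h2) {
        double d = 0.0;
        for (int i = 0; i < h1.length; i++) {
            d += Math.abs(h1[i] - h2[i]);
        }
        return d;
    }

    public static void main(String[] args) {
        double[] a = {0.5, 0.3, 0.2};
        double[] b = {0.4, 0.4, 0.2};
        System.out.printf("chi-square: %.4f, linear: %.4f%n",
                chiSquare(a, b), linear(a, b));
    }
}
```

In a block-by-block scheme, either measure would be applied per block and the per-block scores aggregated into the frame's overall similarity score.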


METHODOLOGY

Video genre identification will be performed in the following steps:

Decomposition of Video into Frames → Extraction of Color & Edge Histograms → Shot & Scene Detection Using K-Means → Scoring (Number of Scenes Extracted) → Video Genre Identification

In the first step, video decomposition is performed: a Java application using the Xuggler API and the FFMPEG package will decompose the video into frames. In the second step, another Java application, using the Paul_image API, will generate the color and edge histograms for each image.
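As a simplified stand-in for the histogram-extraction step (the actual pipeline would use the image-processing API named above and would also compute an edge histogram), the sketch below builds a normalized gray-level histogram from a frame using only the Java standard library; the bin count and gray conversion are illustrative choices.

```java
import java.awt.image.BufferedImage;

public class ColorHistogram {
    // Builds a normalized gray-level histogram from a frame image.
    // A fuller pipeline would use per-channel RGB histograms plus an
    // edge histogram; one gray channel is kept here for brevity.
    static double[] grayHistogram(BufferedImage img, int bins) {
        double[] hist = new double[bins];
        int w = img.getWidth(), h = img.getHeight();
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int rgb = img.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF, g = (rgb >> 8) & 0xFF, b = rgb & 0xFF;
                int gray = (r + g + b) / 3;       // simple average gray value
                hist[gray * bins / 256]++;        // map 0..255 to a bin
            }
        }
        double total = (double) w * h;
        for (int i = 0; i < bins; i++) hist[i] /= total; // normalize to sum 1
        return hist;
    }

    public static void main(String[] args) {
        // Toy 2x2 frame: two black pixels and two white pixels.
        BufferedImage img = new BufferedImage(2, 2, BufferedImage.TYPE_INT_RGB);
        img.setRGB(0, 1, 0xFFFFFF);
        img.setRGB(1, 1, 0xFFFFFF);
        double[] hist = grayHistogram(img, 4);
        // Half the pixels fall in the darkest bin, half in the brightest.
        System.out.println(hist[0] + " " + hist[3]); // prints 0.5 0.5
    }
}
```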

In the third step, the shots and scenes will be detected using K-Means clustering via the Weka API in Java. The idea is to calculate the differences between the color histograms and edge histograms of consecutive frames, partition these difference values into two clusters, and compute the mean of each cluster as its centroid:

µi = (1 / |Ci|) ∑ Fj ,  Fj ∈ Ci

Each difference value Fj is then assigned to the cluster with the nearest centroid:

Ci = argmin ( ||Fj − µc1||² , ||Fj − µc2||² )


The above algorithm will assign each frame to the cluster whose mean is nearest to the frame's color and edge histogram difference with its previous frame, so each frame belongs to exactly one cluster. One cluster will automatically contain the small difference values (call it C1) and the other the large difference values (call it C2). If the difference between two frames belongs to C1, the frames are considered part of the same shot; if it belongs to C2, it is considered a shot transition.
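The two-cluster idea above can be sketched in plain Java. This is an illustrative one-dimensional K-Means written without the Weka API so the example is self-contained; the input difference values, the min/max centroid seeding, and the fixed iteration count are all assumptions of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the proposed two-cluster K-Means over frame-difference values.
// Differences landing in the high-mean cluster C2 are treated as shot
// transitions; values in the low-mean cluster C1 belong to the same shot.
public class KMeansShotDetector {

    // Runs 1-D K-Means with k = 2 and returns the indices (into diffs)
    // assigned to the cluster with the larger centroid, i.e. transitions.
    static List<Integer> transitions(double[] diffs, int iterations) {
        double mu1 = diffs[0], mu2 = diffs[0];
        // Seed the centroids with the min and max difference values.
        for (double d : diffs) { mu1 = Math.min(mu1, d); mu2 = Math.max(mu2, d); }
        int[] assign = new int[diffs.length];
        for (int it = 0; it < iterations; it++) {
            // Assignment step: each value goes to the nearest centroid.
            for (int i = 0; i < diffs.length; i++) {
                assign[i] = Math.abs(diffs[i] - mu1) <= Math.abs(diffs[i] - mu2) ? 1 : 2;
            }
            // Update step: recompute each centroid as its cluster mean.
            double s1 = 0, s2 = 0; int n1 = 0, n2 = 0;
            for (int i = 0; i < diffs.length; i++) {
                if (assign[i] == 1) { s1 += diffs[i]; n1++; }
                else { s2 += diffs[i]; n2++; }
            }
            if (n1 > 0) mu1 = s1 / n1;
            if (n2 > 0) mu2 = s2 / n2;
        }
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < diffs.length; i++) {
            if (assign[i] == 2) result.add(i); // high-difference cluster C2
        }
        return result;
    }

    public static void main(String[] args) {
        // Small within-shot differences, with spikes at indices 3 and 7.
        double[] diffs = {0.1, 0.2, 0.1, 1.8, 0.2, 0.1, 0.2, 2.1, 0.1};
        System.out.println(transitions(diffs, 10)); // prints [3, 7]
    }
}
```

Because the split between C1 and C2 is learned from each video's own difference values, no genre-specific threshold needs to be fixed in advance.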

In the fourth step, scene detection will be performed, and the video genre will be identified on the basis of the number of scenes. A lecture generally has only one shot, whereas a talk show may have many shots but overall only one scene. A sports video also has only one scene but a considerable amount of motion. A movie or drama can have any number of shots and scenes.


REFERENCES
1. Sitaram Asur, Bernardo A. Huberman. Predicting the Future with Social Media. Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology. (2010)

2. Jiliang Tang, Yi Chang, Huan Liu. Mining Social Media with Social Theories: A Survey.
(2014). ACM SIGKDD Explorations Newsletter: Volume 15 Issue 2

3. DH Park, IY Choi, HK Kim, JK Kim. A Review and Classification of Recommender Systems Research. 2011 International Conference on Social Science and Humanity, IPEDR vol. IACSIT Press, Singapore

4. Darin Brezeale and Diane J. Cook, Senior Member, IEEE. “Automatic Video
Classification: A Survey of the Literature”. IEEE Transactions On Systems, Man, And
Cybernetics (2007)

5. The Statistical Portal – https://www.statista.com/statistics/272014/global-social-networks-ranked-by-number-of-users/

6. Genre – Wikipedia. https://en.wikipedia.org/wiki/Genre

7. Costas Cotsaces, Nikos Nikolaidis, and Ioannis Pitas. Video Shot Detection and
Condensed Representation, A Review. IEEE Signal Processing Magazine. (2006)

8. Shot transition detection – Wikipedia. https://en.wikipedia.org/wiki/Shot_transition_detection

9. P. Balasubramaniam; R Uthayakumar. Mathematical Modelling and Scientific


Computation International Conference, ICMMSC 2012, Gandhigram, Tamil Nadu,
India, (March 16-18, 2012). Springer. pp. 421–. ISBN 978-3-642-28926-2.


10. Rainer Lienhart. Comparison of Automatic Shot Boundary Detection Algorithms. Microcomputer Research Labs, Intel Corporation, Santa Clara, CA 95052-8819.

11. A. Hauptmann, R. Yan, Y. Qi, R. Jin, M. Christel, M. Derthick, M.Y. Chen, R. Baron,
W.-H. Lin, and T. D. Ng, “Video classification and retrieval with the informedia digital
video library system,” in Text Retrieval Conference (TREC02), 2002

12. T. Tokunaga and M. Iwayama, “Text categorization based on weighted inverse document
frequency,” Department of Computer Science, Tokyo Institute of Technology, Technical
Report 94-TR00001, 1994.

13. Bart Lehane, Noel O’Connor, Noel Murphy. “DIALOGUE SCENE DETECTION IN
MOVIES USING LOW AND MID-LEVEL VISUAL FEATURES” Centre for Digital
Video Processing Dublin City University

14. U. Srinivasan, S. Pfeiffer, S. Nepal, M. Lee, L. Gu, and S. Barrass, “A survey of mpeg-
1 audio, video and semantic analysis techniques,” Multimedia Tools and Applications,
vol. 27, no. 1, pp. 105–141, 2005.

15. G. Lu, “Indexing and retrieval of audio: A survey,” Multimedia Tools Applications, vol.
15, no. 3, pp. 269–290, 2001.

16. SM Doudpota, Sumanta Guha. "Mining Movies to Extract Song Sequences". MDMKDD'11: Proceedings of the 11th International Workshop on Multimedia Data Mining. (2011)

17. Z. Cernekova, C. Kotropoulos, and I. Pitas, “Video shot segmentation using singular
value decomposition,” in Proc. 2003 IEEE Int. Conf. Multimedia and Expo, Baltimore,
Maryland, July 2003, vol. 2, pp. 301–302.

18. R. Zabih, J. Miller, and K. Mai, “A feature-based algorithm for detecting and
classification production effects,” ACM Multimedia Syst., vol. 7, no. 1, pp. 119–128,
Jan.1999.


19. J. Nam and A. Tewfik, "Detection of gradual transitions in video sequences using B-spline interpolation," IEEE Trans. Multimedia, vol. 7, no. 4, pp. 667–679, Aug. 2005.

20. R. Zabih, J. Miller, and K. Mai, “A feature-based algorithm for detecting and
classification production effects,” ACM Multimedia Syst., vol. 7, no. 1, pp. 119–128,
Jan.1999

21. Bart Lehane, Noel E. O’connor, And Noel Murphy . “ACTION SEQUENCE
DETECTION IN MOTION PICTURES”. Centre for Digital Video Processing Dublin
City University

22. Jordi Mas and Gabriel F. "Shot Boundary Detection on the basis of Color Histogram". Digital Television Center (CeTVD), La Salle School of Engineering, Ramon Llull University, Barcelona, Spain.

23. V. Kobla, D. S. Doermann, K.-I. Lin, and C. Faloutsos, "Compressed-domain video indexing techniques using DCT and motion vector information in MPEG video," in Storage and Retrieval for Image and Video Databases (SPIE), 1997, pp. 200–211.

24. H. Wang, A. Divakaran, A. Vetro, S.-F. Chang, and H. Sun, “Survey of compressed-
domain features used in audio-visual indexing and analysis,” Journal of Visual
Communication and Image Representation, vol. 14, no. 2, pp. 150–183, June 2003.

25. R. O. Duda, P. E. Hart, and D. G. Stork, "Pattern Classification", 2nd ed. New York, NY: John Wiley & Sons, 2001.

26. C. M. Bishop, “Pattern Recognition and Machine Learning”. New York, NY: Springer,
2006.

27. S. Fischer, R. Lienhart, and W. Effelsberg, “Automatic recognition of film genres,” in


MULTIMEDIA ’95: Proceedings of the third ACM international conference on
Multimedia, 1995.


28. N. Vasconcelos and A. Lippman, “Statistical models of video structure for content
analysis and characterization,” IEEE Transactions on Image Processing, vol. 9, no. 1,
2000.

29. Z. Cernekova, C. Kotropoulos, and I. Pitas, “Video shot segmentation using singular
value decomposition,” in Proc. 2003 IEEE Int. Conf. Multimedia and Expo, Baltimore,
Maryland, July 2003, vol. 2, pp. 301–302.

30. J. Yu and M.D. Srinath, “An efficient method for scene cut detection,” Pattern Recognit.
Lett., vol. 22, no. 13, pp. 1379–1391, Nov.2001.

31. R. Lienhart, “Reliable dissolve detection,” in Proc. SPIE, vol. 4315, pp. 219–230,
Jan.2001.

32. D. Lelescu and D. Schonfeld, “Statistical sequential analysis for real-time video scene
change detection on compressed multimedia bitstream,” IEEE Trans. Multimedia, vol. 5,
no. 1, pp. 106–117, Mar.2003.

33. A. Hanjalic, “Shot-boundary detection: Unraveled and resolved?” IEEE Trans. Circuits
Syst. Video Technol., vol. 12, no. 2, pp. 90–105, Feb.2002.

