7, 1998
Translated from Denshi Joho Tsushin Gakkai Ronbunshi, Vol. J80-D-II, No. 9, September 1997, pp. 2421–2427
Yasuo Ariki
SUMMARY
1. Introduction
CCC0882-1666/98/070050-07
© 1998 Scripta Technica
It is also desired to compress frames from the viewpoint of storage of news videos, since the amount of data in each frame of a news video is tremendous. If the cut points can be detected for compressed frames, this is convenient in handling news video databases. From this viewpoint, we propose a method for news video frames, where the image is compressed by the DCT (discrete-cosine transform) used in JPEG, and then the cut point is detected based on the changes in the DC and AC components obtained in the DCT [5].

Fig. 1. System organization.
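
As an illustration of the DC and AC components mentioned above, the following minimal sketch computes the two-dimensional DCT of a single 8 × 8 block in the normalization used by JPEG (without the level shift and quantization steps). The function name `dct2_8x8` and the sample blocks are assumptions made for illustration; they are not taken from the paper, whose system relied on dedicated JPEG hardware.

```python
import math

def dct2_8x8(block):
    """Orthonormal 2-D DCT-II of an 8x8 block (list of 8 rows of 8 numbers).

    out[0][0] is the DC component; every other entry is an AC component.
    """
    n = 8

    def scale(k):
        # Normalization factors of the orthonormal DCT-II.
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = 0.0
            for x in range(n):
                for y in range(n):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = scale(u) * scale(v) * total
    return out

# Hypothetical test data: a uniform block, and the same block brightened by 50.
flat = [[100.0] * 8 for _ in range(8)]
bright = [[150.0] * 8 for _ in range(8)]
dc_flat = dct2_8x8(flat)[0][0]
dc_bright = dct2_8x8(bright)[0][0]
```

For a uniform block the DC component equals eight times the pixel value and all AC components vanish, so a uniform brightening changes only the DC term. This is one way to see why a comparison based on DC components alone reacts to changes in overall intensity.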
In most conventional methods for cut detection, the frame is divided into several blocks. The DC components of the corresponding blocks in adjacent frames are examined. If the difference exceeds some threshold, a cut point is detected. In other words, local changes are examined [4, 6]. However, misdetection may arise when the intensity of a part or the whole of the image changes, as in the case of a camera flash or the turning on or off of a lamp.

In order to cope with this problem, we propose to form clusters of consecutive frames, based on the property that adjacent frames are similar. The cut points can be detected as frames that separate two clusters. Using this approach, the problem of a result that is sensitive to changes in intensity can be avoided.

A news program has an iterative structure in which a newscaster in the studio introduces the content of an article; this is followed by several scenes of the site, and then the scene comes back to the studio. In the set of detected cut-point frames, the structure appears as a loop structure with the studio scene as the start. Consequently, by detecting such a loop structure, the studio scene can be identified. In this paper, news articles are extracted based on the detected studio scenes [7].

In section 2 we describe the system configuration that extracts an article and in section 3 we discuss the input method for image data. Sections 4 and 5 describe the proposed cut detection together with an evaluation of the approach. Sections 6 and 7 discuss article extraction based on cut detection.

2. System Configuration

3. Image Data Input

Five-minute NHK news programs were recorded on 8-mm tape for one month. As is shown in Fig. 2, the video news was sampled at the rate of 30 frame/s by an Indigo2 computer. JPEG compression of quality 75% was applied to each frame, and the result was stored on a hard disk as a movie file in SGI format. The above processing can be executed in real time, using SGI tools and dedicated hardware for JPEG compression. When the image size is 320 × 240 pixels, a memory capacity of about 200 MB is required for five minutes of news.

4. Detecting Scene Cuts

4.1. Existence of scene clusters

In the same scene, adjacent frames are similar and the DCT components are likewise close. When the scene changes, the DCT components change greatly. In other words, it is expected that the frames in the same scene form a cluster, represented in terms of the DCT components.

In order to verify this idea, a preliminary experiment was tried in which each frame in the news video was represented by the DCT components, and principal component analysis was applied to the representation; the formation of the cluster was represented in two dimensions. The news video used in the experiment was composed from 1594 frames (about 53 s), partitioned into three scenes.
Figure 3 shows the result. Clusters are obviously
formed for scenes 1 and 2 in the figure. Scene 3 does not
stay within a cluster but is connected to another cluster
Scene 3′ by a curved line. This is due to the camera work.
It is evident that the clusters of Scene 3 and Scene 3′
correspond to the start and the end of the scene, respectively,
and that camera work is not contained in either.
Based on the above preliminary experiment, the cuts
are detected in this study by forming frame clusters instead
of by detecting interframe differences as in the conventional
method.
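
The contrast between interframe differencing and frame clustering can be sketched as follows. This is a minimal illustration under stated assumptions, not the clustering procedure of the paper: each frame is assumed to be already reduced to a small feature vector (for example, a few DCT components), and the names `detect_cuts_by_difference`, `detect_cuts_by_clustering`, `threshold`, and `min_cluster_len` are hypothetical. A frame starts a new cluster only when a run of `min_cluster_len` consecutive frames lies far from the current cluster centroid, so an isolated outlier such as a flash frame is absorbed.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def detect_cuts_by_difference(features, threshold):
    """Conventional approach: compare each frame with its predecessor."""
    return [i for i in range(1, len(features))
            if euclidean(features[i], features[i - 1]) > threshold]

def detect_cuts_by_clustering(features, threshold, min_cluster_len=3):
    """Hypothetical sketch: report the first frame of each new cluster."""
    cuts = []
    if not features:
        return cuts
    centroid = list(features[0])
    size = 1
    outliers = []  # consecutive frame indices far from the current centroid
    for i in range(1, len(features)):
        frame = features[i]
        if euclidean(frame, centroid) <= threshold:
            # The frame fits the current cluster; pending outliers are
            # treated as a transient disturbance and discarded.
            outliers = []
            size += 1
            centroid = [c + (x - c) / size for c, x in zip(centroid, frame)]
        else:
            outliers.append(i)
            if len(outliers) >= min_cluster_len:
                # A sustained departure: start a new cluster at its first frame.
                cuts.append(outliers[0])
                members = [features[j] for j in outliers]
                size = len(members)
                centroid = [sum(col) / size for col in zip(*members)]
                outliers = []
    return cuts

# Hypothetical data: scene A, one flash frame, scene A again, then scene B.
frames = ([[0.0, 0.0]] * 5 + [[50.0, 0.0]] + [[0.0, 0.0]] * 4
          + [[10.0, 10.0]] * 5)
naive_cuts = detect_cuts_by_difference(frames, threshold=3.0)
cluster_cuts = detect_cuts_by_clustering(frames, threshold=3.0)
```

On this toy sequence the differencing method also flags the frames around the flash, whereas the clustering sketch reports only the frame at which the second scene begins.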
misdetection can be avoided in images containing many movements, such as camera work, by increasing the weight for the DC component instead of the AC components. From this perspective, the DC and the AC components are not weighted equally in cut detection. The weight is defined so that the DC component has a larger effect on the over-
number, while the first AC component has a larger variance in the cluster.

5.2. Class of undetected cut points

Table 2 shows the classes of 55 undetected cut points. The features and reasons for the undetected cut points can be summarized as follows.

(A) Cut point immediately following scene with rapid movement
(E) Cut point with zooming change
6. Extraction of Articles

point, and then the studio scene is detected.

The article extraction process can be separated into three steps as in Fig. 5. Following the flow shown in the figure, the article extraction process is described below.

Fig. 5. Flow of news article extraction.

6.2. Loop detection

The transition of the scenes composing the news video can be represented as in Fig. 6. In the figure, the detected cut-point frame is indicated by a black square. Since similar scenes are located close together, the news video forms loops, repeatedly returning to the studio scene. The proposed method detects the cut point (loop point) as the start of the loop.

The algorithm for loop point detection is as follows. Initially, the cut-point frame m at the head of the news video is fixed. By moving the cut-point frame n forward along the time axis, the Euclidean distance d(m, n) between the frames is calculated as follows.

(4)

sional DCT components of frame m.

Calculating the distance using Eq. (4), if there exists a frame n with a distance less than a certain threshold, frames m and n are identified as loop points. Then, frame n is moved further, and the set of cut-point frames belonging to the same loop point is determined. Next, cut-point frame m is moved forward, and another loop point is identified. It may happen, as is shown in Fig. 6, that a small loop point is formed, but a long loop as in the case of the studio scene is not formed.

6.3. Identification of studio scene

Based on the loop points determined by the loop detection process, the studio scene containing the newscaster is identified. In some special articles (such as sports news), it may happen that scenes at the same position and with the same angle are repeatedly inserted, and loops are formed with short intervals. The loop with the studio scene containing the newscaster, however, can be separated, since it continues for a long time.

Figure 7 shows the situation. There exist two long loops with loop point 1 as the start, as well as three short loops with loop point 2 as the start. The length of the portion between arrows in the figure is proportional to the number of frames in the loop. The studio scene is the start of a loop with long duration and should correspond to loop point 1. In order to identify the studio scene, it suffices to examine the average number of frames f in a loop as defined by Eq. (5) for each loop point and to select the loop point with the maximum f.
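
The loop-point search and the selection of the studio scene by the largest average loop length f can be sketched as follows. This is a minimal illustration under assumptions, not the author's implementation: each cut-point frame is assumed to be given as a pair of frame number and feature vector, the loop length is taken as the difference between consecutive frame numbers of the same loop point, and the names `find_loop_points`, `mean_loop_length`, and `identify_studio` are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def find_loop_points(cut_frames, threshold):
    """Group cut-point frames that return to the same scene.

    cut_frames: list of (frame_number, feature_vector), sorted by frame number.
    Returns a list of groups; each group lists the frame numbers of the
    cut-point frames belonging to one loop point.
    """
    groups = []
    assigned = set()
    for m in range(len(cut_frames)):
        if m in assigned:
            continue
        group = [m]
        # Move n forward along the time axis and collect frames close to m.
        for n in range(m + 1, len(cut_frames)):
            if n in assigned:
                continue
            if euclidean(cut_frames[m][1], cut_frames[n][1]) < threshold:
                group.append(n)
        if len(group) > 1:
            assigned.update(group)
            groups.append([cut_frames[k][0] for k in group])
    return groups

def mean_loop_length(group):
    """Average number of frames per loop: f = (n_1 + ... + n_N) / N."""
    lengths = [b - a for a, b in zip(group, group[1:])]
    return sum(lengths) / len(lengths)

def identify_studio(groups):
    """Select the loop point whose loops are longest on average."""
    return max(groups, key=mean_loop_length)

# Hypothetical data mirroring the situation of Fig. 7: two long loops from
# the studio scene and three short loops from a repeated camera angle.
cut_frames = [
    (0, [0.0, 0.0]), (300, [20.0, 0.0]), (600, [0.0, 20.0]),
    (900, [0.0, 0.0]), (1000, [5.0, 5.0]), (1050, [5.0, 5.0]),
    (1100, [5.0, 5.0]), (1150, [5.0, 5.0]), (1300, [30.0, 30.0]),
    (1800, [0.0, 0.0]),
]
groups = find_loop_points(cut_frames, threshold=1.0)
studio = identify_studio(groups)
```

In this toy example the first loop point has an average loop length of 900 frames and the second of 50 frames, so the first is selected as the studio scene.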
$$f = \frac{1}{N} \sum_{i=1}^{N} n_i$$ (5)

In Eq. (5), N is the number of loops exiting from the considered loop point, and n_i is the number of frames in each loop.

Fig. 7. Loop points and studio scene.

6.4. Article extraction

When the article is extracted based only on the cut points obtained by cut detection as in Fig. 6, the article fails to be extracted when the studio scene is not detected in the cut detection process. Consequently, when the studio scene is identified by loop detection, the scene is searched for among all frames in the news video, and then the article is extracted. By separating the identification of the studio scene and the extraction of the article, the article can be extracted from various news videos.

More precisely, the article is extracted as follows. The top frame of the studio scene, identified by the method described in the previous section, is extracted. By comparing that frame with all frames in the news video, the distances are calculated. In the calculation, each frame is separated into blocks of 8 × 8 and, by calculating the difference pixelwise in each block, the absolute value of the sum is calculated. The distance is smaller for similar frames. Consequently, frames with small distances are identified as the frames for the studio scene, and then the articles are extracted.

As a preliminary experiment, the studio scenes were specified by manual inspection, and we then tested to what extent the system was able to determine the studio scene. Figure 8 shows the result. The horizontal axis of the figure is the frame number, and the vertical axis is the distance from the studio scene to each frame. When the value is small, it implies that the considered frame is close to the studio scene. It can be seen from the figure that the studio scene frame is clearly discriminated from the other frames.

7. Evaluation Experiment for Article Extraction

An evaluation experiment was performed for article extraction. The materials used in the experiment were five-minute NHK newscasts for 30 days, and newscasts of three commercial broadcast programs (A, M, and Y). The news of the three commercial broadcasts lasts three minutes, and the news for ten days was recorded. The threshold in the processing was determined based on five days of NHK news. In the evaluation experiment, the identification rates of the studio scene and the article extraction rate were determined.

Table 3. Evaluation of article extraction (%)

Table 3 shows the result. In the identification of the studio scene, a 100% identification rate was obtained for the NHK news for 30 days. In extracting the articles based on the studio scene, an error of 0.8% was produced. This is due to the fact that there was a shift in the camera angle in the studio scene that produced a two-second delay in extracting the frames from the actual scene. The identification rate was 90% in one of the commercial broadcasts. This is due to the fact that there are several cameras in the studio scene, which are switched from time to time. This resulted
in a failure of loop point detection and in identification of the studio scene.

8. Conclusions

In this paper, DCT, which has been used in image compression, is applied to the news video, and cuts are detected by forming scene clusters based on the DCT features obtained. The news video has a syntax structure in which the scene moves from the studio to the site and then comes back to the studio. By detecting loops based on this property, the studio scene is identified. Based on the identified studio scene, all frames of the studio scene are extracted, and the articles are then extracted.

In this study, the studio scene is identified based on the detected cut point. By comparing the scene with all frames in the news video, the articles are extracted without being much affected by the cut detection rate. For future study we plan to construct a system by integrating our results with character recognition and speech recognition so as to retrieve the news articles.

Acknowledgments The author is grateful for the assistance of Miss Y. Saito and Miss A. Odagiri in the data collection and experiments.

REFERENCES

1. K. Mitsui, S. Shimojo, S. Nishio, and H. Miyahara. Realization of news-on-demand system based on scenario database. Tech. Rep. I.E.I.C.E., DE96-2 (1996).
2. Y. Nakajima, H. Hori, T. Kano, and T. Shiobara. TV news retrieval based on similar image search. Tech. Rep. Image. Elect., 145, pp. 17–20 (1995).
3. A. Ando and T. Imai. Broadcast program request system based on speech recognition. Tech. Rep. Aud. Vis. Com., Inf. Proc. Soc. 10-4, pp. 25–30 (1995).
4. K. Ohtsuji, Y. Tonomura, and Y. Ohniwa. Video cut detection. Tech. Rep. I.E.I.C.E., IE91-116 (1991).
5. E. Iwanari and Y. Ariki. Scene clustering and cut detection based on DCT components. Tech. Rep. I.E.I.C.E., PRU93-119 (1994).
6. A. Nagasaka and Y. Tanaka. Detection of cut change in video image. Rep. Construction of Self-Org. Inf. Database for Creative R&D Support, pp. 120–127, Sci. Tech. Agency (1992).
7. Y. Saito and Y. Ariki. Toward news video database: detection of news studio scene and article extraction. Tech. Rep. Image Elect., 95-04-04, pp. 13–16 (Nov. 1995).
AUTHOR
Yasuo Ariki (member) graduated in 1974 from the Dept. Inf., Kyoto University. He completed the Masters Program in
1976 and doctoral program in 1979. In 1980 he was a research associate in the Dept. Inf. at Kyoto University; he became an
associate professor in 1990 and a professor in 1992. He earned his D.Eng. from Ryukoku University. From 1987 to 1990 he
was a visiting researcher at Edinburgh University. He is engaged in research on image processing and speech information
processing. He is a member of the Information Processing Society, the Acoustical Society of Japan, the Society of Artificial
Intelligence, the Image Electronics Society, and IEEE.