You are on page 1of 18

Multimedia Data Mining Framework for Raw Video Sequences

JungHwan Oh, JeongKyu Lee, Sanjaykumar Kote, and Babitha Bandi
Department of Computer Science and Engineering University of Texas at Arlington Arlington, TX 76019-0015 U. S. A. e-mail: {oh, jelee, kote, bandi}

Abstract. We extend our previous work [1] of the general framework for video data mining to further address the issue such as how to mine video data, in other words, how to extract previously unknown knowledge and detect interesting patterns. In our previous work, we have developed how to segment the incoming raw video stream into meaningful pieces, and how to extract and represent some feature (i.e., motion) for characterizing the segmented pieces. We extend this work as follows. To extract motions, we use an accumulation of quantized pixel differences among all frames in a video segment. As a result, the accumulated motions of segment are represented as a two dimensional matrix. We can get very accurate amount of motion in a segment using this matrix. Further, we develop how to capture the location of motions occurring in a segment using the same matrix generated for the calculation of the amount. We study how to cluster those segmented pieces using the features (the amount and the location of motions) we extract by the matrix above. We investigate an algorithm to find whether a segment has normal or abnormal events by clustering and modeling normal events, which occur mostly. In addition to deciding normal or abnormal, the algorithm computes Degree of Abnormality of a segment, which represents to what extent a segment is distant to the existing segments in relation with normal events. Our experimental studies indicate that the proposed techniques are promising.

KEYWORDS: Multimedia Data Mining, Video Segmentation, Motion Extraction, Video Data Clustering



Data mining, which is defined as the process of extracting previously unknown knowledge, and detecting interesting patterns from a massive set of data, has been a very active research. As results, several commercial products and research prototypes are even available nowadays. However, most of these have focused on corporate data typically in alpha-numeric database.

The MultiMediaMiner [4–6] project has constructed many image understanding. we have developed how to segment the incoming video stream into meaningful pieces. 18] and HAL [19] projects are developing smart spaces that can monitor.e. 11] analyzing the broadcast news programs has been reported. As mentioned above. and traffic videos. car. etc. Among them. audio and video. The other example is a system [13] focusing on the echocardiogram video data management to exploit semantic querying through object state transition data modeling and indexing scheme.. where the semantic content is time sensitive and constructed by fusing data obtained from component streams. we extend our previous work [1] of the general framework for video data mining to further address the issues discussed above..2 JungHwan Oh. image. the behavior of those objects are monitored (tracked) to find any abnormal situations. and how to extract and represent some feature (i. airplane. We can find some multimedia data mining frameworks [14–16] for traffic monitoring system.) are extracted from video sequences. In our previous work. the common approach in these works is that the objects (i. 21–23] have recently captured the interest of both research and industrial worlds due to the growing availability of cheap sensors and processors at reasonable costs. how to mine them. how to extract previously unknown knowledge and detect interesting patterns. In this paper. predict and assist the activities of its occupants by using ubiquitous tools that facilitate everyday activities. person. A project [10. indexing and relevant information retrieval along with domain knowledge in the news programs. the developments of complex video surveillance systems [20] and traffic monitoring systems [15. in other words. As mentioned in the literature [14]. An example of video and audio data mining can be found in Mining Cinematic Knowledge project [9] which creates a movie mining system by examining the suitability of existing concepts in data mining to multimedia. They have developed the techniques and tools to provide news video annotation. .e. and second. and Babitha Bandi Multimedia data mining has been performed for different types of multimedia data. What are missing in these efforts are first. The SKICAT system [3] integrates techniques for image processing and data classification in order to identify ’sky objects’ captured in a very large satellite picture set. motion) for characterizing the segmented pieces. We extend this work as follows. how to index and cluster these unstructured and enormous video data for real-time processing. medical videos. and modeled by the specific domain knowledge. Sanjaykumar Kote. EasyLiving [17. and apply it to speech driven lip motion of facial animation system. An example of image data mining is CONQUEST [2] system that combines satellite data with geophysical data to discover patterns in global climate change. there have been some efforts about video data mining for movies. and the increasing safety and security concerns. then. JeongKyu Lee. Some advanced techniques cab be found in mining knowledge from spatial [7] and geographical [8] databases. indexing and mining techniques in digital media. A data mining framework in audiovisual interaction has been presented [12] to learn the synchronous pattern between two channels. 16.

In addition to deciding normal or abnormal. The experimental results are discussed in Section 4. we develop how to capture the location of motions occurring in a segment using the same matrix generated for the calculation of the amount. we use an accumulation of quantized pixel differences among all frames in a segment [1. The matrices representing motions are showing not only the amounts but also the exact locations of motions. Therefore. a same specific abnormal event can occur in many different ways. The main contributions of the proposed work can be summarized as follows. – Many researches [25–28] have tried to find abnormal events by modelling abnormal events.Multimedia Data Mining Framework for Raw Video Sequences 3 – To extract motions. As a result. Most of them define some specific abnormal event. our approach can be used for any video surveillance sequences to distinguish normal and abnormal events. How to capture the amount and the location of motions occurring in a segment. our approach uses the normal events which occur everyday and easy to obtain. Finally. To find the abnormality. In Section 2. By this way. Further. how to cluster those segmented pieces. to make the paper self-contained. we can get very accurate amount of motion in a segment. instead of comparing two consecutive frames (Figure 1(a)) which is the most common way to detect shot boundary [29–33]. we can get more accurate and richer information of motion contents of segment. unlike the others.e. – We investigate an algorithm to find whether a segment has normal or abnormal events by clustering and modelling normal events which occur the most. we . and try to detect it in video sequences. we briefly discuss the details of the technique in our previous work [1] to group the incoming frames into semantically homogeneous pieces by real time processing (we called these pieces as ‘segments’ for convenience). 24]. 24]. and it is not possible to predict and model all abnormal events. The remainder of this paper is organized as follows. 2 Incoming Video Segmentation In this section. However. the algorithm computes Degree of Abnormality (Ψ ) of a segment. – The proposed technique to compute motions is very cost-effective because an expensive computation (i. Because the motions are represented as a matrix.. we describe briefly the video segmentation technique relevant to this paper. We do not have to model any abnormal event separately. optical flow) is not necessary. – We study how to cluster these segmented pieces using the features (the amount and the location of motions) extracted above. the accumulated motions of segment are represented as a two dimensional matrix. which represents to what extent a given segment is distant to the existing segments with normal events. we give our concluding remarks in Section 5. and how to model and detect normal events are discussed in Section 3. which has been proposed in our previous work [1. To find segment boundary. comparison among segments is very efficient and scalable. Therefore.

and its color space of each frame is quantized (i. – Step. The Step. Assign a ¯ corresponding category number (Ck ) to the frame k . The solid graph in the top of Figure 2 shows the color histogram difference of background with each frame in the sequence. 21. The algorithm to decompose a video sequence into meaningful pieces (segments) is summarized as follows. We manually select a background frame using a similar approach as in [14. Compute the difference (Dk ) between the background (F B ) and each frame (F k ) as follows. and Babitha Bandi compare each frame with a background frame as shown in Figure 1(b). The differences are magnified so that segment boundaries can be found more clearly. and the Step. from 256 to 64 or 32 colors) to reduce noises (false detection of motion which is not actually motion but detected as motion). the frame comparison can be done by any technique using color histogram. 1.1 is a preprocessing by off-line processing.4 JungHwan Oh.4: Classify Dk into 10 different categories based on its value. pixel-matching or edge change ratio. Note that since this segmentation algorithm is generic. We use 10 categories . Note that the value of Dk is always between zero and one. We chose a simple pixel matching technique for illustration purpose. 34]. Since we can assume that the camera remains stationary for our application. Sanjaykumar Kote.3: Compare all the corresponding (same position of) pixels of two frames ¯ (background and each frame)..e. JeongKyu Lee.2: Each frame (F k ) arriving to the system is also quantized in the same ¯ rate used to quantize the background in the previous step.2 through 5 are performed by on-line real time processing.1: A background frame (F B ) is extracted from a given sequence as ¯ preprocessing. Fig. – Step. Frame Comparison Strategies – Step. A background frame is defined as a frame with only non-moving components. T otal number of pixels in which their colors are dif f erent (1) Dk = c×r – Step. a background frame can be a frame of the stationary components in the image. Assume that the size of frame is c × r pixels.

and to build a hierarchical structure from a sequence.3 0.Multimedia Data Mining Framework for Raw Video Sequences 0. but this value can be changed properly according to the contents of video.1 • Category 1 : 0.5 • Category 5 : 0.2 ≤ Dk < 0.6 ≤ Dk < 0.3 ≤ Dk < 0. 2. compare the category number of current frame with the previous frame.8 • Category 8 : 0.2 • Category 2 : 0.3 • Category 3 : 0. • Category 0 : Dk < 0. We consider that the lower categories contain the higher categories as shown in Figure 3. finding segment boundaries means finding category boundaries in which we find a starting .9 – Step.8 ≤ Dk < 0. #2 starts with Frame #c and ends with Frame #d. one segment A of Cat.6 Difference with Background Difference between Two Consecutive Frames 0. then it is possible that a < c < d < b.9 • Category 9 : Dk ≥ 0.4 ≤ Dk < 0.1 0 0 50 100 150 200 250 Frames 300 350 400 450 500 Fig.1 ≤ Dk < 0. To do this. compare Ck with Ck−1 .4 0. For example.5 ≤ Dk < 0. We can build a hierarchical structure from a sequence based on these categories which are not independent from each other. therefore. and the other segment B of Cat. #1 starts with Frame #a and ends with Frame #b.5 Frame Differences by Color Historam 5 0.6 • Category 6 : 0. In our hierarchical segmentation. Two Frame Comparison Strategies for illustration purpose.7 • Category 7 : 0.4 • Category 4 : 0.7 ≤ Dk < 0. The classification is stated below. In other words. a temporary table such as Table 1 is ¯ maintained.2 0.5: For real time on-line processing.

• If Ck−1 = Ck . • Else if Ck−1 < Ck . Segmentation Table Fig. 3. In other words. to cluster those segmented pieces.. SCk−1 +1 = k . then SCk = k .. Relationships (Containments) among Categories frame (Si ) and an ending frame (Ei ) for each category i. Sanjaykumar Kote. we ignore this segment since it is too short to carry any semantic content. 3.1 Motion Feature Extraction We describe how to extract and represent motions from each segment decomposed from a video sequence as discussed in the previous section. . . and represent it in a comparable form in this section. then ECk−1 = k − 1. • If the length of a segment is less than a certain threshold value (β ).. We compute Total . The following algorithm shows how to find these boundaries. so continue with the next frame.6 JungHwan Oh. We extend this technique to extract the motion from a segment. ECk−1 −1 = k − 1. we assume that the minimum length of a segment is one second. JeongKyu Lee. The ending frames of category Ck−1 through Ck + 1 are k − 1.. 35]. We developed a technique for automatic measurement of the overall motion in not only two consecutive frames but also an entire shot which is a collection of frames in our previous works [24.. if Ck−1 > Ck . in other words. and Babitha Bandi Table 1. • Else. 3 New Proposed Techniques We propose new techniques to capture the amount and the location of motions occurring in a segment. SCk −1 = k . The starting frames of category Ck through Ck−1 + 1 are k . and to model and detect normal events are discussed in this section. this value β is one second. In general. ECk +1 = k − 1. then no segment boundary occurs.

... we compute a motion feature.. – Step. – Step. . t1c ...  . trc (2) And AM MS which is a matrix whose items are averages computed as follows.  t11  AM MS =   . If they have different colors... T M and AM for a segment with n frames are computed using the following algorithm (Step 1 through 5).. tr 1  t12 t22 ..1: The color space of each frame is quantized (i. In other words.  – Step. . .. from 256 to 64 or 32 colors) to reduce unwanted noises (false detection of motion which is not actually motion but detected as motion). T MS . We assume that the frame size is c × r pixels.. .. However.. All its items are initialized with zeros. increase the matrix value (tij ) in the corresponding position by one (this value may be larger according to the other conditions). and its corresponding Total Motion (TM) and Average Motion (AM).. AM M . . T M is the sum of all items in T M M and we consider this as total motion in a segment.. T M can indicate an amount of motion in a segment. we simply use AM which is an average of T M .. AMS = i=1 j =1 tij n (4) As seen in these formulae..3 is repeated until all n frames in a shot are compared with a background frame. t11  t21 T M MS =   . ..2: An empty two dimensional matrix T M M (its size (c × r) is same as that of frame) for a segment S is created as follows. A T M of long segment with little motions can be equivalent to a T M of short segment with a lot of motions...3: Compare all the corresponding quantized pixels in the same position of each and background frames. AMS as follows... To distinguish these. tr 2 t13 t23 . T M is dependent not only on the amount of motions but also on the length of a segment.Multimedia Data Mining Framework for Raw Video Sequences 7 Motion Matrix (TMM) which is considered as the overall motion of a segment.. and represented as a two dimensional matrix. t2c   ... r c r c T MS = i=1 j =1 tij ..e. For comparison purpose among segments with different lengths (in terms of number of frames).. n t21 n t12 n t22 n tr 2 n t13 n t23 n tr3 n tr 1 n . tr 3  .. t 1c n t 2c n  (3) trc n   .4: Step.. ... The T M M .5: Using the above T M MS and AM MS . it remains the same. we also compute an Average Motion Matrix (AMM). – Step.. – Step. Otherwise.

We introduce a technique to capture locality information without using partitioning. category is utilized at the top level. An empty image with same size of T M M is created as T M M I . If m is greater than 256. the horizontal and vertical location changes. We can convert an AM M using the same way. we can convert this T M M (or AM M ) to an image which is called Total Motion Matrix Image (TMMI) for T M M (Average Motion Matrix Image (AMMI) for AM M ). we group segments into k1 clusters according . where the feature. j =1 arj ) To visualize the computed T M M (or AM M ). and Babitha Bandi 3. In the proposed technique.8 JungHwan Oh. which means that the vertical locations of two motions are same. i=1 c aic ) SRA = ( j =1 a1j j =1 a2j .3 Clustering of Segments In our clustering.. The following equations show how to compute SCA and SRA from AM MA . they are scaled up. Two SRs in Figure 4 (a) are same. These two arrays are called as Summation of Column (SC) and Summation of Row (SR) to indicate their actual meanings.. For example. otherwise. and motion of segments. Sanjaykumar Kote.e. and black pixels for the matrix value 256 which means maximum motion in a given shot. assign white pixel for the matrix value zero which means no motion.. m into a 256 gray scale image as an example.. and the corresponding value of T M M is assigned as a pixel value. in other words. Each pixel value for a T M M I can be computed as follows after it is scaled up or down if we assume that T M M I is a 256 gray scale image. Similarly. Each P ixel V alue = 256 − Corresponding M atrix V alue Figure 4 shows some visualization examples of AM M I . Let us convert a T M M with the maximum value. 3.2 Location of Motion Comparing segments only by the amount of motion (i. AM ) would not give very accurate results because it ignores the locality such that where the motions occur. r r r SCA = ( i=1 c a i1 i=1 c ai2 . which is described as follows.. Figure 4 (c) is showing the combination of two. JeongKyu Lee. SC and SR such that how these SC and SR can capture where the motions occur. m and other values are scaled down to fit into 256. the locality information of AM M can be captured by two one dimensional matrices which are the summation of column values and the summation of row values in AM M . we employ a multi-level hierarchical clustering approach to group segments in terms of category. Figure 4 (b) shows that the horizontal locations of two motions are same by SC s. But the value zero remains unchanged. The algorithm is implemented in a top-down fashion.

F ) i=1 v . ρ + 2. µj ). . 2.. Each cluster is clustered again into k2 groups based on the motion (AM ) extracted in the previous section accordingly. [36] since the algorithm is the most frequently used clustering algorithm due to its simplicity and efficiency.. we adopted K-Means algorithm and cluster validity method studied by Ngo et. . It is employed to cluster segments at each level of hierarchy independently.. . Given v d-dimensional feature vectors. . divide the d dimensions to ρ = k These subspaces are indexed by [1. [ρ + 1.. µk . associate a value fij for each jρ feature vector Fi by fij = d=(j −1)ρ Fi (d) 3. 4. 2ρ].. For this multi-level clustering. 3. In each subspace j of [(j − 1)ρ + 1.2: Classify each feature F to the cluster ps with the smallest distance. The K-Mean algorithm is implemented as follows. which are called as Bottom Feature. We will consider more features (i.. ρ]. SC and SR) for the clustering in the future. 2..e. This D is a function to measure the distance between two feature vectors and defined as D (F . . – Step.. Choose the initial cluster centroids µ1 .. (k − 1)ρ + 2. we call this feature as Top Feature... (k − 1)ρ + 3. For convenience.. [(k − 1)ρ + 1. F ) = 1 1 ( |F (i) − F (i)|k ) k Z (F . µ2 . 1.. kρ]. al.Multimedia Data Mining Framework for Raw Video Sequences 9 Fig.. ps = arg1≤j ≤k min D(F .. by µj = argFi max1<i<v fij – Step.1: The initial centroids are selected in the following way: d .... Comparisons of Locations of Motions to the categories. . jρ]..

Sanjaykumar Kote. we cluster and model the normal events which occur everyday and are easy to obtain. Furthermore. and a centroid at the bottom level represents the general motion characteristics of a sub-cluster. 3.4: If any cluster centroid changes the value in Step.. and Fi is the ith feature vector in cluster j . 2. F ) = i=1 F (i) + i=1 F (i) which is a normalizing function. In practice.3. µi ).. The above K-Mean algorithm can be used when the number of clusters k is explicitly specified. ξij = D (µi . The cluster separation measure ϕ(k ) is defined as ϕ(k ) = v (j ) (j ) 1 k k i=1 1≤v ≤k max ηi + ηj ξij j 1 where ηj = v i=1 D (Fi . To find optimal number (k ) clusters.3. the segments with normal events are clustered and modelled using the extracted features about the amount and location of motions. update cluster centroids as µj = 1 vj vj Fi i=1 (j ) where vj is the number of shots in cluster j . q .10 JungHwan Oh. k = 1 for L1 norm and k = 2 for L2 norm. and Babitha Bandi v v where Z (F . while ηj is the intra-cluster distance of cluster j . The L1 and L2 norms are two of the most frequently used distance metrics for comparing two feature vectors. In our multi-level clustering structure. . JeongKyu Lee. otherwise stop. a centroid at the top level represents the category of segments in a cluster. – Step. go to Step. The optimal number of cluster k is selected as k = min1≤k≤q ϕ(k ) In other words. the KMean algorithm is tested for k = 1. – The existing segments are clustered into k number of clusters using the technique discussed in the section 3. L1 norm performs better than L2 norm since it is more robust to outliers [37]. µj ).3: Based on the classification. In this function. We use L1 norm for our experiments. The idea is to find clusters that minimize intracluster distance while maximize inter-cluster distance. – Step.. More precisely.4 Modelling and Detecting of Normal Events As mentioned in the section 1. L1 norm is more computationally efficient and robust. to find the abnormal event. however. and the one which gives the lowest value of ϕ(k ) is chosen.2. The algorithm can be summarized as follows. . we have employed the cluster validity analysis [38]. ξij is the inter-cluster distance j of cluster i and j .

The idea is that if a segment is close enough to the segments with normal events. As seen in the equation (6). We can use the distance function defined in the Step. its ∆S . . ∆= ¯k1 ≥ ∆S . si ) is a distance between sg and si . which is represented as ∆ – If a new segment (S ) comes in.635 frames.Multimedia Data Mining Framework for Raw Video Sequences 11 – We compute a Closeness to Neighbors (∆) for a given segment (sg ) as follows. then the new segment can be normal since it is very close to the existing segments as we discussed in the beginning of this subsection. This ∆ is an average value of the distances between a number (m) of closest segments to a given segment sg in its cluster. if the value of ∆ for a new segment is less than or equal to the average of the existing segments in the corresponding cluster. – Compute ∆ of all existing segments. We used the rates of 5 and 2 frames/second as the incoming frame rates. Otherwise. there are more possibilities in which a given segment can be normal. Our test set has 111 minutes and 51 seconds of raw video taken from a hallway in a building which consists of total 17. segments in each cluster k1 .3) for the computation of D(sg .2 of the previous subsection (3. This is possible since a segment can be represented as a feature vector by the features used for the clustering in the above. we can compute whether it is normal or not as follows. we find to what extent a segment with event or events are distant to the existing segments with normal events. If it goes to the cluster k1 . then decide which cluster it goes to. si ) (5) m where D(sg . SC and SR capture the amount and the location of motions in a segment? – How do the proposed algorithms work for clustering and modelling of segments? Our test video clips were originally digitized in AVI format at 30 frames/second. si ). – How does the proposed segmentation algorithm work to group incoming frames? – How do T M (AM ). then S = N ormal. If ∆ Otherwise S = Abnormal (6) If S is abnormal. ¯k1 − ∆S ∆ Ψ =| | (7) ¯k1 ∆ In addition to determining normal or abnormal. we compute a degree of abnormality using the differences between them. D(sg . then its degree of abnormality (Ψ ) can be computed as follows. m i=1 4 Experimental Results Our experiments in this paper were designed to assess the following performance issues. and an average value of ∆s of the ¯k1 . which is greater than zero. Their resolution is 160 × 120 pixels.

These features are represented as the images (i. The segment #33 (Category #2) represents this type of content in which a person is appearing and disappearing in this case. The fourth and fifth columns of the table indicate the number of segments and their average length for each category.12 JungHwan Oh.e. the higher category segments can be hierarchical summaries for the lower category segments. people. We can see the average value (∆ of the segments in each category in the seventh column. and the accumulated number of frames up to the corresponding category. The average motion (AM ) of segments in each ¯k ) of ∆s category is shown in the sixth column..440 in the row of cat. four different hierarchical segments are partitioned in Figure 5. For example. JeongKyu Lee. 4. Fig.. The most common content of this type of video is that the objects (i. 6. #3 indicates the sum of the number of frames (the second column) from the category #6 to the category #3. The proposed segmentation algorithm discussed in section 2 was applied to our test video sequence mentioned above. The fourth and fifth columns of the table show the length (number of frames) of each segment and its category. Segmentation example Table 3 shows the overall segmentation results for our test set. vehicles. SC and SR for a number of segments in various categories.) are appearing from and disappearing into with various directions.1 Performance of Video Segmentation A simple segmentation example can be found in Figure 5 and Table 2. AM . 5. . and Babitha Bandi 4. The next two columns show Total Motion and Average Motion for each segment computed using the equation (4). As seen in this table. As results. Sanjaykumar Kote. etc. the number.2 Performance of Capturing Amount and Location of Motions and Clustering Figure 6 shows some examples of T M .e. The second and the third columns of the table represent the number of frames per each category.

We consider that these segments in the table are segmented from new incoming stream. in other words. Also. #131. Figure 6 shows a very simple example of clustering segments. the abnormality (Ψ ) for those segments can be represented as normal as we discussed in the section 3. the segments are clustered by category. we will consider SC and SR for more detail clustering in the future. The values of ∆S for the segments (#130. Therefore. The different categories have the different sizes and/or numbers of object(s). the segments in the higher categories have relatively larger or more objects. and further partitioned using a motion feature.Multimedia Data Mining Framework for Raw Video Sequences 13 Table 2. 4. since the ∆132 (=0. and how to compare these in the future. and the actual value of abnormality (Ψ ) can be computed as 1. As seen in this figure.07). #133 and #134) are smaller than the values of ∆k for their corresponding categories. On the other hand.3 Performance of Computing Abnormality A very simple example of computing abnormalities can be seen in Table 4. AM .4. However.15) is larger than ∆3 (=0. Overall Segmentation Results for Test Set T M M I and AM M I as discussed before).14 using the equation (7) as shown . As seen in this figure. Segmentation Result for Figure 5 Table 3. the average motions. the segment #132 is considered as an abnormal at the moment. represented by AM can distinguish the amount(degree) of motions in different segments. the amount and the location of motions are well-presented by these features. We will investigate the uniqueness of these SC and SR.

JeongKyu Lee.14 JungHwan Oh. Sanjaykumar Kote. and Babitha Bandi Fig. Sample Clustering Results . 6.

the size and the number of object(s). Example of Computing Abnormalities Fig. Eventually. 7. The new incoming segment #132 is different from a typical segment in the category 3 in terms of. This difference is captured by our algorithm for computing the abnormality. (a) Four Frames in Segment #132. . then this kind of segment will be detected as normal at a certain point. If more number of segments similar to this segment comes. (b) Four Frames in a Typical Segment in Category 3.Multimedia Data Mining Framework for Raw Video Sequences 15 in the table (the last column). For better illustration. this segment #132 becomes a normal segment if this type of segment is occurring frequently (because there is nothing wrong actually). Table 4. for example. Figure 7 shows that a number of frames in the segment #132 and a typical segment in the category 3 to which the the segment #132 belongs.

of the CASCON’98: Meeting of Minds. pages 27–32. 4. The extension includes how to capture the location of motions occurring in a segment. Incorporating domain knowledge with video and voice data analysis in news broadcasts. O. Weir. Canada. Shearer. August 2000. 1996. Advances in Knowledge Discovery with Data Mining. J. Spatial data mining: Progress and challenges. P. R. Fast spatio-temporal data mining of large geophysical datasets. Stolorz. Automating the analysis and cataloging of sky surveys. 1998. on KDD. pages 27–32. Farrara. we will consider the other features (objects. Multimediaminer: A system prototype for multimedia data mining. colors) extracted from segments for more sophisticated clustering and indexing. Hou. Fayyad. Tauber. Boston. and J. and J. H. Adikary. In this paper. Li.R. J. modelling and detection of normal and abnormal (interesting) events. References 1. Journal of Visual Communication and Image Representation.R. 1995. Alberta. O. how to cluster those segmented pieces. and feature (motion in our case) extraction. Han. 7. Li. pages 300–305. Han. Chee. 8. Han. of the 1998 ACM SIGMOD Conf.-N. Illumination invariance and object model in content-based image and video retrieval. Mining cinematic knowledge: Work in progress. MA. Zaiane. and J. Koperski. 9. and J. 2. Dorai. Z. August 2000. pages 581–583. In Proc. object movement pattern recognition. Boston. of International Workshop on Multimedia Data Mining (MDM/KDD’2000). E. J. D. of ACM Third International Workshop on Multimedia Data Mining (MDM/KDD2002). and Babitha Bandi 5 Concluding Remarks The examples of knowledge and patterns that we can discover and detect from a surveillance video sequence are object identification. Zaiane. J Yi. Shek. E. In Communication of ACM. 1998. U. we extend our previous work [1] about the general framework to perform the fundamental tasks for video data mining which are temporal segmentation of video sequences. In Proc. of Int’l Conf. Venkatesh. Mining knowledge in geographical data. Multimedia data mining framework for raw video sequences. Koperski and J. pages 46–53.-N. Wijesekera and D. Muntz. pages 98–103. 3. and how to find whether a segment has normal or abnormal events. J.-N Li. K. Nakamura. and Z. K. pages 471–493. S. C. 1996. Montreal. spatio-temporal relations of objects. In Proc. Mesrobian. In Proc. Chiang. In the future study. JeongKyu Lee. 1998. Z. In Proc. In Proc. Djorgovski. . S. Z.. Han. 1998. Mining multimedia data. In Proc.R. MA. 5. Mechoso. and S. Our experimental results are showing that the proposed techniques are performing the desired tasks. and event pattern recognition. Sanjaykumar Kote. 6. of International Workshop on Multimedia Data Mining (MDM/KDD’2000). C. 10. Zaiane. JungHwan Oh and Babitha Bandi. K Ng. Canada. July 2002. and N. Barbara. of SIGMOD Workshop on Research Issues on Data Mining and Knowledge Discovery (DMKD’96). O. S. Edmonton.16 JungHwan Oh. Santos. Chien. K.

Norwell. of DARPA/NIST Workshop on Smart Spaces. Y. 2001. pages 3–10. Strickrott. Chen. and D. Foresti and F. Automatic distinction of camera and objects motions in video sequences. San Francisco. The future of human-computer interaction on how I learned to stop worrying and love my intelligent room. Piccardi. San Francisco. Morellas. B. Proceedings of The IEEE. Robust multiple car tracking with occlusion reasoning. and G. Krumm. March 1999. 1994. of International Conference on Systems. 14. 1(2):119–130. Pumrin. and S. 17. Mining audio/visual database for speech driven face animation. August 2001. CA. J. and P. pages 966–972. pages 84–93. Matsushita. Wang. Urban surveillance systems: From the laboratory to the commercial world. IEEE Intelligent Systems. pages 2638–2643. 1999. Weber. Stockholm. and I. In Proc. San Francisco. of IEEE International Conference on Multimedia and Expo (ICME 2002). Malik. J. Meyers. Miao. 15. Roli. 18. of International Workshop on Multimedia Data Mining (MDM/KDD’2001). Multi-camera multi-person tacking for easyliving. Koller. of International Workshop on Multimedia Data Mining (MDM/KDD’2001). The new easyliving project at microsoft research. Coen. I. C. 2001. Gao. Regazzoni. Y. Z. V. W. Harris. Technol.K. Semantic content-based retrieval in a video database. S. A distributed surveillance system for detection of abandoned objects in unmanned railway environments. 23.Multimedia Data Mining Framework for Raw Video Sequences 17 11. Jiang. 13. Learning and classification of suspicious events for advanced visual-based surveillance. In Proc. of 3rd IEEE International Workshop on Visual Surveillance. Hale. 12th National Conference on Artificial Intelligence (AAAI’94). 2000. P. J. In To appear in Proc. S. 19. Seattle. Huang. 1998. Pavlidis. 22. D. IEEE Transactions on Intelligent Transportation Systems. In Multimedia Video-based Surveillance Systems: Requirements. 26. Shafer. and J. 2000. Automatic symbolic traffic scene analysis using belief networks. August 2001. M. Sethi. In Proc. K. 2002. M. B. 21. WA. Man and Cybernetics. Cathey. Meyers. Oct. Shafer. In Proc. V. Ikeuchi. M. D. Brumitt. and D. Tsiamyrtzis. 12. Majumdar. Sakauchi. 16. Kulesh. JungHwan Oh and Praveen Sankuratri. June 2000. Czerwinski. D. pages 703–708. pages 50–57. Dailey. P. R. and M. of European Conference on Computer Vision. MA: Kluwer. The perseus project: Creating personalized multimedia news portal. Shyu. CA. S. Lausanne. Brumitt. Issues and Solutions. 1994. J. Mello. of AAAI. IEEE Transaction on Veh. Petrushin. Japan. Robbins. Kamijo. 25. S. J.K. V. Image analysis and rule-based reasoning for a traffic monitoring system. In IEEE Intenational Conference on Intelligent Tansportation Systems..S. pages 78–86. Singh and A. Harp. Koller. pages 127–130. 20. June 2000. In Proc. Sacchi and C. CA. Zhang. IEEE Transactions on Intelligent Transportation Systems. 2000. Ogasawara. and S. Sept. In Proc. pages 189–196. of International Workshop on Multimedia Data Mining (MDM/KDD’2001). August 2001. . Cucchiara. B. and S. Krumm. Sweden. Aug. 49:1355–1367. G. M. Tokyo. An algorithm to estimate mean traffic speed using uncalibrated cameras. Switzerland. 1(2):98–107. In Proc. 14(2):8–10. In Proc. and J. B. Chen. pages 31–37. 89(10):1478–1497. Traffic monitoring and accident detection at intersections.L. T. Multimedia data mining for traffic video sequences. Malik. F. C. M. 24.

pages 415–426. Sanjaykumar Kote. Freer. JeongKyu Lee. Chevrier. An efficient and cost-effective technique for browsing and indexing large video databases. Han and I. Zhang. pages 109–113. Monitoring motorway infrastructures for detection of dangerous events. Kweon. In SPIE Conf. on Storage and Retrieval for Media Databases 2001. Canada. Goryashko. of SPIE conf. K. San Jose. L. 38. Jain. Yang. where. LA. On clustering and retrieval of video shots. Security and Detection. Video shot grouping using bestfirst model merging. Hua. Pani. Oct. on Multimedia Computing and Networking 2000. San Jose. of 2000 ACM SIGMOD Intl. Automatic intruder detection incorporating intelligent scene monitoring with video surveillance. In Proc. Kien A. and technology (CISST’02). CA. Shot detection combining bayesian and structural information. JungHwan Oh and Tummala Chowdary. I. . Wang. 1988. and Babitha Bandi 27. Zhang.who. Robust Regression and Outlier Detection. Prentice Hall. Hua and JungHwan Oh. pages 51–60.J. NV. In Proc. C. pages 385–387. In IEEE Third Intenational Conference on Face and Gesture Recognition.J. and Ning Liang. when. what: A real-time system for detecting and tracking people. Las Vegas. Conf. June 2002.18 JungHwan Oh. 29. 2001. pages 222–227. Dallas. Algorithm for Clustering Data. Detecting video shot boundaries up to 16 times faster. Beggs. 1999. In Proc. J. Rousseeuw and A. Jan. Haritaoglu. ECOS’97. 2001. 31. JungHwan Oh. In Proc. and A. of IEEE Int. System. Qi. Japan. and H. Foresti and B. 30. 37. A content-based scene change detection and classification technique using background tracking. Pong. G. pages 262–269. T. 1997. Nara.S. A. In Proc. John Wiley and Sons. Oct.L. H. Leroy. pages 509–516. 1987. Kien A. on Management of Data. S. W. 33. In The 8th ACM International Multimedia Conference (ACM Multimedia 2000). Zhao. Jan. In Proc. W4 . of The 2002 International Conference on Imaging Science. Ngo. CA. of SPIE conf. P. Eur. 34. JungHwan Oh and Kien A. May 2000.W. In Proc. Jan. S. CA. TX. B. 1998. 36. Davis. pages 1144–1147. CA. 35. 2000. F.A. and L. on Storage and Retrieval for Media Databases 2001. D. Conf.J.C. 32. M. Conf. Harwood. Fernandez-Canque. 28. pages 254–265. Image Analysis and Processing. of ACM Multimedia 2001. Hua. 2001. 2000. Ottawa. San Jose. and H. Y. An efficient technique for measuring of various motions in video sequences.