
Term Paper: Video Stream Processing Pradyumna Kumar Abstract Introduction Why DSMS?

Visual Intelligence Using STREAM


LLVS Vision Compiler Smoothing Event Reasoning

Term Paper: Video Stream Processing


Pradyumna Kumar
Department of Computer Science
Colorado State University
Fort Collins, CO - 80524
prady@cs.colostate.edu

Streaming Video Segmentation Conclusion

Abstract

Technology is automating human life in every possible way



Monitoring surveillance video is not yet as automated.
An automated visual intelligence system that can reason about events in a video is necessary.
An enormous amount of video data must be streamed and processed efficiently in real time.
This paper investigates the use of a streaming database to aid in this quest for an automated real-time visual intelligence system.


Introduction

The camera sees, but it does not understand.
The video stream is under constant human observation.
This raises the issue of persistent stare.
A system that processes streaming field sensor data, flags unusual activity, and triggers alerts is necessary.


Introduction

Such a system is highly dependent on processing streaming video data quickly and efficiently.
Video data is continuous.
Streaming databases handle streaming data and process continuous queries.
Can they be used and/or modified to store and query high-dimensional streaming video data?


Data Stream Management System



Traditional DBMSs are based on a human-active, DBMS-passive (HADP) model.
Monitoring applications require a DBMS-active, human-passive (DAHP) model:
Previous values of the stream should also be stored.
A large number of triggers must be supported.
Data arrive asynchronously and are often lost, stale, or intentionally omitted.
There are real-time requirements with a low tolerance for stale data.


Aurora and STREAM

Visual Intelligence through Latent Geometry and Selective Guidance Using STREAM

A video is a sequence of images.
A high volume of data must be streamed.
Features, a high-level representation of the frames, can be streamed instead.
This allows the database to become less of a storage device and more of an event reasoning device.
Continuous queries can be written to determine events.
A prototype of such a system built on STREAM is suggested.
STREAM has shortcomings when it is combined with a sophisticated VI system.
Mind's Eye system by the Vision Group at CSU, funded by DARPA.


Low Level Vision System



Signals-to-symbols problem.
Assume an infinitely long video streamed at 60 fps.
STREAM can be used to handle it.
The STREAM system cannot handle image streams, so each frame is unrolled into a data tuple.
An image of 1024 x 768 pixels becomes a tuple of length 2359296.
The STREAM timestamp can be used to represent the frame number.
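The unrolling step can be sketched in a few lines of Python. This is only an illustration, not part of the proposed system: the tuple length 2359296 equals 1024 x 768 x 3, so the sketch assumes three values per pixel (for example RGB), and the names `unroll`, `roll_back`, and `TUPLE_LENGTH` are hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # assumed 3-channel pixel, e.g. (R, G, B)


def unroll(frame: List[List[Pixel]]) -> Tuple[int, ...]:
    """Flatten a height x width x 3 frame into one flat tuple, row by row."""
    return tuple(value for row in frame for pixel in row for value in pixel)


def roll_back(flat: Tuple[int, ...], width: int, height: int) -> List[List[Pixel]]:
    """Rebuild the frame from the flat tuple produced by unroll()."""
    assert len(flat) == width * height * 3
    frame = []
    for y in range(height):
        row = []
        for x in range(width):
            base = (y * width + x) * 3
            row.append((flat[base], flat[base + 1], flat[base + 2]))
        frame.append(row)
    return frame


# Tuple length for the slide's example: 1024 x 768 pixels, 3 values per pixel.
TUPLE_LENGTH = 1024 * 768 * 3

# Tiny 2 x 2 frame used only to exercise the round trip.
tiny = [[(1, 2, 3), (4, 5, 6)], [(7, 8, 9), (10, 11, 12)]]
flat = unroll(tiny)
restored = roll_back(flat, width=2, height=2)
```

Every frame pays for one flattening pass on the way into the stream and one rebuilding pass on the way out, which is the per-frame cost that matters at 60 fps.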


Low Level Vision System



Mean Absolute Deviation (MAD) image.
    Select * From ImageStream[Range 10 Minutes] Where %100 = 0
All frames in the time window, along with the MAD image, are streamed into a middleware.
The middleware returns high-level semantics of the raw frames.
Unrolling and rolling back images that are streamed at 60 frames per second is highly expensive.
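The slide does not say how the MAD image is computed. One plausible reading, assumed here, is a per-pixel mean absolute deviation from the per-pixel mean over the frames in the window. The sketch below uses grayscale frames as nested lists, and `mad_image` is a hypothetical name.

```python
from typing import List, Sequence

Frame = List[List[float]]  # assumed grayscale frame: rows of pixel intensities


def mad_image(frames: Sequence[Frame]) -> Frame:
    """Per-pixel mean absolute deviation from the per-pixel mean over a window."""
    n = len(frames)
    height, width = len(frames[0]), len(frames[0][0])
    result = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            values = [f[y][x] for f in frames]
            mean = sum(values) / n
            result[y][x] = sum(abs(v - mean) for v in values) / n
    return result


# Three 1 x 2 frames: the left pixel changes over time, the right one is static.
window = [
    [[10.0, 0.0]],
    [[20.0, 0.0]],
    [[30.0, 0.0]],
]
mad = mad_image(window)
```

Pixels that change across the window get a large value and static background stays near zero, which is why such an image is a useful input for the middleware.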


Graph Based Vision Compiler



Iterative graph simplification to minimize tracker errors.
STREAM handles the continuous semantic stream from the LLVS.
The number of links varies; assume a maximum of two in or out links.
We should be able to set the timestamp value of a data item entering STREAM.
A loop-based system is needed, with the output of STREAM becoming its input again after some processing.


Graph Based Vision Compiler: Gap Fill



Track disappearance: gap fill.
All tracks that disappear suddenly without any out links are selected:
    Select track_id, Max(frame_id) From TrackStream[Range 10 Minutes] Where len_continuesTo=0 Group by track_id;
Tracks without any incoming links are also queried.
Tracks that start within a certain time window (say 30 frames) after a track ends are considered:
    Select g1.track_id, g2.track_id From GapFillStream1[Range 10 Minutes] as g1, GapFillStream2[Range 10 Minutes] as g2 Where g2.minFrame>g1.maxFrame And g2.minFrame<g1.maxFrame+30;
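The two gap-fill queries amount to a filtered join, which can be sketched outside the database as follows. The `Track` record and its field names are hypothetical stand-ins for the track stream schema; only the conditions (no out links, no in links, start strictly within 30 frames after the end) come from the queries.

```python
from typing import List, NamedTuple, Tuple


class Track(NamedTuple):
    track_id: int
    min_frame: int  # first frame in which the track is seen
    max_frame: int  # last frame in which the track is seen
    in_links: int   # number of incoming links
    out_links: int  # number of outgoing links


def gap_fill_candidates(tracks: List[Track], max_gap: int = 30) -> List[Tuple[int, int]]:
    """Pair each track that ends with no out link with every track that starts
    with no in link strictly within max_gap frames afterwards."""
    ended = [t for t in tracks if t.out_links == 0]
    started = [t for t in tracks if t.in_links == 0]
    return [
        (e.track_id, s.track_id)
        for e in ended
        for s in started
        if e.max_frame < s.min_frame < e.max_frame + max_gap
    ]


tracks = [
    Track(1, 0, 100, in_links=0, out_links=0),
    Track(2, 110, 200, in_links=0, out_links=0),
    Track(3, 150, 300, in_links=0, out_links=1),
    Track(4, 205, 260, in_links=1, out_links=0),
]
candidates = gap_fill_candidates(tracks)
```

In this toy input only track 2 starts inside the 30-frame window after track 1 ends, so the pair (1, 2) is the single candidate passed on to histogram comparison.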


Graph Based Vision Compiler: Gap Fill



Histogram comparison is done on these tracks.
Streaming the raw histogram is of no use because STREAM is incapable of complex calculations.
Assume the histogram comparison score for all combinations of tracks is streamed:
    Select g3.track_id1, g3.track_id2 From GapFillStream3[Range 10 Minutes] as g3, HistogramStream as h Where h.track1=g3.track_id1 And h.track2=g3.track_id2 And h.score>0.9;
    Register Stream Rule4;
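The comparison that STREAM cannot perform itself could be done in middleware along these lines. The slides do not name the similarity measure, so histogram intersection on normalized histograms is an assumption made purely for illustration, and the function names are hypothetical. Only the 0.9 threshold comes from the query.

```python
from typing import Dict, List, Sequence, Tuple


def histogram_intersection(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Similarity in [0, 1]: sum of bin-wise minima of the normalized histograms."""
    total1, total2 = sum(h1), sum(h2)
    return sum(min(a / total1, b / total2) for a, b in zip(h1, h2))


def confirm_pairs(
    candidates: List[Tuple[int, int]],
    histograms: Dict[int, Sequence[float]],
    threshold: float = 0.9,
) -> List[Tuple[int, int]]:
    """Keep only candidate pairs whose similarity score exceeds the threshold."""
    return [
        (a, b)
        for a, b in candidates
        if histogram_intersection(histograms[a], histograms[b]) > threshold
    ]


histograms = {
    1: [8, 1, 1],
    2: [16, 2, 2],  # same shape as track 1 at a different scale
    3: [1, 1, 8],   # very different colour distribution
}
confirmed = confirm_pairs([(1, 2), (1, 3)], histograms)
```

Scoring only the candidate pairs, rather than every combination of tracks, avoids precomputing a score for all pairs.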


Graph Based Vision Compiler: Assumptions



The timestamp mechanism of STREAM is overridden so that the timestamp can be set explicitly.
This contradicts the fundamental principle of STREAM.
A feedback mechanism is assumed in which the output of STREAM is redirected as its input.
Pre-calculating all the histogram similarity scores is very expensive and practically infeasible.
Track, histogram, and trajectory streams from the LLVS are manipulated here and forwarded to the Appearance Labeling System.


Appearance Labeling

Assigns an appearance label to all frames of a track.
An ideal system assigns the same label to each subsequent frame within a track.
Smoothing (HMM) should be performed to assign a single label to a track.
STREAM can smooth labels with a most-dominant-label approach.
A running total of certainty scores is calculated for each track and label.
The label with the highest sum of certainty scores is chosen.
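The most-dominant-label smoothing described above can be sketched as a running sum per track and label. The input format, a sequence of (track id, label, certainty) observations, and the name `dominant_labels` are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def dominant_labels(observations: Iterable[Tuple[int, str, float]]) -> Dict[int, str]:
    """For each track, return the label with the highest summed certainty."""
    totals: Dict[int, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for track_id, label, certainty in observations:
        totals[track_id][label] += certainty  # running total per track and label
    return {track: max(sums, key=sums.get) for track, sums in totals.items()}


observations = [
    (1, "person", 0.9),
    (1, "car", 0.4),
    (1, "person", 0.8),
    (2, "car", 0.7),
    (2, "person", 0.2),
]
smoothed = dominant_labels(observations)
```

A single noisy frame label, such as the one "car" frame in track 1, is outvoted by the accumulated certainty of the other frames.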


Appearance Labeling: Drawbacks



Multiple labels may contribute to the same score.
The certainty score varies with the length of the track.
A low certainty score may be due to a short track or to the presence of many unlabeled frames.
Normalization is needed.
CertaintySum is a float, whereas tracklength is an integer field.
STREAM does not support typecasting or division on different data types.
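The normalization that STREAM cannot express is a one-line computation elsewhere. The sketch below, with the hypothetical name `normalized_certainty`, divides the float certainty sum by the integer track length after an explicit cast, mirroring the typecast and division that the slide notes are missing.

```python
def normalized_certainty(certainty_sum: float, track_length: int) -> float:
    """Average certainty per frame, so short and long tracks are comparable."""
    if track_length <= 0:
        raise ValueError("track_length must be positive")
    # Explicit int-to-float cast, the step that STREAM does not support.
    return certainty_sum / float(track_length)


# A short, confidently labelled track versus a long track with many unlabeled frames.
short_track = normalized_certainty(1.6, 2)
long_track = normalized_certainty(3.0, 10)
```

Without normalization the long track would win on raw sum (3.0 against 1.6) even though its per-frame certainty is lower.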


Action Labeling

Assigns action labels to a track.
Multiple actions can be assigned to a single frame.
A track can have multiple actions over different time intervals.
Smoothing is necessary.
CQL sliding-window syntax can be used to capture the most dominant action over a range of frames.
STREAM can account for multiple labels across multiple time frames.
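The effect of a sliding window over action labels can be illustrated outside CQL. The sketch assumes each frame carries a list of action labels and reports the most frequent action over the last `window` frames. The function name and the window size are hypothetical.

```python
from collections import Counter, deque
from typing import Deque, List, Optional


def dominant_actions(frame_actions: List[List[str]], window: int) -> List[Optional[str]]:
    """For each frame, the most frequent action within the last `window` frames."""
    recent: Deque[List[str]] = deque(maxlen=window)  # old frames fall out automatically
    result: List[Optional[str]] = []
    for actions in frame_actions:
        recent.append(actions)
        counts = Counter(a for frame in recent for a in frame)
        result.append(counts.most_common(1)[0][0] if counts else None)
    return result


frames = [["walk"], ["walk"], ["walk", "carry"], ["run"], ["run"], ["run"]]
smoothed_actions = dominant_actions(frames, window=3)
```

Because the window slides, the same track is labelled "walk" early on and "run" later, which matches the requirement that a track can have different actions over different time intervals.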


Event Reasoning System



Describe a video as a series of events.
Some events relate directly to potential action labels.
Many events are complex and consider appearances, actions, and trajectories.
STREAM can be used as an event reasoning machine.
The track stream, the trajectory stream, and the appearance and action labels are produced at different rates.
This causes synchronization issues.
A stream should be held until all the streams are processed.
The pre-processing (labeling) system is slow-moving.
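Holding a stream until the others catch up can be sketched as a small buffer keyed by track id. This is an assumed design, not something STREAM provides: the class `StreamJoiner`, the stream names in `REQUIRED`, and the payloads are all hypothetical.

```python
from typing import Any, Dict, Optional

# The four inputs named on the slide; an event rule needs all of them for a track.
REQUIRED = ("track", "trajectory", "appearance", "action")


class StreamJoiner:
    """Buffers per-track items until every required stream has delivered one."""

    def __init__(self) -> None:
        self._pending: Dict[int, Dict[str, Any]] = {}

    def push(self, stream: str, track_id: int, value: Any) -> Optional[Dict[str, Any]]:
        """Store one item; return the complete record once all streams arrived."""
        record = self._pending.setdefault(track_id, {})
        record[stream] = value
        if all(name in record for name in REQUIRED):
            return self._pending.pop(track_id)
        return None


joiner = StreamJoiner()
partial = [
    joiner.push("track", 7, {"min_frame": 0}),
    joiner.push("trajectory", 7, {"dx": 12.5}),
    joiner.push("appearance", 7, "person"),
]
complete = joiner.push("action", 7, "walk")  # the slowest stream arrives last
```

The record for track 7 is released only when the last label arrives, so the event rules run at the pace of the slowest labeling stage.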


Event Reasoning System: Enter Field of View



The appearance label is person.
The track should not have any incoming links.
The first frame of the track must appear at a boundary of the scene; its x-coordinate can be 0 or 1024 for a 1024*768 image.
A track appearing at the left must move right and vice versa (trajectory stream).
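The four conditions for the "enter field of view" event combine into a single predicate. The `TrackSummary` record, its field names, and the encoding of the trajectory as a net horizontal displacement `dx` are assumptions for illustration.

```python
from typing import NamedTuple


class TrackSummary(NamedTuple):
    appearance: str  # smoothed appearance label of the track
    in_links: int    # number of incoming links
    first_x: int     # x position in the first frame of the track
    dx: float        # net horizontal motion, positive means rightwards (assumed)


def enters_field_of_view(track: TrackSummary, width: int = 1024) -> bool:
    """True if the track is a person appearing at a side boundary and moving inwards."""
    if track.appearance != "person" or track.in_links != 0:
        return False
    if track.first_x == 0:       # appears at the left edge, must move right
        return track.dx > 0
    if track.first_x == width:   # appears at the right edge, must move left
        return track.dx < 0
    return False


from_left = enters_field_of_view(TrackSummary("person", 0, 0, 15.0))
from_right = enters_field_of_view(TrackSummary("person", 0, 1024, -8.0))
mid_scene = enters_field_of_view(TrackSummary("person", 0, 500, 15.0))
```

A track that first appears in the middle of the scene fails the boundary test, so it is treated as a tracker artifact rather than an entry.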


Graph-based Streaming Hierarchical Video Segmentation



Segmentation separates the image/video into several constituent parts.
It is the process of partitioning data into groups of potential subsets that share similar characteristics.
It is an efficient way to bridge the primary video data and the semantic content in video processing.
The use of video segmentation lags behind image segmentation.
A streaming hierarchical video segmentation framework that leverages ideas from data streams overcomes these limitations.


Figure: Three different video segmentations

An Approximation for Hierarchical Video Segmentation



Given a video $\mathcal{V}$, consider an objective function or criterion $E(\cdot \mid \cdot)$ to obtain the hierarchical segmentation result $S$ by minimizing:

$$S = \arg\min_{S} E(S \mid \mathcal{V}) \qquad (1)$$

Hierarchical segmentation results in $h$ layers of individual segmentations $S \doteq \{S^1, S^2, \ldots, S^h\}$, where each layer $S^i$ is a set of segments $\{s_1, s_2, \ldots\}$ such that $s_i \cap s_j = \emptyset$ for all pairs of segments.
The streaming hierarchical video segmentation framework approximates a solution for (1) by leveraging ideas from data streams.


An Approximation for Hierarchical Video Segmentation



Consider a stream pointer $t$ that indexes frames in a video.
$t$ can touch each frame of $\mathcal{V}$ $k$ times.
It can't alter the previously computed results.
Streaming video can be represented as a set of non-overlapping subsequences, $\mathcal{V} = \{V_1, V_2, \ldots, V_m\}$.
$S$ could approximately be decomposed into $\{S_1, S_2, \ldots, S_m\}$ as in Figure 2, where $S_i$ is the hierarchical segmentation of subsequence $V_i$.


Figure: Framework of streaming hierarchical video segmentation

An Approximation for Hierarchical Video Segmentation



The data stream properties are enforced by a Markov assumption: $S_i$ of subsequence $V_i$ is conditioned only on the subsequence $V_{i-1}$ and its segmentation result $S_{i-1}$.

$$S = \{S_1, \ldots, S_m\} = \arg\min_{S_1, S_2, \ldots, S_m} \left[ E^1(S_1 \mid V_1) + \sum_{i=2}^{m} E^1(S_i \mid V_i, S_{i-1}, V_{i-1}) \right]$$

No energy function, and the configuration space of $S_i$ is huge.
Assumption from data stream algorithms: the hierarchical segmentation result $S_i$ for $V_i$ influences the subsequence $V_{i+1}$, but never the previous one.


An Approximation for Hierarchical Video Segmentation



$$S = \{S_1, \ldots, S_m\} = \left\{ \arg\min_{S_1} E^1(S_1 \mid V_1),\ \ldots,\ \arg\min_{S_i} E^1(S_i \mid V_i, S_{i-1}, V_{i-1}),\ \ldots,\ \arg\min_{S_m} E^1(S_m \mid V_m, S_{m-1}, V_{m-1}) \right\}$$

This can be solved greedily.
Only two subsequences are stored in memory.
The same assumptions from data streams are applied to the hierarchical segmentation $S_i$, such that $S_i = \{S_i^1, S_i^2, \ldots, S_i^h\}$ with $h$ layers, where $S_i^j$ depends on $S_i^{j-1}$, $S_{i-1}^{j}$, $S_{i-1}^{j-1}$, $V_i$, and $V_{i-1}$.
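The greedy, two-subsequences-in-memory scheme can be sketched as a generator. This shows only the control flow under the Markov assumption, not the segmentation method itself: `toy_segment` is a deliberately trivial, hypothetical stand-in for the real per-subsequence segmenter, frames are plain integers, and `length` (the number of frames per subsequence) is an assumed parameter.

```python
from typing import Callable, Iterable, Iterator, List, Optional

Subsequence = List[int]   # toy stand-in for the frames of V_i
Segmentation = List[int]  # toy stand-in for S_i: one segment label per frame


def subsequences(frames: Iterable[int], length: int) -> Iterator[Subsequence]:
    """Split a frame stream into non-overlapping subsequences V_1, V_2, ..."""
    chunk: Subsequence = []
    for frame in frames:
        chunk.append(frame)
        if len(chunk) == length:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def stream_segment(
    frames: Iterable[int],
    length: int,
    segment: Callable[[Subsequence, Optional[Segmentation], Optional[Subsequence]], Segmentation],
) -> Iterator[Segmentation]:
    """Greedy pass: S_i is computed from (V_i, S_{i-1}, V_{i-1}) only, and
    results that were already emitted are never revised."""
    prev_v: Optional[Subsequence] = None
    prev_s: Optional[Segmentation] = None
    for v in subsequences(frames, length):
        s = segment(v, prev_s, prev_v)
        yield s
        prev_v, prev_s = v, s  # only the previous subsequence is kept


def toy_segment(v, prev_s, prev_v):
    """Hypothetical segmenter: a new label starts whenever the frame value changes."""
    label = 0 if prev_s is None else prev_s[-1]
    last = None if prev_v is None else prev_v[-1]
    out = []
    for value in v:
        if last is not None and value != last:
            label += 1
        out.append(label)
        last = value
    return out


result = list(stream_segment([5, 5, 7, 7, 7, 9], length=2, segment=toy_segment))
```

Labels continue across subsequence boundaries because each call sees the previous segmentation, yet memory use stays bounded by two subsequences regardless of how long the video is.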


Figure: Sub-framework for hierarchical segmentation for $V_i$.

An Approximation for Hierarchical Video Segmentation



$$S_i = \arg\min_{S_i} E^1(S_i \mid V_i, S_{i-1}, V_{i-1}) = \left\{ \arg\min_{S_i^2} E^2(S_i^2 \mid V_i, S_i^1, S_{i-1}^1, S_{i-1}^2, V_{i-1}),\ \ldots,\ \arg\min_{S_i^h} E^2(S_i^h \mid V_i, S_i^{h-1}, S_{i-1}^{h-1}, S_{i-1}^h, V_{i-1}) \right\}$$

Each voxel is a segment in the first layer ($S_i^1$).
This can be solved greedily.
A complex conditional segmentation $S_i \mid (V_i, S_{i-1}, V_{i-1})$ is transformed into a simpler hierarchical segmentation $S_i^j \mid (V_i, S_i^{j-1}, S_{i-1}^{j-1}, S_{i-1}^j, V_{i-1})$.


An Approximation for Hierarchical Video Segmentation



Figure: Example video StreamGBH output with k = 10. (a) the video with frame number on top-left, (b) the 5th layer, (c) the 10th layer, (d) the 14th layer segmentations

Conclusion

STREAM provides a nice structure to reason across naturally sequential data.
Each piece of information in the sequence adds insight to the sequence as a whole.
Streaming databases are the only way to continuously monitor long videos in real time.
A system capable of such real-time processing is proposed, but with many assumptions.
If those assumptions are implemented in STREAM, a real-time visual intelligence system is possible.
A framework for streaming hierarchical video segmentation is proposed, which is highly inspired by data streams and their windowing capabilities.
