Term Paper: Video Stream Processing: Pradyumna Kumar

Pradyumna Kumar
Department of Computer Science, Colorado State University, Fort Collins, CO 80524, prady@cs.colostate.edu
Abstract
- Technology is automating human life in every possible way.
- Monitoring surveillance video, however, is still largely manual.
- An automated visual intelligence system that can reason about events in a video is necessary.
- Enormous amounts of video data must be streamed and processed efficiently in real time.
- This paper investigates the use of a streaming database to aid this quest for an automated, real-time visual intelligence system.
Introduction
- The camera sees, but it does not understand.
- The video stream is under constant human observation.
- Issue of the persistent stare.
- A system that processes streaming field sensor data, flags unusual activity, and triggers alerts is necessary.
Introduction
- A visual intelligence system depends heavily on processing streaming video data quickly and efficiently.
- Video data is continuous.
- Streaming databases handle streaming data and process continuous queries.
- Can they be used and/or modified to store and query high-dimensional streaming video data?
Data Stream Management System

- Traditional DBMSs are based on a human-active, DBMS-passive (HADP) model.
- Monitoring applications require a DBMS-active, human-passive (DAHP) model.
- Previous values of the stream should also be stored.
- A large number of triggers must be supported.
- Data arrive asynchronously and are often lost, stale, or intentionally omitted.
- Real-time requirements with a low tolerance for stale data.
Aurora and STREAM
Visual Intelligence through Latent Geometry and Selective Guidance Using STREAM
- A video is a sequence of images.
- A high volume of data must be streamed.
- Features, a high-level representation of the frames, can be streamed instead.
- This allows the database to become less of a storage device and more of an event reasoning device.
- Continuous queries can be written to determine events.
- A prototype of such a system built on STREAM is suggested.
- Shortcomings of STREAM when combined with a sophisticated VI system.
- Mind's Eye system by the Vision Group at CSU, funded by DARPA.
Low Level Vision System

- Signals-to-symbols problem.
- Assume an infinitely long video streamed at 60 fps.
- STREAM can be used to handle it.
- However, the STREAM system cannot handle image streams.
- Workaround: unroll each frame into a data tuple.
- An image of 1024 x 768 pixels becomes a tuple of length 2,359,296 (1024 x 768 x 3, i.e. three values per pixel).
- The STREAM timestamp can be used to represent the frame number.
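The tuple length quoted above can be checked with a few lines of Python. This is only an illustrative sketch: the three-channel assumption is inferred from the arithmetic (1024 x 768 x 3 = 2,359,296), and the names `unroll`, `tuple_length` and `values_per_second` are hypothetical, not part of STREAM.

```python
# Sketch: unroll one frame into a flat tuple and estimate the stream volume.
# Assumption: 3 colour channels per pixel, which is what makes a
# 1024 x 768 frame come out to 2,359,296 values.
WIDTH, HEIGHT, CHANNELS = 1024, 768, 3
FPS = 60


def unroll(frame):
    """Flatten a rows x cols x channels nested sequence into one tuple."""
    return tuple(v for row in frame for pixel in row for v in pixel)


def tuple_length(width=WIDTH, height=HEIGHT, channels=CHANNELS):
    """Number of attributes in the tuple that represents one frame."""
    return width * height * channels


# A tiny 2 x 2 frame keeps the example instant to run.
tiny = [[(1, 2, 3), (4, 5, 6)],
        [(7, 8, 9), (10, 11, 12)]]
flat = unroll(tiny)

# Values that would enter the stream every second at 60 fps.
values_per_second = tuple_length() * FPS
```

At 60 fps this amounts to roughly 141 million values per second, which gives a sense of why streaming raw frames is costly.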
Low Level Vision System

- Mean Absolute Deviation (MAD) image:
    Select * From ImageStream[Range 10 Minutes] Where %100 = 0
- All frames in the time window, along with the MAD image, are streamed into a middleware.
- The middleware returns high-level semantics of the raw frames.
- Unrolling and rolling back images that are streamed at 60 frames per second is highly expensive.
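The slides do not spell out how the MAD image is computed. A minimal sketch, assuming it is the per-pixel mean absolute deviation around the temporal mean of the frames in the window, might look like this (the function name `mad_image` and the greyscale nested-list frame layout are assumptions for illustration):

```python
def mad_image(frames):
    """Per-pixel mean absolute deviation over equally sized greyscale frames.

    Assumption: each frame is a list of rows of numbers, and the deviation
    is taken around the per-pixel mean over all frames in the window.
    """
    n = len(frames)
    rows, cols = len(frames[0]), len(frames[0][0])
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            values = [f[r][c] for f in frames]
            mean = sum(values) / n
            out_row.append(sum(abs(v - mean) for v in values) / n)
        out.append(out_row)
    return out


# Three 2 x 2 frames in which only the top-right pixel changes.
window = [
    [[10, 10], [10, 10]],
    [[10, 20], [10, 10]],
    [[10, 30], [10, 10]],
]
mad = mad_image(window)
```

Static pixels get a MAD of zero, while the changing pixel stands out, which is the kind of signal a middleware could use to locate motion.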
Graph Based Vision Compiler

- Iterative graph simplification to minimize tracker errors.
- STREAM handles the continuous semantic stream from the LLVS.
- The number of links varies; assume a maximum of two in or out links.
- We should be able to set the timestamp value of a data item entering STREAM.
- A loop-based system, with the output of STREAM becoming its input again after some processing.
Graph Based Vision Compiler : Gap Fill

- Track disappearance: gap fill.
- All tracks that disappear suddenly without any out links are selected:
    Select track_id, Max(frame_id) From TrackStream[Range 10 Minutes] Where len_continuesTo=0 Group by track_id;
- Tracks without any incoming links are also queried.
- Tracks that start within a certain time window (say 30 frames) after a track ends are considered:
    Select g1.track_id, g2.track_id From GapFillStream1[Range 10 Minutes] as g1, GapFillStream2[Range 10 Minutes] as g2 Where g2.minFrame>g1.maxFrame And g2.minFrame<g1.maxFrame+30;
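The pairing logic of the two gap-fill queries can be mirrored outside the database. The sketch below is an illustration only: the `Track` record and its field names are hypothetical stand-ins for the stream attributes, and the 30-frame gap is the example value from the slide.

```python
from collections import namedtuple

# Hypothetical record: n_in / n_out are the counts of incoming / outgoing links.
Track = namedtuple("Track", "track_id min_frame max_frame n_in n_out")


def gap_fill_candidates(tracks, max_gap=30):
    """Pair tracks that end without out links with tracks that start,
    without in links, fewer than max_gap frames later."""
    ended = [t for t in tracks if t.n_out == 0]
    started = [t for t in tracks if t.n_in == 0]
    return [
        (g1.track_id, g2.track_id)
        for g1 in ended
        for g2 in started
        if g1.max_frame < g2.min_frame < g1.max_frame + max_gap
    ]


tracks = [
    Track(1, 0, 100, 0, 0),    # disappears at frame 100
    Track(2, 110, 200, 0, 0),  # appears 10 frames later: candidate
    Track(3, 150, 300, 0, 1),  # appears 50 frames later: too late
    Track(4, 105, 180, 1, 0),  # has an incoming link: not a new track
]
pairs = gap_fill_candidates(tracks)
```

Only the pair (1, 2) survives, matching the join condition `g2.minFrame > g1.maxFrame And g2.minFrame < g1.maxFrame + 30`.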
Graph Based Vision Compiler : Gap Fill

- Histogram comparison is done on these tracks.
- Streaming the raw histograms is of no use because STREAM is incapable of complex calculations.
- Assume the histogram comparison score for all combinations of tracks is streamed:
    Select g3.track_id1, g3.track_id2 From GapFillStream3[Range 10 Minutes] as g3, HistogramStream as h Where h.track1=g3.track_id1 And h.track2=g3.track_id2 And h.score>0.9;
    Register Stream Rule4;
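As a rough illustration of what would have to happen outside STREAM, the sketch below computes a similarity score and then applies the same 0.9 threshold as the query. The slides do not say which histogram comparison is used; normalised histogram intersection is swapped in here purely as an example, and all function names are hypothetical.

```python
def histogram_intersection(h1, h2):
    """Normalised histogram intersection in [0, 1].

    Assumption: this is one possible comparison measure, not necessarily
    the one used by the system described in the slides.
    """
    total = min(sum(h1), sum(h2))
    if total == 0:
        return 0.0
    return sum(min(a, b) for a, b in zip(h1, h2)) / total


def confirm_gap_fills(candidates, scores, threshold=0.9):
    """Keep candidate pairs whose precomputed score exceeds the threshold."""
    return [pair for pair in candidates if scores.get(pair, 0.0) > threshold]


histograms = {1: [5, 5], 2: [5, 5], 5: [10, 0], 6: [0, 10]}
candidates = [(1, 2), (5, 6)]
scores = {
    (a, b): histogram_intersection(histograms[a], histograms[b])
    for a, b in candidates
}
confirmed = confirm_gap_fills(candidates, scores)
```

Computing a score for every pair of tracks grows quadratically with the number of tracks, so scoring only the candidate pairs, as done here, is one way to limit that cost.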
Graph Based Vision Compiler: Assumptions

- Overriding the timestamp mechanism of STREAM so that the timestamp can be set explicitly. This contradicts the fundamental principle of STREAM.
- A feedback mechanism where the output of STREAM is redirected as its input.
- Precalculating all the histogram similarity scores is very expensive and practically infeasible.
- Tracks, histograms, and trajectory streams from the LLVS are manipulated here and forwarded to the Appearance Labeling System.
Appearance Labeling
- Assigns an appearance label to all frames of a track.
- An ideal system assigns the same label to each subsequent frame within a track.
- Smoothing (HMM) should be performed to assign a single label to a track.
- STREAM with a most-dominant approach can smooth labels.
- A running total of certainty scores for each track and label is calculated; the label with the highest sum of certainty scores is chosen.
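The most-dominant approach described above can be sketched in a few lines. This is an illustration under assumed input, a sequence of `(track_id, label, certainty)` observations, one per labelled frame; the function name is hypothetical.

```python
from collections import defaultdict


def dominant_labels(observations):
    """Return {track_id: label with the highest summed certainty}.

    observations: iterable of (track_id, label, certainty), one per frame.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for track_id, label, certainty in observations:
        totals[track_id][label] += certainty  # running total per track and label
    return {tid: max(labels, key=labels.get) for tid, labels in totals.items()}


frames = [
    (7, "person", 0.9), (7, "car", 0.6), (7, "person", 0.8),
    (8, "car", 0.7), (8, "car", 0.5), (8, "person", 0.9),
]
smoothed = dominant_labels(frames)
```

Track 8 shows the effect of summing: a single confident "person" frame (0.9) is outvoted by two weaker "car" frames (1.2 in total).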
Appearance Labeling: Drawbacks

- Multiple labels may contribute to the same score.
- The certainty score varies with the length of the track.
- A low certainty score may be due to a shorter track or to the presence of many unlabeled frames.
- Normalization: CertaintySum is a float whereas tracklength is an integer field.
- STREAM does not support typecasting or division across different datatypes.
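The normalization that STREAM cannot express is a one-line operation in a general-purpose language. A small sketch, with a hypothetical function name and made-up example values, shows why it matters:

```python
def normalised_certainty(certainty_sum, track_length):
    """Divide a float certainty sum by an integer track length.

    The explicit float() casts stand in for the typecast that the slide
    says STREAM does not support.
    """
    if track_length <= 0:
        raise ValueError("track_length must be positive")
    return float(certainty_sum) / float(track_length)


# A short, confidently labelled track versus a long, weakly labelled one.
short_track = normalised_certainty(4.5, 5)
long_track = normalised_certainty(45.0, 100)
```

The long track has the larger raw sum (45.0 versus 4.5), yet the lower per-frame certainty, which is exactly the length bias the normalization is meant to remove.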
Action Labeling
- Assigns action labels to a track.
- Multiple actions can be assigned to a single frame.
- A track can have multiple actions over different time intervals.
- Smoothing is necessary.
- CQL sliding window syntax can be used to capture the most dominant action over a range of frames.
- STREAM can account for multiple labels across multiple time frames.
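The sliding-window idea can be sketched as follows. The code is only an analogue of a CQL window written in Python; the window size of three frames and the function name are assumptions chosen for illustration.

```python
from collections import Counter, deque


def windowed_dominant_actions(frame_actions, window=3):
    """For each frame, return the most frequent action label over the
    last `window` frames (a frame may carry several labels)."""
    recent = deque(maxlen=window)  # old frames fall out automatically
    result = []
    for actions in frame_actions:
        recent.append(actions)
        counts = Counter(a for frame in recent for a in frame)
        result.append(counts.most_common(1)[0][0] if counts else None)
    return result


labels = [["walk"], ["walk", "carry"], ["run"], ["run"], ["run", "carry"]]
dominant = windowed_dominant_actions(labels)
```

Because the window slides, the same track is labelled "walk" early on and "run" later, so different dominant actions can be reported for different time intervals.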
Event Reasoning System

- Describe a video as a series of events.
- Some events relate directly to potential action labels.
- Many events are complex and consider appearances, actions, and trajectories.
- STREAM can be used as an event reasoning machine.
- Track stream, trajectory stream, appearance labels, and action labels are produced at different rates.
- Synchronization issues: a stream should be held until all the streams are processed.
- Slow-moving pre-processing (labeling) system.
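One way to picture the synchronization requirement is a small buffer that releases a track only once every upstream stream has reported on it. This is a hypothetical sketch, not a STREAM feature; the class name, stream names, and payloads are invented for illustration.

```python
# Sketch: hold per-track data until every upstream stream has reported.
REQUIRED = {"track", "trajectory", "appearance", "action"}


class StreamJoiner:
    def __init__(self, required=REQUIRED):
        self.required = set(required)
        self.pending = {}  # track_id -> {stream name: value}

    def push(self, stream, track_id, value):
        """Store one item; return the full record once it is complete."""
        record = self.pending.setdefault(track_id, {})
        record[stream] = value
        if self.required.issubset(record):
            return self.pending.pop(track_id)
        return None


joiner = StreamJoiner()
partial = [
    joiner.push("track", 7, {"min_frame": 0}),
    joiner.push("appearance", 7, "person"),
    joiner.push("trajectory", 7, "left_to_right"),
]
complete = joiner.push("action", 7, "walk")
```

The record for track 7 is released only when the slowest stream, here the action label, arrives, so the overall latency is set by the slowest labeling stage.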
Event Reasoning System: Enter Field of View

- Appearance label is person.
- Track should not have any incoming links.
- First frame of the track must appear at the boundaries of the scene: 0 or 1024 for a 1024 x 768 image.
- A track appearing at the left must move right and vice versa (Trajectory Stream).
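The four conditions above combine into a single predicate. The sketch below assumes the horizontal position of the first frame (`first_x`) and the sign of the horizontal motion (`dx`) are available from the track and trajectory streams; these parameter names are hypothetical.

```python
def enters_field_of_view(label, n_in_links, first_x, dx, width=1024):
    """True if a track satisfies the 'enter field of view' rule.

    Assumptions: first_x is the x position in the track's first frame,
    dx > 0 means motion to the right, dx < 0 motion to the left.
    """
    if label != "person" or n_in_links != 0:
        return False
    at_left = first_x <= 0
    at_right = first_x >= width
    return (at_left and dx > 0) or (at_right and dx < 0)


entering_left = enters_field_of_view("person", 0, 0, 5)
entering_right = enters_field_of_view("person", 0, 1024, -3)
leaving = enters_field_of_view("person", 0, 0, -5)
mid_scene = enters_field_of_view("person", 0, 500, 5)
```

A real system would probably allow a margin of a few pixels around each boundary; the exact comparison against 0 and 1024 simply follows the values given on the slide.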
Graph-based Streaming Hierarchical Video Segmentation

- Separate the image/video into several constituent parts.
- The process of partitioning data into groups of potential subsets that share similar characteristics.
- An efficient way to bridge primary video data and semantic content in video processing.
- Use of video segmentation lags behind image segmentation.
- A streaming hierarchical video segmentation framework that leverages ideas from data streams overcomes these limitations.
Figure: Three different video segmentations
An Approximation for Hierarchical Video Segmentation

- Given a video $V$, consider an objective function or criterion $E(\cdot \mid \cdot)$ to obtain the hierarchical segmentation result $S$ by minimizing:

$$S = \arg\min_{S} E(S \mid V) \qquad (1)$$

- Hierarchical segmentation results in $h$ layers of individual segmentations $S \doteq \{S^1, S^2, \dots, S^h\}$, where each layer $S^i$ is a set of segments $\{s_1, s_2, \dots\}$ such that $s_i \cap s_j = \emptyset$ for all pairs of segments.
- The streaming hierarchical video segmentation framework approximates a solution for (1) by leveraging ideas from data streams.

- Consider a stream pointer $t$ that indexes frames in a video $V$.
- $t$ can touch each frame of $V$ $k$ times.
- It cannot alter the previously computed results.
- Streaming video can be represented as a set of non-overlapping subsequences, $V = \{V_1, V_2, \dots, V_m\}$.
- $S$ could approximately be decomposed into $\{S_1, S_2, \dots, S_m\}$ as in Figure 2, where $S_i$ is the hierarchical segmentation of subsequence $V_i$.
Figure: Framework of streaming hierarchical video segmentation

- Enforce the data stream properties by a Markov assumption: $S_i$ of subsequence $V_i$ is conditioned only on the subsequence $V_{i-1}$ and its segmentation result $S_{i-1}$.

$$S = \{S_1, \dots, S_m\} = \arg\min_{S_1, S_2, \dots, S_m} \Big[ E^1(S_1 \mid V_1) + \sum_{i=2}^{m} E^1(S_i \mid V_i, S_{i-1}, V_{i-1}) \Big]$$

- No energy function, and the configuration space of $S_i$ is huge.
- Assumption from data stream algorithms: the hierarchical segmentation result $S_i$ for $V_i$ influences the subsequence $V_{i+1}$, but never the previous one.

$$S = \{S_1, \dots, S_m\} = \Big\{ \arg\min_{S_1} E^1(S_1 \mid V_1), \; \dots, \; \arg\min_{S_i} E^1(S_i \mid V_i, S_{i-1}, V_{i-1}), \; \dots, \; \arg\min_{S_m} E^1(S_m \mid V_m, S_{m-1}, V_{m-1}) \Big\}$$

- Can be solved greedily.
- Only two subsequences are stored in memory.
- The same assumptions from data streams are applied to hierarchically segment $S_i$, such that $S_i = \{S_i^1, S_i^2, \dots, S_i^h\}$ with $h \geq 1$ layers, and $S_i^j$ depends on $S_i^{j-1}$, $S_{i-1}^{j-1}$, $S_{i-1}^{j}$ and $V_i$, $V_{i-1}$.
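The greedy, two-subsequence structure of this approximation can be sketched as a simple loop. The code below is an illustration only: `toy_segment` is a hypothetical stand-in for minimising $E^1$ (it labels 1-D intensity values and starts a new segment at each jump larger than `tol`), not the graph-based criterion used by the actual framework, and it assumes every subsequence is non-empty.

```python
def stream_segment(subsequences, segment):
    """Greedy streaming pass: S_i = segment(V_i, S_{i-1}, V_{i-1}).

    Only the previous subsequence and its result are kept in memory, and
    earlier results are never revised.
    """
    prev_v, prev_s = None, None
    for v in subsequences:
        s = segment(v, prev_s, prev_v)
        yield s
        prev_v, prev_s = v, s


def toy_segment(v, prev_s, prev_v, tol=10):
    """Hypothetical stand-in for arg min E^1: label 1-D intensities,
    opening a new segment whenever consecutive values differ by > tol."""
    if prev_s is None:
        label = 0
    elif abs(v[0] - prev_v[-1]) <= tol:
        label = prev_s[-1]      # continue the last segment of S_{i-1}
    else:
        label = prev_s[-1] + 1  # start a fresh segment
    labels = [label]
    for a, b in zip(v, v[1:]):
        if abs(b - a) > tol:
            label += 1
        labels.append(label)
    return labels


video = [[10, 12, 50], [52, 55, 90], [200, 205, 207]]
result = list(stream_segment(video, toy_segment))
```

The segment labelled 1 spans the boundary between the first and second subsequences, which shows how conditioning on $S_{i-1}$ and $V_{i-1}$ keeps segments consistent across the stream.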
Figure: Sub-framework for hierarchical segmentation for Vi .

$$S_i = \arg\min_{S_i} E^1(S_i \mid V_i, S_{i-1}, V_{i-1}) = \Big\{ \arg\min_{S_i^2} E^2(S_i^2 \mid V_i, S_i^1, S_{i-1}^1, S_{i-1}^2, V_{i-1}), \; \dots, \; \arg\min_{S_i^h} E^2(S_i^h \mid V_i, S_i^{h-1}, S_{i-1}^{h-1}, S_{i-1}^{h}, V_{i-1}) \Big\}$$

- Each voxel is a segment in the first layer ($S_i^1$).
- Can be solved greedily.
- A complex conditional segmentation $S_i \mid (V_i, S_{i-1}, V_{i-1})$ is transformed into a simpler hierarchical segmentation $S_i^j \mid (V_i, S_i^{j-1}, S_{i-1}^{j-1}, S_{i-1}^{j}, V_{i-1})$.

Figure: Example video StreamGBH output with k = 10. (a) the video with frame number on top-left, (b) the 5th layer, (c) the 10th layer, (d) the 14th layer segmentations
Conclusion
- STREAM provides a nice structure for reasoning across naturally sequential data.
- Each piece of information in the sequence adds insight into the sequence as a whole.
- Streaming databases are the only way to continuously monitor long videos in real time.
- A system capable of such real-time processing is proposed, but with many assumptions.
- If those assumptions are implemented in STREAM, a real-time visual intelligence system is possible.
- A framework for streaming hierarchical video segmentation is proposed, which is highly inspired by data streams and their windowing capabilities.
