The patches analysed in this tutorial concern the processing of image sequences with basic
background subtraction techniques. These video processing methods can be applied to a wide
range of fields, for example video-surveillance, interactive multimedia systems, museum and
entertainment applications: using a video-camera system we can immediately discern details in
our image, pointing out moving objects and following their trajectories to eventually interpret their
activities.
Tests:
To test these patches the following sample videos were used: TestBS.avi, TestBS1.avi and
TestBS2.avi.
The “Video File Reader” block
As we are working with video feeds, all our patches will require a block that opens a video file
(be it AVI, MPEG or any other format) and feeds it into our patch. This is done by the Video File
Reader block (Image->Input->Video File Reader).
In EyesWeb each block has a set of parameters that must be set for it to work in the desired way,
so the first thing we must do is modify said parameters upon adding the block to our work-space.
Player Status indicates the status the block starts in: this is
to be set to “Stopped”, and the check-box next to the “Play”
input must be marked so that the block may receive an external
signal telling it when to start broadcasting the video. In the
more basic patches this might seem superfluous, as the
external input comes from a Bang Generator
(Bang->Input->Bang Generator) that fires at the
beginning of the patch, re-creating the very conditions that
we have just modified. But as we proceed to more advanced
patches, we will need different blocks to start at the
same time; they will all be coordinated by the same
Bang Generator, so they must begin their execution in their
“off” mode.
Background Subtraction
This patch enables the user to extract moving pixels from a video source, eliminating any static
elements (e.g. the background), which are extrapolated as a fixed image at the beginning of the
feed (thus the video must start with a capture of the empty background for this patch to work).
What we need to do is extract a frame containing an empty background (i.e. without any
“intrusions”) and subtract it from the grey-scale image we just input: this is done by using the
“Snapshot” block (Topology->Snapshot), which will memorise and store the first frame (in this
case) of the video (remember when we said that the feed should start with the empty background
for this patch to work?).
Next we can subtract the background from the grey-scale feed with the use of an “Arithmetics”
block (Operations->Arithmetic), setting the operation type parameter to “absolute difference”.
All that is left to do now is visualise the result through a “Display” block
(Image->Output->Display).
The main advantage of this patch is the lack of manual intervention on behalf of the user: no
parameters are to be set or changed during runtime.
On the other hand, the patch is terribly hindered by any lack of contrast between the subject
and the background, which confuses the one’s pixels with the other’s and creates artefacts.
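The chain just described (store a background snapshot, then take the pixel-wise absolute difference with each grey-scale frame) can be sketched outside EyesWeb in a few lines of numpy; frames are assumed to be 8-bit grey-scale arrays, and the function name is ours, not an EyesWeb identifier:

```python
import numpy as np

def background_subtraction(frame, background):
    """Absolute difference between the current grey-scale frame and a
    stored background snapshot (both uint8 arrays of the same shape)."""
    # Work in a wider type so the subtraction cannot wrap around.
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    return diff.astype(np.uint8)

# Toy example: a uniform 'background' and a frame with a brighter intruder.
background = np.full((4, 4), 100, dtype=np.uint8)
frame = background.copy()
frame[1:3, 1:3] = 220          # moving subject, brighter than the background
out = background_subtraction(frame, background)
print(out[0, 0], out[1, 1])    # static pixel -> 0, subject pixel -> 120
```

Working in a wider integer type before subtracting avoids the wrap-around that unsigned 8-bit arithmetic would otherwise introduce.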
Tests:
To make the tests consistent in pointing out this patch’s merits and flaws, some slight
modifications were done to the original model:
The changes were necessary to ensure that the background frame we are using for the process is
free from any kind of disturbance, such as moving objects or light changes: what happens is that
the added “Video Reader” block receives a completely different video-input, specifically a static
recording of the background, thus (hopefully) avoiding any kind of artefact.
The main advantage that this patch presents the user with is detail: of all the patches presented in
this brief tutorial, only “Background Subtraction” produces a final output that is rich in
particulars. This means that the user will not only be able to examine things such as clothing,
physiognomy, skin tonality, etc., but will also perceive the final feed as three-dimensional,
clearly understanding where the extrapolated subject is.
On the other hand, as was already mentioned, “Background Subtraction” suffers terribly from
changes in lighting and from scarce contrast between background and subject.
Any sort of light change comes out as a bright spot on the background, while shadows created by
moving objects remain in the output and ruin the final result.
Furthermore, if trying to extract a dark subject from a dark background, the output will be
transparent and with an indefinite contour.
In figure 1 we can see the effects of changing lights and shadows on the output, while comparing
that image with the one presented in figure 2 makes the difference in definition between
low-contrast feeds and high-contrast ones quite clear.
Figure 1
Figure 2
In both images we can notice a white area on the left-hand side: this is due to an error in
choosing the background feed (figure 3), in which the floor had been covered with sheets of
white paper (which wasn’t done for the other feeds).
Figure 3
Simple Threshold
This patch uses a variable threshold to extract a silhouette of the moving subject from the input
video. The algorithm is very simple to understand, though it proves to be faulty when the
threshold is either too high or too low and when subject and background have similar tonalities.
The first three components of the patch are the ones already described in the previous paragraph,
so we will not repeat ourselves. This time, instead of extracting a background image and using
the Arithmetics block, the patch feeds the grey-scaled image into a “Threshold” block
(Operations->Threshold Operation (int)) with a variable threshold parameter, which can be set
through an “Int Generator” item (Numeric->Input->Int Generator) and is visualised through a
“Display” item (Math->Numeric->Scalar->Generic->Output->Display). (Actually, there are two
threshold parameters, one for the lower boundary and one for the upper one, though both are set
to the same value to create a binarised output.)
What happens is that all pixels that have a grey-scale value higher than the set threshold will be
converted to white (i.e. their value will become 255), while all those that are lower will be
converted to black.
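The rule above amounts to a single vectorised comparison; a minimal numpy sketch (our own naming, not the EyesWeb block's):

```python
import numpy as np

def binarise(frame, threshold):
    """Pixels strictly above the threshold become white (255),
    all the others become black (0), as described in the text."""
    return np.where(frame > threshold, 255, 0).astype(np.uint8)

frame = np.array([[10, 200], [130, 131]], dtype=np.uint8)
out = binarise(frame, 130)
print(out[0, 0], out[0, 1], out[1, 0], out[1, 1])   # 0 255 0 255
```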
Tests:
The main disadvantage encountered while using this patch is the complete absence of depth: the
output feed is completely flat!
Furthermore there is a great loss of detail (e.g. crossed arms will not be distinguishable and will
appear uniformly black), particularly when examining a dark subject.
Lighting changes also create artefacts and require the user to re-define his threshold settings.
The patch also requires the user to be an active part in the process, setting the threshold to the
right level to extract a correct silhouette: too low a threshold and not enough pixels will be
considered, too high and the subject will tend to be confused with the background.
As we can see from the following images, depending on the threshold value we obtain different
percentages of spurious pixels (those bothersome black dots that shouldn’t be there).
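The spurious-pixel percentages quoted with the figures can be estimated mechanically by comparing the binarised output against a hand-made reference silhouette; a small sketch of the idea (the reference mask is a hypothetical input we introduce for illustration, counting white pixels that fall outside the reference):

```python
import numpy as np

def spurious_percentage(output_mask, ground_truth):
    """Percentage of pixels that are white (255) in the output but
    black (0) in the reference silhouette, i.e. false detections."""
    spurious = (output_mask == 255) & (ground_truth == 0)
    return 100.0 * spurious.sum() / output_mask.size

truth = np.zeros((10, 10), dtype=np.uint8)   # empty reference scene
out = truth.copy()
out[0, :5] = 255                             # five stray white pixels
print(spurious_percentage(out, truth))       # 5.0
```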
Figure 4 (threshold 190) denotes a high quantity of spurious pixels and white areas near light
sources or where the background is illuminated by the frontal lights.
Figure 4
In figure 5 (threshold 190) we notice the subject disappearing due to the threshold level being
too high: both the subject and the background appear as a semi-uniform black area.
Figure 5
Figure 6 (threshold 130) shows what happens when the threshold value is reduced: the
background slowly disappears, the silhouette is more precise and changes in light become less
bothersome.
Figure 6
Background Subtraction with Multiple Thresholds
The method has an edge over both its predecessors, as the use of multiple thresholds eliminates
the need for user intervention during the process to obtain background removal (that was one of
the flaws in the Simple Threshold) and retains a good amount of detail while enhancing the
contrast of the image (in the Background Subtraction the image was grey-scale and thus less tidy).
All of this is obtained through the use of the Background Subtraction with Multiple Thresholds
block (Imaging->Operations->BgndSubMultThresh), which allows the process to apply different
levels of thresholds to different areas of the feed according to their lighting. It is defined by three
parameters: the number of Threshold Levels (set to 35), the minimum level (87) and the
maximum (194).
First off, the “persistent white area” problem that appeared in the Background Subtraction patch
also appears here, but as was pointed out before it is just a matter of incongruity between the feed
we want to process and the background feed (figure 7).
Secondly, the patch suffers from light changes, particularly if the brightness grows over time,
resulting in white areas on the screen that might hide the subject (similarly to what happened in
the Simple Threshold patch when the threshold level was too high) (figures 7 and 8).
Figure 7
Figure 8
All in all the patch proved to be a failure: it cannot cope with brightness changes, and when the
subject is illuminated directly with bright light his silhouette disappears, rendering the
method useless in areas like video-surveillance.
(NOTE: the Multiple Threshold block used is the legacy version belonging to the EyesWeb 3.2
library and, as this tutorial is being written (January 2007), it is not yet present in the 4.5
distribution.)
Simple Frame Differencing
The idea behind this patch is a simple differencing between two subsequent frames of the same
feed, which results in a silhouette of any object that moved between frames.
The patch begins exactly like every other one presented until now: a Bang Generator and a
Video File Reader.
A “Delay” block (Topology->Delay) is then used to capture a frame in input and delay its output,
so that we may then subtract it from the subsequent frame with the already familiar Arithmetics
block.
This is followed by a threshold block, which cleans up the resulting image by binarising it, thus
making it easier to interpret. As usual the threshold value is controlled by the user through an Int
Generator, although we have set it to a default value of 30.
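The delay, absolute-difference and threshold chain can be sketched in numpy as follows (our own function name; the default threshold of 30 is the one quoted above):

```python
import numpy as np

def frame_difference(curr, prev, threshold=30):
    """Absolute difference between two consecutive grey-scale frames,
    binarised so only pixels that changed noticeably stay white."""
    diff = np.abs(curr.astype(np.int16) - prev.astype(np.int16))
    return np.where(diff > threshold, 255, 0).astype(np.uint8)

prev = np.full((3, 3), 50, dtype=np.uint8)   # previous (delayed) frame
curr = prev.copy()
curr[1, 1] = 120                             # one pixel changed between frames
mask = frame_difference(curr, prev)
print(mask[1, 1], mask[0, 0])                # 255 0
```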
Frame differencing proves advantageous because it responds very quickly to changes in lighting
and to camera motion, extracting only objects that are actually moving (objects that stop disappear
from the screen until they start moving again).
However, only the edge of the silhouette is extracted from the feed and there aren’t enough
references to perceive whether the object is moving towards or away from the camera. Furthermore,
rapid changes in light cause major variations in the overall lighting, creating shadows and
reflections that the patch cannot distinguish from real moving objects, resulting in a very
confusing image.
Tests:
In Figure 9 we can see how the application of a low threshold (5) can produce great quantities of
spurious pixels and how the subject’s shadow is visible in the background, resulting in an
inappropriate output.
Figure 9
In Figure 10 a higher threshold value (20) results in the disappearance of the spurious pixels,
although the silhouette of the shadow remains in the background.
Figure 10
In Figure 11 the threshold value is raised even more (50) and as we can see the shadow’s
silhouette completely disappears (although this is not always true).
Figure 11
Adaptive Background Subtraction
The Adaptive Background Subtraction patch responds to changes in lighting better than the
others. Being a hybrid of the Threshold, Simple Frame Differencing and Background Subtraction
patches, it distinguishes between moving objects (well segmented, but with a slight pixel trail)
and fixed objects (which slowly fade into the background).
At each frame the patch computes
B(t) = α*I(t) + (1-α)*B(t-1)
M(t) = Threshold {abs[I(t)-B(t-1)], λ}
where I(t) is the current image, B(t) is the result of the first expression and λ is the threshold
value. α is a parameter that can be changed at runtime and can only take two values: 0 and 1.
If we examine the patch we can clearly see that with α=1 the patch represents a Simple Frame
Differencing: B(t) is equal to I(t), thus our output M(t) becomes Threshold {abs[I(t)-I(t-1)], λ},
exactly as in the Simple Frame Differencing patch. On the other hand, if α=0 it becomes a
simple Background Subtraction, with the only difference that the “background” is the last frame
processed while α=1 and that the feed is binarised rather than grey-scale, thus losing most of the
method’s advantages; if the patch starts with α=0, it behaves like a Simple Threshold (the
“background” subtracted is a blank screen).
Values for α>1 weren’t considered in the elaboration, as this would only result in a brightening
of the image.
All this might seem difficult to accomplish, though after a careful study of the patch you will
realise that it’s not that hard to understand.
First off we need to generate B(t)= α*I(t) + (1-α)*B(t-1) and its components.
B(t-1) is retained from the previous cycle through a Queue block (Topology->Queue) and sent
to another Scalar Arithmetic Operator to complete the formula (which shouldn’t need any
explaining, as it is exactly like all the others we have seen thus far).
This holds as long as α=1. As soon as it changes to 0, the whole configuration must change to
that of a Background Subtraction.
The background is stored through a Snapshot block which receives α as its Load parameter (i.e.
if the Load parameter is != 0 the block does its job, stopping whenever it becomes 0) and thus
constantly memorises B(t-1). As soon as α=0 the Snapshot stops memorising and stores the last
frame, to be used in lieu of the background input feed that we had in the Background Subtraction
patch.
To finish off we add an Input Selector block (Topology->Input Selector), piloted by α, that
switches between the Snapshot and the Queue blocks.
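One step of the scheme just assembled can be sketched in numpy (a simplified per-frame sketch under our own naming; EyesWeb wires the equivalent blocks together rather than calling a function):

```python
import numpy as np

def adaptive_step(curr, prev_bg, alpha, lam):
    """One iteration of adaptive background subtraction:
    M(t) = Threshold{abs[I(t) - B(t-1)], lam}
    B(t) = alpha*I(t) + (1-alpha)*B(t-1)
    With alpha=1 this reduces to frame differencing, with alpha=0
    to a plain background subtraction, as discussed in the text."""
    diff = np.abs(curr.astype(np.float64) - prev_bg)
    mask = np.where(diff > lam, 255, 0).astype(np.uint8)
    new_bg = alpha * curr.astype(np.float64) + (1 - alpha) * prev_bg
    return mask, new_bg

bg = np.full((2, 2), 100.0)                      # running background B(t-1)
frame = np.array([[100, 100], [100, 180]], dtype=np.uint8)
mask, bg = adaptive_step(frame, bg, alpha=1.0, lam=60)
print(mask[1, 1], bg[1, 1])                      # 255 180.0
```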
Tests:
We have carried out tests and have obtained the following results.
In Figure 12 (α=1 and threshold=5) we have a high amount of spurious pixels (10.6%), too high
to allow us to discern our subject’s silhouette precisely. We also have the residual image of the
subject’s shadow.
Figure 12
In Figure 13 (α=1 threshold=20) the amount of spurious pixels is drastically reduced (0.59%),
although the shadow’s contour remains.
Figure 13
In Figure 14 (α=1 threshold=60) we can see an optimised result: there is no sign of the
shadow and the amount of spurious pixels is so small it can be completely ignored. Sadly, we
still do not have a perfect reconstruction of our subject’s silhouette.
Figure 14
As we can see in Figure 15, some parts of the contour are missing. This is due
to the Frame Differencing technique, which deletes pixels that aren’t moving
(in this case the foot standing on the floor).
Figure 15
In Figure 16 (α=1 threshold=35) there are too many spurious pixels (0.79%) and too many of
the subject’s details are lost in the background.
Figure 16
In Figure 17 (α=0 threshold=160) we try to amend the previous image’s problems by raising
the threshold level, obtaining a high-contrast silhouette while unfortunately also creating many
dark areas in the background which might cause our subject to disappear.
Figure 17
Figure 18
In Figure 19 (α=0 threshold=110) we find that at this particular threshold level the spurious
pixels amount to only 8.83% and that most of them lie far from our subject’s silhouette,
permitting us to distinguish it easily.
Figure 19
In Figure 20 we started the patch at α=1 and subsequently changed it to 0, obtaining a Background
Subtraction. The last silhouette that was extrapolated from the image is stored in the background
and persists in the output.
Figure 20
In Figure 21 we try to eliminate this extra silhouette by raising the threshold level, but notice
that by doing so we also lose the subject’s own.
Figure 21
Considering the results that were obtained during this test we can come to the conclusion that
Adaptive Background Subtraction can be used in video-surveillance with a sufficient degree of
success as long as
a) we set an appropriate threshold level according to the chosen value of α. (60 for α=1 and
110 for α=0)
b) we avoid changing α from 1 to 0 during run-time.
Persistent Frame Differencing
To make a long story short: this patch visualises the output’s motion history.
To the frame-differencing output M(t) we have to add B(t), which is very simply a residue of all
previous frames, their brightness diminished by a γ factor.
B(t) is obtained by processing the output through a Time Delay block (thus transforming our H(t)
into H(t-1)) and then uniformly subtracting γ from each of its pixels to diminish the image’s
brightness, obtaining the aforementioned “fading trail of pixels”.
γ is the rate at which this trail fades: for values higher than 190 we do not have any sort of trail
(transforming the patch into a Simple Frame Differencing), while for γ=0 the trail simply does not
fade and persists for the whole duration of the process (creating quite an amount of confusion in
the output!).
What we need to do now is combine M(t) and B(t) together, which can easily be done through a
Logical block (Operations->Logical) with its operation type parameter set to OR.
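A numpy sketch of one step of this combination (our own naming; since M(t) only takes the values 0 and 255, a bitwise OR keeps moving pixels at full white while letting the dimmed trail show through elsewhere):

```python
import numpy as np

def persistent_step(motion_mask, prev_history, gamma):
    """Dim the previous history H(t-1) by gamma, then OR it with the
    new binary motion mask M(t) to build a fading pixel trail."""
    # Clamp at zero so the uint8 subtraction cannot wrap around.
    faded = np.clip(prev_history.astype(np.int16) - gamma, 0, 255).astype(np.uint8)
    return np.bitwise_or(motion_mask, faded)

hist = np.zeros((2, 2), dtype=np.uint8)
mask1 = np.array([[255, 0], [0, 0]], dtype=np.uint8)   # motion at (0,0)
hist = persistent_step(mask1, hist, gamma=10)
mask2 = np.array([[0, 255], [0, 0]], dtype=np.uint8)   # motion moved to (0,1)
hist = persistent_step(mask2, hist, gamma=10)
print(hist[0, 0], hist[0, 1])                          # 245 255
```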
Tests:
The first thing to consider while using this patch is the presence of two parameters that can be
customised at runtime: the threshold value λ and the fading rate γ.
In Figure 22 (λ=5 γ=10) we can see that when setting both parameters to such low values the
result is quite incomprehensible: the silhouette can barely be distinguished and the whole image
comes out as confusing.
Figure 22
In Figures 23-24 (λ=5 γ=60−220) we can see that as the pixel trail is shortened the image
becomes more and more understandable, although the amount of spurious pixels (due above all
to the low threshold value) is still very high.
Figure 23
Figure 24
In Figure 25 (λ=20 γ=3) we can see that raising the threshold value greatly reduces the amount
of spurious pixels, while the low value of γ results in a persisting trail that can help the
user identify the direction of the subject’s motion (although in this case it will most probably
confuse the user, as the fading rate is too low). Lighting changes create bothersome white patches
in the background that might hide the subject’s silhouette if he happens to cross those areas.
Figure 25
In Figure 26 (λ=20 γ=10) the trail fades a little bit faster, presenting the user with a more
understandable image, although lighting changes still produce unwanted white areas.
Figure 26
In Figure 27 (λ=30 γ=3) we raised the threshold level: the white areas due to the lighting
changes are drastically reduced.
Figure 27
Figure 28 (λ=40 γ=3) shows what happens when raising the threshold value even more: the
white areas created by the changes in lighting are reduced to a bare minimum, while the detail in
the subject’s silhouette is maintained.
Figure 28
Trying to raise the threshold level even more, as in Figure 29 (λ=100 γ=3), just results in a loss
of detail in the silhouette, as some particulars fall under the threshold level and are thus
considered part of the background and deleted. On the positive side, all white areas deriving
from light changes completely disappear.
Figure 29
We continue refining our test in Figure 30 (λ=100 γ=5) by raising the γ value: the pixel trail is
now less persistent and allows us to distinguish quite clearly which path was taken by our
subject.
Figure 30
In Figures 31-32-33 (λ=100 γ=10−60−200) we see the results of raising the γ parameter
further: the main point of the Persistent Frame Differencing patch slowly disappears, and it
becomes nothing more than a Simple Frame Differencing.
Figure 31
Figure 32
Figure 33
In conclusion, we can assert that the Persistent Frame Differencing patch gives the best results
for threshold level λ=100 (no artefacts due to lighting problems) and for γ=5−10 (a trail long
enough to distinguish the direction and amount of movement).
References
For Simple Frame Differencing, Adaptive Background Subtraction and Persistent Frame
Differencing, the algorithm schemes are based on Robert Collins’ short course at the University
of Genoa on ‘Image sequence processing for video-surveillance’.