
DIST - University of Genova

Laboratorio di Informatica Musicale (InfoMus)


http://www.infomus.dist.unige.it

EyesWeb User’s Tutorial 1

The patches analysed in this tutorial concern the processing of image sequences with basic
background subtraction techniques. These video processing methods can be applied to a wide
range of fields, for example video-surveillance, interactive multimedia systems, museum and
entertainment applications: using a video-camera system we can immediately discern details in
our image, picking out moving objects and following their trajectories to eventually interpret
their activities.

In this tutorial we explain the following EyesWeb 4.5 patches:

• Background Subtraction (a simple algorithm that subtracts a background frame from our input feed)
• Simple Threshold (an algorithm that binarises our feed and works well in high-contrast environments)
• Background Subtraction with Multiple Thresholds (an advanced version of the previous patch, which applies multiple thresholds to the feed and doesn't require the user's intervention)
• Simple Frame Differencing (a patch that creates a silhouette by subtracting each frame from the following one)
• Adaptive Background Subtraction (a highly user-interactive patch that combines Background Subtraction and Simple Frame Differencing)
• Persistent Frame Differencing (an algorithm that not only extracts any moving object from the feed, but also gives us some information on how much and where it is moving)

Tests:

To test these patches the following sample videos were used: TestBS.avi, TestBS1.avi and
TestBS2.avi.
The “Video File Reader” block
As we are working with video feeds, all our patches will require a block that opens a video file
(be it AVI, MPEG or anything else) and feeds it into our patch. This is done by the Video File
Reader block (Image->Input->Video File Reader).

In EyesWeb each block has a set of parameters that determine whether it works in the desired way,
so the first thing we must do is modify said parameters upon adding the block to our work-space.

In our case we are mainly interested in changing three parameters from their default values: the
color model, the player status and the "play" input.

Color Model must be changed from "RGB" (the default setting) to "BW" (black and white), so that
the output image is already converted into grey-scale. Why? Simply because it is easier for the
computer to work with monochrome images than with full colour.

Player Status indicates the status the block starts in: this must be set to "Stopped" and the
check-box next to the "Play" input must be marked so that the block can receive an external
signal telling it when to start broadcasting the video. In the more basic patches this might
seem superfluous, as the external input comes from a Bang Generator (Bang->Input->Bang Generator)
that fires at the beginning of the patch, re-creating the very conditions we have just modified.
But as we proceed to more advanced patches, we will need different blocks to start at the same
time: they will all be coordinated by the same Bang Generator, so they must begin their
execution in their "off" mode.
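
For readers who want to see the same data flow in ordinary code, here is a minimal sketch of
what a Video File Reader configured for "BW" output does, written in Python with OpenCV rather
than in EyesWeb itself (the ~30 ms frame delay is illustrative, not taken from the patches):

    import cv2

    cap = cv2.VideoCapture("TestBS.avi")            # open the video file
    while True:
        ok, frame = cap.read()                      # "Play": fetch the next frame
        if not ok:                                  # end of file: stop the player
            break
        grey = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)  # the "BW" color model
        cv2.imshow("feed", grey)
        if cv2.waitKey(30) == 27:                   # ~30 ms per frame, Esc to quit
            break
    cap.release()
    cv2.destroyAllWindows()
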
Background Subtraction

This patch enables the user to extract moving pixels from a video source, eliminating any static
elements (e.g. the background), which are captured as a fixed image at the beginning of the feed
(thus the video must start with a shot of the empty background for this patch to work).

The final video is the result of the following algorithm:

M(t) = abs[I(t) - Background Image]

What we need to do is extract a frame containing the empty background (i.e. without any
"intrusions") and subtract it from the grey-scale image we just input: this is done with the
"Snapshot" block (Topology->Snapshot), which will memorise and store the first frame of the
video (remember when we said that the feed should start with the empty background for this patch
to work?).

Next we can subtract the background from the grey-scale feed with an "Arithmetic" block
(Operations->Arithmetic), setting its operation type parameter to "absolute difference".

All that is left to do now is visualise the result through a "Display" block
(Image->Output->Display).
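The whole patch therefore boils down to very little logic. The following sketch re-creates it in
Python with OpenCV (our choice of tools, not part of EyesWeb): the first frame plays the role of
the Snapshot block, cv2.absdiff that of the Arithmetic block and cv2.imshow that of the Display
block.

    import cv2

    cap = cv2.VideoCapture("TestBS.avi")
    background = None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        grey = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        if background is None:
            background = grey                    # "Snapshot": memorise the first frame
        moving = cv2.absdiff(grey, background)   # M(t) = abs[I(t) - Background Image]
        cv2.imshow("M(t)", moving)               # "Display"
        if cv2.waitKey(30) == 27:
            break
    cap.release()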

The main advantage of this patch is that it requires no manual intervention from the user: no
parameters need to be set or changed at runtime.

On the other hand, the patch is severely hindered by any lack of contrast between the subject
and the background, which confuses the pixels of one with those of the other and creates artefacts.
Tests:

To make the tests consistent in pointing out this patch's merits and flaws, some slight
modifications were made to the original model.

The changes were necessary to ensure that the background frame used in the process is free from
any kind of disturbance, such as moving objects or light changes: the added "Video File Reader"
block receives a completely different video input, specifically a static recording of the
background, thus (hopefully) avoiding any kind of artefact.

The main advantage this patch offers the user is detail: of all the patches presented in this
brief tutorial, only "Background Subtraction" produces a final output that is rich in
particulars. This means that the user will not only be able to examine things such as clothing,
physiognomy, skin tone, etc., but will also perceive the final feed as three-dimensional,
clearly understanding where the extracted subject is.

On the other hand, as was already mentioned, "Background Subtraction" suffers terribly from
changes in lighting and from poor contrast between background and subject.
Any light change shows up as a bright spot on the background, while shadows created by moving
objects remain in the output and ruin the final result.
Furthermore, when trying to extract a dark subject from a dark background, the output will be
transparent and with an indefinite contour.
In figure 1 we can see the effects of changing lights and shadows on the output, while comparing
it with the image presented in figure 2 makes the difference in definition between low-contrast
and high-contrast feeds quite clear.

Figure 1

Figure 2
In both images we can notice a white area on the left-hand side: this is due to an error in
choosing the background feed (figure 3), in which the floor had been covered with sheets of
white paper (something that wasn't done for the other feeds).

Figure 3
Simple Threshold

This patch uses a variable threshold to extract a silhouette of the moving subject from the input
video. The algorithm is very simple to understand, though it proves to be faulty when the
threshold is either too high or too low and when subject and background have similar tonalities.

The first three components of the patch are the ones already described in the previous paragraph,
so we will not repeat ourselves. This time, instead of extracting a background image and using
the Arithmetic block, the patch feeds the grey-scale image into a "Threshold" block
(Operations->Threshold Operation (int)) with a variable threshold parameter, which can be set
through an "Int Generator" block (Numeric->Input->Int Generator) and is visualised through a
"Display" block (Math->Numeric->Scalar->Generic->Output->Display). (Actually there are two
threshold parameters, one for the lower boundary and one for the upper, but both are set to the
same value to create a binarised output.)

What happens is that all pixels with a grey-scale value higher than the set threshold are
converted to white (i.e. their value becomes 255), while all those that are lower are
converted to black (0).
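
In code the whole Threshold block reduces to a single call. The sketch below (Python with
OpenCV, again our own rendering rather than the EyesWeb internals; the input file name is
hypothetical) binarises one grey-scale frame:

    import cv2

    grey = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)  # one grey-scale frame
    thresh = 190                        # the value set through the Int Generator
    # pixels brighter than `thresh` become 255 (white), all others 0 (black)
    _, silhouette = cv2.threshold(grey, thresh, 255, cv2.THRESH_BINARY)
    cv2.imshow("silhouette", silhouette)
    cv2.waitKey(0)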

Tests:

The main disadvantage encountered while using this patch is the absolute absence of depth: the
output feed comes out completely flat!

Furthermore there is a great loss of detail (e.g. crossed arms will not be distinguishable and
will appear as a uniform black area), particularly when examining a dark subject.
Lighting changes also create artefacts and require the user to re-define his threshold settings.

The patch also requires the user to take an active part in the process, setting the threshold to
the right level to extract a correct silhouette: too low a threshold and not enough pixels will
be considered, too high and the subject will tend to be confused with the background.

As we can see from the following images, for different threshold values we get different
percentages of spurious pixels (those bothersome black dots that shouldn't be there).
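
(The percentages quoted in this tutorial can be estimated by comparing the binarised output
against a hand-made reference silhouette; the small helper below shows one plausible way of
measuring them, which is our assumption rather than the method actually used for these tests.)

    import numpy as np

    def spurious_percentage(output, reference):
        """Percentage of pixels in which the binarised output differs from a
        reference silhouette (both uint8 images containing only 0 and 255)."""
        wrong = np.count_nonzero(output != reference)
        return 100.0 * wrong / output.size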

Figure 4 (threshold 190) denotes a high quantity of spurious pixels and white areas near light
sources or where the background is illuminated by the frontal lights.

Figure 4
In figure 5 (threshold 190) we notice the subject disappearing due to the threshold level being
too high: both the subject and the background appear as a semi-uniform black area.

Figure 5

Figure 6 (threshold 130) shows what happens when the threshold value is reduced: the background
slowly disappears, the silhouette becomes more precise and changes in light become less
bothersome.

Figure 6
Background Subtraction with Multiple Thresholds

This method has an edge over both its predecessors: the use of multiple thresholds eliminates the
need for user intervention during the process to obtain background removal (one of the flaws of
the Simple Threshold) and retains a good amount of detail while enhancing the contrast of the
image (in Background Subtraction the image was grey-scale and thus less clean).

All of this is obtained through the Background Subtraction with Multiple Thresholds block
(Imaging->Operations->BgndSubMultThresh), which allows the process to apply different threshold
levels to different areas of the feed according to their lighting. It is defined by three
parameters: the number of Threshold Levels (set to 35), the minimum level (87) and the
maximum (194).

The result was then processed through a median filter (Imaging->Filters->NonlinearFilter) to
eliminate any artefacts or spurious pixels.
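
The internals of the legacy block are not documented here, so the sketch below (Python with
OpenCV and NumPy) is only our approximation of the idea: each pixel is compared against one of a
fixed number of threshold levels, chosen according to how bright the background is at that
pixel, and the binarised result is then cleaned with a median filter.

    import cv2
    import numpy as np

    def bgnd_sub_mult_thresh(frame, background, levels=35, tmin=87, tmax=194):
        # frame and background are uint8 grey-scale images of the same size;
        # one threshold value per level, spread between tmin and tmax
        steps = np.linspace(tmin, tmax, levels)
        # pick a level for every pixel according to the background brightness
        idx = (background.astype(np.float32) / 255.0 * (levels - 1)).astype(int)
        local_thresh = steps[idx]
        diff = cv2.absdiff(frame, background).astype(np.float32)
        mask = np.where(diff > local_thresh, 255, 0).astype(np.uint8)
        return cv2.medianBlur(mask, 3)   # the NonlinearFilter pass
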
Tests:
As with the Background Subtraction patch, we slightly modified the process to allow us to use a
separate feed for background extraction.

A few flaws have turned up following these tests.

First off, the "persistent white area" problem that appeared in the Background Subtraction patch
also appears here, but as was pointed out before it's just a matter of incongruity between the
feed we want to process and the background feed. (figure 7)

Secondly, the patch suffers from light changes, particularly if the brightness grows over time,
resulting in white areas on the screen that might hide the subject (similarly to what happened
in the Simple Threshold patch when the threshold level was too high). (figures 7 and 8)

Figure 7
Figure 8

All in all the patch proved to be a failure: it cannot cope with brightness changes, and when
the subject is directly illuminated with bright light its silhouette disappears, rendering the
method useless in fields like video-surveillance.

(NOTE: the Multiple Threshold block used is the legacy version belonging to the EyesWeb 3.2
library; at the time of writing (January 2007) it is not yet present in the 4.5 distribution.)
Simple Frame Differencing

The idea behind this patch is a simple differencing between two subsequent frames of the same
feed, which results in a silhouette of any object that moved between frames.

The patch begins exactly like every other one presented until now: with a Bang Generator and a
Video File Reader.

A "Delay" block (Topology->Delay) is then used to capture a frame in input and delay its output,
so that we may subtract it from the subsequent frame with the already familiar Arithmetic block.

This is followed by a threshold block, which cleans up the resulting image by binarising it, thus
making it easier to interpret. As usual the threshold value is controlled by the user through an
Int Generator, although we have set it to a default value of 30.

The algorithm used is:

M(t) = Threshold{abs[I(t) - I(t-1)], λ}

Where M(t) is the resulting image, I(t) is the current image, I(t-1) is the previous image and
λ is our threshold value.
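
In code (again Python with OpenCV, our usual assumptions) the Delay block becomes a variable
holding the previous frame:

    import cv2

    cap = cv2.VideoCapture("TestBS.avi")
    lam = 30                                        # threshold λ, the patch default
    ok, prev = cap.read()
    prev = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)   # I(t-1), held by the "Delay"
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        grey = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        diff = cv2.absdiff(grey, prev)              # abs[I(t) - I(t-1)]
        _, moving = cv2.threshold(diff, lam, 255, cv2.THRESH_BINARY)  # M(t)
        cv2.imshow("M(t)", moving)
        prev = grey                                 # update the delayed frame
        if cv2.waitKey(30) == 27:
            break
    cap.release()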

Frame differencing proves advantageous as it responds very quickly to changes in lighting and
camera motion, extracting only objects that are actually moving (objects that stop disappear
from the screen until they start moving again).

However, only the edge of the silhouette is extracted from the feed, and there aren't enough
references to perceive whether the object is moving towards or away from the camera.
Furthermore, rapid changes in light cause major variations in the overall brightness, creating
shadows and reflections that the patch cannot distinguish from real moving objects, resulting in
a very confusing image.

Tests:

In Figure 9 we can see how the application of a low threshold (5) can produce great quantities of
spurious pixels and how the subject’s shadow is visible in the background, resulting in an
inappropriate output.

Figure 9

In Figure 10 a higher threshold value (20) results in the disappearance of the spurious pixels,
although the silhouette of the shadow remains in the background.
Figure 10

In Figure 11 the threshold value is raised even more (50) and as we can see the shadow’s
silhouette completely disappears (although this is not always true).

Figure 11
Adaptive Background Subtraction

The Adaptive Background Subtraction patch responds to changes in lighting better than the
others. As a hybrid of the Simple Threshold, Simple Frame Differencing and Background
Subtraction patches, it distinguishes between moving objects (well segmented, but with a slight
pixel trail) and fixed objects (which slowly fade into the background).

The algorithm used is:

M(t) = Threshold{abs[I(t) - B(t-1)], λ}

B(t) = α*I(t) + (1-α)*B(t-1)

Where I(t) is the current image, B(t) is the adaptive background defined by the second
expression and λ is the threshold value. α is a parameter that can be changed at runtime and can
take only two values: 0 and 1.

If we examine the patch we can clearly see that with α=1 it implements Simple Frame
Differencing: B(t) is equal to I(t), so our output M(t) becomes Threshold{abs[I(t) - I(t-1)], λ},
exactly as in the Simple Frame Differencing patch. On the other hand, with α=0 it becomes a
simple Background Subtraction, with the only differences that the "background" is the last frame
processed while α was 1 and that the feed is binarised rather than grey-scale, thus losing most
of the method's advantages. If instead the patch starts with α=0, it behaves like a Simple
Threshold (the "background" subtracted is a blank screen).

Values of α>1 weren't considered in our analysis, as they would only result in a brightening of
the image.
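
In code, one step of the algorithm could be sketched as follows (Python with OpenCV and NumPy,
our own rendering of the two formulas rather than the patch itself):

    import cv2
    import numpy as np

    def adaptive_step(frame, bgnd, alpha, lam):
        """One iteration of M(t) = Threshold{abs[I(t) - B(t-1)], lambda} and
        B(t) = alpha*I(t) + (1-alpha)*B(t-1); alpha is 0 or 1 as in the patch."""
        diff = cv2.absdiff(frame, bgnd)
        _, moving = cv2.threshold(diff, lam, 255, cv2.THRESH_BINARY)
        new_bgnd = (alpha * frame.astype(np.float32)
                    + (1 - alpha) * bgnd.astype(np.float32)).astype(np.uint8)
        return moving, new_bgnd

With alpha=1 the function degenerates to frame differencing (the new background is simply the
current frame); with alpha=0 the background stays frozen and we obtain background subtraction.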

All this might seem difficult to accomplish, though after a careful study of the patch you will
realise that it’s not that hard to understand.

First of all we need to generate B(t) = α*I(t) + (1-α)*B(t-1) and its components.

(1-α) is generated by combining a "phoney" Random Generator block
(Math->Numeric->Scalar->Generic->Input->Random Generator), used to produce the constant "1"
(that is why we call it phoney: to obtain a constant output we set both the maximum and minimum
range of the random output to 1), and our usual Int Generator for α. These are then combined
with a Scalar Arithmetic Operation (double) block (Operations->Scalar Arithmetic
Operation (double)).

The result is then multiplied by B(t-1) using another similar block.

B(t-1) is retained from the previous cycle through a Queue block (Topology->Queue) and sent to
another Scalar Arithmetic Operation block to complete the algorithm (which shouldn't need any
explaining, as it is exactly like all the others we have seen thus far).

This configuration is valid while α=1. As soon as α changes to 0 the whole configuration must
change to that of a Background Subtraction.

The background is stored through a Snapshot block, which receives α as its Load parameter (i.e.
while the Load parameter is ≠ 0 the block does its job, stopping whenever it becomes 0) and thus
constantly memorises B(t-1). As soon as α=0 the Snapshot stops memorising and keeps the last
stored frame, to be used in lieu of the background input feed that we had in the Background
Subtraction patch.

To finish off we add an Input Selector block (Topology->Input Selector) driven by α, which
switches between the Snapshot and the Queue blocks.
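
In plain code this Snapshot/Queue/Input Selector wiring amounts to little more than a frozen
copy of the background (a sketch of the logic only, under our assumptions about the blocks):

    class BackgroundSelector:
        """Sketch of the Snapshot + Input Selector wiring: the snapshot keeps
        memorising while alpha != 0 (its Load parameter), and the selector
        returns the frozen copy once alpha becomes 0."""
        def __init__(self):
            self.snapshot = None

        def select(self, alpha, queued_bgnd):
            if alpha != 0:
                self.snapshot = queued_bgnd.copy()   # keep memorising B(t-1)
            return queued_bgnd if alpha != 0 else self.snapshot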

Tests:

We have carried out tests and have obtained the following results.
In Figure 12 (α=1 and threshold=5) we have a high amount of spurious pixels (10.6%), too high
to allow us to precisely discern our subject's silhouette. We also have the residual image of the
subject's shadow.
Figure 12

In Figure 13 (α=1 threshold=20) the amount of spurious pixels is drastically reduced (0.59%),
although the shadow’s contour remains.

Figure 13

In Figure 14 (α=1 threshold=60) we can see an optimised result: there is no sign of the shadow
and the amount of spurious pixels is so small it can be safely ignored. Sadly, we still do not
have a perfect reconstruction of our subject's silhouette.
Figure 14

As we can see in Figure 15, some parts of the contour are missing. This is due to the Frame
Differencing technique, which deletes pixels that aren't moving (in this case the foot standing
on the floor).

Figure 15

In Figure 16 (α=1 threshold=35) there are too many spurious pixels (0.79%) and too many of
the subject’s details are lost in the background.
Figure 16

In Figure 17 (α=0 threshold=160) we try to amend the previous image's problems by raising the
threshold level, obtaining a high-contrast silhouette while unfortunately also creating many
dark areas in the background which might cause our subject to disappear.

Figure 17

The quantity of spurious pixels is terribly high (51.4%), although if we consider only the area
surrounding the subject the percentage drops to 12.32% (still too many).

Figure 18
In Figure 19 (α=0 threshold=110) we find that for this particular threshold level the spurious
pixels are only 8.83% and that most of them are far away from our subject’s silhouette,
permitting us to easily distinguish it.

Figure 19

In Figure 20 we started the patch at α=1 and subsequently changed it to 0, obtaining a
Background Subtraction. The last silhouette that was extracted from the image is stored as the
background and persists in the output.

Figure 20

In Figure 21 we try to eliminate this extra silhouette by raising the threshold level, but
notice that by doing so we also lose the subject's silhouette.
Figure 21

Considering the results obtained during these tests we can conclude that Adaptive Background
Subtraction can be used in video-surveillance with a sufficient degree of success as long as:

a) we set an appropriate threshold level according to the chosen value of α (60 for α=1 and
110 for α=0);
b) we avoid changing α from 1 to 0 during run-time.

Persistent Frame Differencing


Similarly to the previous Adaptive Background Subtraction, this patch responds well to any
change in lighting, and objects that are no longer perceived in motion fade away with time.
Moving objects leave behind them a somewhat persistent trail of pixels, whose gradient enables
us to perceive three-dimensionality.

To make a long story short: this patch visualises the output’s motion history.

The algorithm used is:

M(t) = Threshold{abs[I(t) - I(t-1)], λ}

B(t) = H(t-1) - γ

H(t) = M(t) OR B(t)

Where M(t) is the same result obtained with Simple Frame Differencing and H(t) is the final
output. B(t) is very simply a residue of all previous output frames with their brightness
diminished by a factor γ: it is obtained by processing the output through a Time Delay block
(thus transforming our H(t) into H(t-1)) and then uniformly subtracting γ from each of its
pixels, producing the aforementioned "fading trail of pixels".

γ is the rate at which this trail fades: for values higher than 190 we do not get any sort of
trail (turning the patch into a Simple Frame Differencing), while for γ=0 the trail simply does
not fade and persists for the whole duration of the process (creating quite an amount of
confusion in the output!).

What we need to do now is combine M(t) and B(t) together, which can easily be done through a
Logical block (Operations->Logical) with its operation type parameter set to OR.
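
Putting the three formulas together, the whole patch could be sketched as follows (Python with
OpenCV and NumPy under our usual assumptions; λ and γ are set to illustrative values):

    import cv2
    import numpy as np

    cap = cv2.VideoCapture("TestBS.avi")
    lam, gamma = 100, 5
    ok, prev = cap.read()
    prev = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
    history = np.zeros_like(prev)              # H(t-1), initially black
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        grey = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        diff = cv2.absdiff(grey, prev)
        _, moving = cv2.threshold(diff, lam, 255, cv2.THRESH_BINARY)   # M(t)
        # B(t): dim the delayed output uniformly by gamma (the "fading trail")
        fading = np.clip(history.astype(np.int16) - gamma, 0, 255).astype(np.uint8)
        history = cv2.bitwise_or(moving, fading)   # H(t) = M(t) OR B(t)
        cv2.imshow("H(t)", history)
        prev = grey
        if cv2.waitKey(30) == 27:
            break
    cap.release()

Since M(t) contains only the values 0 and 255, the bitwise OR leaves the fading trail intact
wherever nothing is moving and resets it to full white wherever motion is detected.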

Tests:

The first thing to consider while using this patch is the presence of two runtime-customisable
parameters: the threshold value λ and the fading rate γ.

In Figure 22 (λ=5 γ=10) we can see that when setting both parameters to such low values the
result is quite incomprehensible: the silhouette can barely be distinguished and the whole image
comes out as confusing.

Figure 22

In Figures 23-24 (λ=5, γ=60 and 220) we can see that as the pixel trail diminishes the image
becomes more and more understandable, although the amount of spurious pixels (due mostly to the
low threshold value) is still very high.
Figure 23

Figure 24

In Figure 25 (λ=20 γ=3) we can see that raising the threshold value greatly reduces the amount
of spurious pixels, while the low value of γ results in a persistent trail that can help the
user identify the direction of the subject's motion (although in this case it will most probably
confuse the user, as the fading rate is too low). Lighting changes create bothersome white
patches in the background that might hide the subject's silhouette if it happens to cross those
areas.

Figure 25

In Figure 26 (λ=20 γ=10) the trail fades a little bit faster, presenting the user with a more
understandable image, although lighting changes still produce unwanted white areas.

Figure 26

In Figure 27 (λ=30 γ=3) we raised the threshold level: the white areas due to the lighting
changes are markedly reduced.
Figure 27

Figure 28 (λ=40 γ=3) shows what happens when raising the threshold value even more: the
white areas created by the changes in lighting are reduced to a bare minimum, while the detail in
the subject’s silhouette is maintained.

Figure 28

Trying to raise the threshold level even further, as in Figure 29 (λ=100 γ=3), just results in a
loss of detail in the silhouette, as some particulars fall under the threshold level and are
thus considered part of the background and deleted. On the positive side, all white areas
deriving from light changes completely disappear.
Figure 29

We continue refining our test in Figure 30 (λ=100 γ=5) by raising the γ value: the pixel trail is
now less persistent and allows us to distinguish quite clearly which path was taken by our
subject.

Figure 30

In Figures 31-32-33 (λ=100, γ=10, 60 and 200) we see the results of further raising the
γ parameter: the main point of the Persistent Frame Differencing patch slowly disappears, and it
becomes nothing more than a Simple Frame Differencing.
Figure 31

Figure 32

Figure 33
In conclusion, we can assert that the Persistent Frame Differencing patch gives the best results
with threshold level λ=100 (no artefacts due to lighting problems) and γ=5-10 (a trail long
enough to distinguish the direction and quantity of movement).
References

For Simple Frame Differencing, Adaptive Background Subtraction and Persistent Frame
Differencing, the algorithm schemes are based on Robert Collins' short course at the University
of Genoa on image sequence processing for video-surveillance.
