
Signal Processing 81 (2001) 1017-1040

Advanced image-processing tools for counting people in tourist site-monitoring applications
Claudio Sacchi, Gianluca Gera, Lucio Marcenaro, Carlo S. Regazzoni*
Department of Biophysical and Electronic Engineering (DIBE), University of Genoa, Via Opera Pia 11/A, I-16145, Genova, Italy
Received 13 April 2000; received in revised form 4 December 2000

Abstract

This work aims at demonstrating the usefulness of exploiting novel image-processing tools for moving-object detection and classification in the context of an actual application involving the remote monitoring of a tourist site. The application concerns outdoor people counting for tourist-flow estimation in a constrained environment. The technical problems to be solved concern: (a) the design and implementation of low-complexity background-updating and change-detection algorithms able to adapt themselves to the time-varying illumination conditions of the scene, and (b) the integration of real-time pattern-recognition tools in order to distinguish the groups of persons to be counted from other objects present in the scene. The achieved results have proven that the proposed system makes it possible to obtain reliable people counting in different environmental situations, with an absolute mean error at most equal to 10%. © 2001 Elsevier Science B.V. All rights reserved.

This work was partially supported by the Italian Ministry for the University and Scientific and Technical Research (MURST) within the framework of the National Interest Scientific Research Programme.
* Corresponding author. Tel.: +39-010-3532792; fax: +39-010-3532134.
E-mail addresses: sacchi@dibe.unige.it (C. Sacchi), gera@dibe.unige.it (G. Gera), mlucio@dibe.unige.it (L. Marcenaro), carlo@dibe.unige.it (C.S. Regazzoni).

0165-1684/01/$ - see front matter © 2001 Elsevier Science B.V. All rights reserved.
PII: S0165-1684(00)00280-2

Nomenclature

AVS       advanced video surveillance
B(x, y)   background image (i.e., frame representing the empty monitored scene)
D(x, y)   binary image of differences (i.e., image obtained by difference between current image and background image)
HLIP      high-level image-processing module
IU        image-understanding module
I(x, y)   current image (i.e., frame currently processed by the software)
LLIP      low-level image-processing module
SS        sun sensitivity
THR       threshold
∪         union set operator
∩         intersection set operator

1. Introduction

In recent years, the effective development of automatic people-counting systems based on digital signal-processing techniques has aroused considerable research interest in electronics and information science. Such interest is proven by the considerable amount of related R&D activities recently carried out at the European level (e.g., PEDMON, DIMUS, and CREW projects, etc.), and by many patented products put on the market. Automatic people counting is a problem already faced in the literature for various actual applications, ranging from railway transport security, pedestrian traffic management and detection of overcrowding situations in public buildings to tourist-flow estimation. The basic principle of operation of all automatic people-counting systems lies in the real-time processing of signals coming from some sensors installed close to a people passage point. The main differences among such systems are: (a) the kind of sensors employed, and (b) the kind of signal-processing algorithms to be implemented in order to derive the number of people. An example of people
counting obtained by using one-dimensional signal processing is given in [7]. The detection of moving persons is performed by a sensor module made up of a one-dimensional eight-element ceramic-coated array detector with pyroelectric PbTiO3, an infrared-transparent spherical lens and oscillating mechanical chopping parts. Some examples of video-based people-counting systems are described in [11,15,17,18,21]. In [18], the proposed system aims at counting people crossing gateway thresholds. The system detects and labels mobile regions in the processed image. The detected regions are then tracked, using a matching operation that follows a minimising distance criterion. The interpretation step, based on region clustering, is performed to determine how many people correspond to the mobile areas previously detected and tracked. In [15], real-time image-sequence analysis is used as a basic tool for counting people getting on and off a bus. Ref. [21] describes the use of a RAM-based neural network for identifying the background elements in the current image in order to isolate changed regions containing moving persons. The application involves counting the number of passengers using or waiting for the lift in large buildings. A video-based people-counting system for detecting overcrowding situations in underground railway stations is presented in [17]. The image-processing tasks are performed at different stages: the first stage for noise filtering and change detection, an intermediate stage for mobile-area tracking and feature extraction, and the final stage for image interpretation aimed at people counting. The estimation of the number of people, starting from the extracted features, is performed using non-linear piecewise models and exploiting temporal information by a distributed Kalman filter network. Finally, it is worth citing Ref. [11], where a multi-camera approach is proposed to address the people-counting problem in indoor environments.

In the present work, a novel prototype of a remote video-based people-counting system, used for continuous tourist-flow estimation in a small municipality served by a coaxial cable (CATV) network, is considered. The prototype was realised within the context of the EU-ESPRIT 28494 AVS-RIO project, a technology-transfer activity that ended in 1999 [20]. The end-user of the project (i.e., the municipality of Riomaggiore, La Spezia, Italy) required, among other things, a real-time tourist-counting function for deriving a reliable estimate of the daily tourist flow in the peak season. Due to the particular topology of Riomaggiore, a reliable estimation of the daily tourist flow can be achieved by means of a single camera overlooking a central tourist passage point, i.e., the railway station square. The transmission of visual information to a remote control centre has been performed in fully analog mode by exploiting the existing CATV network of Riomaggiore. A flexible hierarchical structure of the remote image-processing system for people counting was designed by subdividing it into different modules, each one able to process at a different level the video signal acquired from the CATV network. This kind of algorithmic architecture allows the developers to face in a more effective way the considerable technical problems raised by the severe constraints in terms of counting precision
(absolute mean error required to be less than 10%), which must be achieved under non-ideal and even adverse environmental conditions. In such a perspective, the most relevant problem to be solved was related to the implementation of background-updating and change-detection algorithms able to work in real time and, at the same time, sufficiently robust against the unpredictable and variable-speed illumination changes typical of outdoor environments (such requirements generally conflict with one another). Other problems encountered and successfully solved concerned the correct real-time recognition of the groups of people present in the scene, aimed at minimising the false-counting probability. This actually involved the design and the implementation of a high-level classification module able to distinguish the persons to be counted from vehicles and other objects sometimes appearing inside the monitored square. In the following, the image-processing algorithms devoted to solving the above-mentioned problems are described, in order to provide a valuable technical answer to a more general problem concerning the automatic video-based recognition of groups of persons in complex outdoor scenes. In such a perspective, the main innovation of the proposed methods lies in the exploitation of different levels of information representation, in order to improve the robustness of some critical modules of the people-counting system, such as background updating, change detection and classification.

In the literature, some effective and noise-robust background-updating and change-detection algorithms for real-world surveillance applications are proposed. In [14], a background-updating algorithm is considered in the context of a video-based surveillance application related to intrusion detection in an indoor scene. Such a method essentially relies on checking whether a changed pixel remains changed for some time or not. To this aim, a time threshold is imposed. If the time threshold is exceeded, the observed pixel is regarded as a static feature of the scene, to be integrated into the background; otherwise, it is considered as a moving object. This kind of approach is not suitable for our application: indeed, it can be observed that groups of persons often remain in the same position of the monitored square for some frames. In this case, a background-updating strategy like the one described in [14] would absorb into the background many people to be counted. In [5], a change-detection method with automatic background updating is presented. The proposed method is based on a statistical binary hypothesis test (whether a pixel has changed or not), using an appropriate threshold. A background-updating step is introduced by measuring the percentage of changed pixels in the current image. When the percentage of changed pixels is higher than a fixed mean value, a new reference frame is taken. In our application, such an approach is not applicable because, due to the high complexity and time variability of the monitored scenario, it does not ensure that the new stored background actually represents an empty scene. In this work, we will consider a simple background-updating strategy quite similar to the one shown in [6] (i.e., a weighting operation between the current frame and the background frame in order to generate a new background frame at the next processing step), applied, however, in an adaptive way with respect to the different zones of the image, each one characterised by specific features in terms of luminance and pixel membership to the empty scene or to groups of moving persons. The lack of adaptivity is probably the most relevant fault of many background-updating algorithms proposed in the literature when they are used in an environment with uncontrolled luminance. This fault can also be noticed in many change-detection algorithms. In particular, two main classes of change-detection techniques are dealt with in the literature: the pixel-based techniques and the local-based techniques. The pixel-based techniques work at pixel level: the most popular one is simple differencing followed by thresholding. The main advantage of the pixel-based approaches is the low computational weight. Nevertheless, the achieved results are very susceptible to noise and to scene illumination variations. The local-based methods divide the images into regions containing more than one pixel and define some characteristic functions for each region to detect changes. An example of a local-based method is the quadratic picture function (QPF) [8]. This method tries to model the different regions in an image by means of a second-degree two-variable function. Then the algorithm calculates some kind
of distance between the two functions modelling the background image and the current image, and classifies a region as a changed one if this distance is above a certain threshold. In spite of its good results in noisy environments, the QPF method suffers from a false-alarm rate that increases as the illumination varies over time. Skifstad and Jain proposed in [23] a shading-model method for change detection in time-varying illumination scenes, thus overcoming the limitation of the QPF algorithm. Recently, a statistical change-detection method based on the computation of the higher-order circular shift moments (CSM) on the image regions has been presented in [12]. The use of the CSMs allows one to distinguish in an effective way the structural changes in the scene from those due to time-varying illumination. The robustness of local-based change-detection methods against noise and their capability of adapting themselves to time-varying illumination are proven by the literature. However, the high computational complexity of such techniques is not acceptable in the proposed people-counting application, which is characterised by rigid constraints in terms of real-time working. For this reason, we considered a change-detection algorithm that actually employs the simple difference but is able to adapt the threshold on the basis of the local luminosity of the image. The proposed algorithm can exploit the low computational complexity of the pixel-based methods and, at the same time, the adaptivity provided by the partition of the image into different luminosity regions, which is the key concept of the local methods. The paper is organised as follows: Section 2 provides a modular description of the proposed video-based people-counting system; Section 3 focuses on the low-level background-updating and change-detection algorithms; Section 4 presents the middle-level image-processing algorithms aimed at detecting, tracking and labelling the mobile areas in the image, and at collecting features related to each detected area; Section 5 deals with the image-understanding algorithms employed to achieve a reliable estimate of the number of tourists, starting from the information about the mobile areas provided by the lower-level modules; Section 6 reports the numerical results in terms of counting precision; and Section 7 draws the conclusions of the paper.

2. System overview

The proposed remote cable TV-connected video-based surveillance system has been structured as a fully centralised intelligence system, in the sense that all the image-processing tasks are performed by a central processing architecture working in the remote control centre. In order to improve the computational power together with the ease of use, an image-processing architecture based on the high-performance computing network (HPCN) [19] concept has been employed for the actual implementation. The HPCN architecture considered here is simply a cluster of PCs connected by a FastETHERNET network. Each PC belonging to the cluster processes the image sequences coming from one or more cameras and transmits the results to the Control Station, which presents them to the human operator. The block diagram of the HPCN image-processing architecture installed in the remote control centre is depicted in Fig. 1. The CATV receivers shown in Fig. 1 could be digital cable modems or analog frequency converters. In the actual system implementation, the fully analog frequency division multiple access (FDMA) solution for information transmission has been chosen. The cluster is composed of:

(1) a PC-based number-plate recognition system devoted to implementing the other video-surveillance functionality involved (i.e., vehicular access monitoring [20]);
(2) the people-counting system;
(3) the operator-interface and database system, which is the actual control station, devoted to the display of the results through suitable man-machine interfaces (MMI).

A detailed block diagram of the image-processing system for people counting is shown in Fig. 2. The software system has been designed to process images acquired by means of a monochromatic camera installed close to the guarded site. A non-ideal choice of the camera position was forced by the environmental constraints, namely the impossibility of obtaining the permits for installing the camera close to private and public buildings in the guarded area.

Fig. 1. Image-processing architecture at the remote control centre.

The people-counting system depicted in Fig. 2 consists of three different modules, each aimed at implementing a specific image-processing task.

The low-level image processing (LLIP) module works directly on the received image in order to produce other images that can be useful for further processing steps. The main image-processing tasks implemented at this level concern image acquisition, filtering, background management and detection of the differences between the current frame and the updated background. The input to the LLIP module is the analog TV signal received from the CATV network and its output is the binary image of the differences between the current frame and the updated background image.

The high-level image processing (HLIP) module aims at identifying mobile areas (i.e., blobs) in each image subsequently acquired by the system and at tracking them in space and time, jointly with the extraction of some numerical characteristics of the areas themselves (feature extraction). The HLIP module receives as input the binary image of the differences from the LLIP module, and provides as output a list of tracked features for each detected blob.

The image understanding (IU) module is aimed at computing the number of persons within each detected and tracked blob, starting from the features extracted at the HLIP level. The numerical results concerning the number of persons present in a single frame, together with the information about the blob tracking in space and time (acquired by the HLIP module), are then used to derive an estimate of the number of arriving tourists by means of simple mathematical expressions. Three blocks make up the IU module: a neural network, a Hough transform algorithm and a Kalman filter. Finally, the tourist-counting results are conveyed to the operator interface and database system, where they are displayed in real time through a suitable visual interface and stored in a database for off-line checking and elaboration. In the following sections of the paper, detailed descriptions of the most relevant image-processing system modules will be provided.

3. Low-level image-processing algorithms for outdoor people counting

3.1. Image acquisition and noise filtering

The design and the implementation of the LLIP module have been among the most critical points in the entire system development. Indeed, such a module should be able to work in time-varying external luminance conditions, by processing an analog video signal coming from a coaxial cable network.

Fig. 2. Block diagram of the image processing system for people counting.
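The interface between the first two modules of Fig. 2 can be illustrated with a minimal sketch. It is only an illustration under stated assumptions, not the implementation used in the system: the names `Blob`, `llip` and `hlip`, the fixed threshold, the 4-connectivity and the choice of area and centroid as features are all hypothetical, tracking is left out, and the IU stage (neural network, Hough transform, Kalman filter) is omitted entirely.

```python
from dataclasses import dataclass
from typing import List, Tuple

import numpy as np


@dataclass
class Blob:
    """Hypothetical per-blob feature record passed from HLIP to IU."""
    label: int
    area: int                      # number of changed pixels
    centroid: Tuple[float, float]  # (row, column)


def llip(frame: np.ndarray, background: np.ndarray, thr: int) -> np.ndarray:
    """LLIP stand-in: binary image of differences D(x, y) with a fixed threshold."""
    diff = np.abs(frame.astype(int) - background.astype(int))
    return diff > thr


def hlip(diff: np.ndarray) -> List[Blob]:
    """HLIP stand-in: label 4-connected blobs and extract two simple features."""
    rows, cols = diff.shape
    visited = np.zeros((rows, cols), dtype=bool)
    blobs: List[Blob] = []
    for r in range(rows):
        for c in range(cols):
            if not diff[r, c] or visited[r, c]:
                continue
            # Depth-first flood fill collecting the pixels of one blob.
            stack = [(r, c)]
            visited[r, c] = True
            pixels = []
            while stack:
                y, x = stack.pop()
                pixels.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < rows and 0 <= nx < cols \
                            and diff[ny, nx] and not visited[ny, nx]:
                        visited[ny, nx] = True
                        stack.append((ny, nx))
            ys, xs = zip(*pixels)
            centroid = (sum(ys) / len(ys), sum(xs) / len(xs))
            blobs.append(Blob(len(blobs) + 1, len(pixels), centroid))
    return blobs


# Toy frame: an empty background plus two bright patches.
background = np.zeros((6, 6), dtype=np.uint8)
frame = background.copy()
frame[0:2, 0:2] = 200
frame[4, 3:6] = 200
blobs = hlip(llip(frame, background, thr=50))
for blob in blobs:
    print(blob)
```

The point of the sketch is the data flow: a grey-level frame goes in, a binary difference image leaves the low-level stage, and a list of per-blob features leaves the high-level stage.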

The transmission of a TV signal over a public shared network like a CATV one involves the addition of signal-quality degradation factors such as: Gaussian noise (due to cable lines and amplifiers), impulsive-ingress noise (due to electromechanical and electronic devices working at the residential sites [26]), and inter-channel interference (due to spectral overlapping of other TV channels) [16]. In order to achieve an acceptable quality of the received analog signal, considerable power expenses were required. A sample of the quality of the acquired image is shown in Fig. 3.

Fig. 3. Image grabbing window for people counting.

In Table 1, experimental results concerning the noise effects on the acquired images are shown.

Table 1
Experimental trials of analog image transmission over the CATV network: MSE, RMSE and PSNR values obtained by the comparison of four different pairs of frames (each pair is composed of a frame grabbed from the demodulated TV signal received by the network and the corresponding one grabbed on-site from the TV camera signal)

Pair no.   MSE     RMSE    PSNR (dB)
1          402.9   20.07   22.08
2          395.5   19.89   22.16
3          391.3   19.78   22.20
4          393.2   19.83   22.18

Each row of Table 1 gives a comparison in terms of mean-square error (MSE), root mean-square error (RMSE) and peak signal-to-noise ratio (PSNR) between a frame directly acquired from the TV camera and the corresponding frame grabbed from the received CATV signal. The PSNR values are well below the typical values achieved in the case of low-JPEG compression (about 40 dB). This means that noise effects are considerable and a robust filtering mechanism is required. The noise-filtering procedure adopted for cable-based people counting is performed in two steps (see Fig. 2):

(1) a linear step before change detection, using a spatial filtering consisting of a 3×3 convolution of the acquired image with a filtering mask;
(2) a non-linear morphological step after change detection, using morphological operators, such as erosion and dilation [22], for cutting off isolated changed pixels due to residual image noise.

Such a combined filtering strategy has proven very efficient in the case of noise uniformly affecting the acquired images, as usually happens in fully analog TV transmission. In fact, neither misdetection nor false-detection errors due to the channel noise present in the acquired images were noticed in the higher-level processing stages. In the following, we will describe in depth the innovative solutions adopted for background updating and change detection. These were the most critical image-processing aspects to be faced in the actual system implementation. The solution we considered consisted in the use of image segmentation in order to subdivide both the background image and the current image into different regions, each characterised by a uniform pixel grey level and by other features related to the pixel membership to different change-detection zones. This strategy allows one to apply low-complexity change-detection and background-updating algorithms in an adaptive way with respect to the characteristics of the single segmented image regions.
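The two-step noise filtering of Section 3.1 (a linear 3×3 step before change detection and a morphological step after it) can be sketched as follows. This is an illustration under assumptions that the text does not fix: a uniform averaging mask, a 3×3 square structuring element, and erosion followed by dilation (a morphological opening). All function names are hypothetical.

```python
import numpy as np


def filter3x3(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Linear step: 3x3 spatial filtering, borders handled by edge replication."""
    rows, cols = image.shape
    padded = np.pad(image.astype(float), 1, mode="edge")
    out = np.zeros((rows, cols), dtype=float)
    for dy in range(3):
        for dx in range(3):
            # Correlation form; identical to convolution for a symmetric mask.
            out += mask[dy, dx] * padded[dy:dy + rows, dx:dx + cols]
    return out


def _neighbourhood_views(binary: np.ndarray):
    """Yield the nine shifted views covering each pixel's 3x3 neighbourhood."""
    rows, cols = binary.shape
    padded = np.pad(binary.astype(bool), 1, mode="constant", constant_values=False)
    for dy in range(3):
        for dx in range(3):
            yield padded[dy:dy + rows, dx:dx + cols]


def erode(binary: np.ndarray) -> np.ndarray:
    """A pixel survives only if its whole 3x3 neighbourhood is set."""
    out = np.ones(binary.shape, dtype=bool)
    for view in _neighbourhood_views(binary):
        out &= view
    return out


def dilate(binary: np.ndarray) -> np.ndarray:
    """A pixel is set if any pixel of its 3x3 neighbourhood is set."""
    out = np.zeros(binary.shape, dtype=bool)
    for view in _neighbourhood_views(binary):
        out |= view
    return out


def remove_isolated_pixels(binary: np.ndarray) -> np.ndarray:
    """Non-linear step: erosion followed by dilation (assumed order)."""
    return dilate(erode(binary))


# Linear step on a flat test image with an averaging mask (assumed mask).
mean_mask = np.full((3, 3), 1.0 / 9.0)
smoothed = filter3x3(np.full((5, 5), 10, dtype=np.uint8), mean_mask)

# Morphological step on a toy difference image: one 3x3 blob plus one noisy pixel.
diff = np.zeros((7, 7), dtype=bool)
diff[1:4, 1:4] = True
diff[5, 5] = True
cleaned = remove_isolated_pixels(diff)
print(cleaned.astype(int))
```

In the toy example the isolated pixel disappears while the compact blob is restored to its original extent, which is the behaviour the text attributes to the morphological step.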

3.2. Background updating

A background-updating stage is not a strictly necessary step in video-surveillance systems working in controlled luminance environments. However, it becomes extremely important for system robustness in outdoor scenes, where lighting conditions are potentially widely variable. In the context of our work, we observed that only on cloudy days are the luminance conditions of the monitored scene substantially time-invariant. In other cases, variable-speed illumination changes are encountered.

Let us introduce some quantities that will be used in the following: in the present paper we will indicate the background image with B(x, y), the current image with I(x, y) and the change-detection image with D(x, y). A classical background-updating algorithm [6] involves a simple weighting operation for generating a new background at step $k+1$:

$$B_{k+1}(x, y) = I_k(x, y) + \alpha\,[B_k(x, y) - I_k(x, y)], \tag{3.1}$$

where $\alpha \in [0, 1]$ is the background-updating coefficient; we can see that if $\alpha$ is close to 0 the background updating is heavy, while if $\alpha \approx 1$ the background updating is very soft. This kind of algorithm applies the same degree of updating to all the pixels composing the background, without considering whether a pixel belongs to the background or to any object previously detected in the scene but not present in the background. The major fault of this method is that, after a certain number of frames (depending on the parameter $\alpha$), a still object in the environment is always integrated into the background. In this way, every system based on such a method becomes blind to still objects after a certain number of steps. Clearly, this event is to be avoided in a surveillance system like the one considered in this work, which is devoted to counting the number of persons (still or moving) present in the monitored square.

In order to avoid this kind of behaviour, we first used a method similar to the one described in [6]. The basic improvement introduced consisted in a selective background updating. The modified algorithm did not update the whole background image, but only the pixels not belonging to a change-detection region. The information about such pixel membership can easily be returned by the change-detection module. Let us consider the binary change-detection image D(x, y), output of the change-detection module. We partition this image into two subsets $M$ and $U$: $M$ is the set of the pixels in which an object has been detected, while $U$ is made up of the pixels belonging to the empty scene:

$$M = \{(x_i, y_i)\ \forall i : D(x_i, y_i) = 1\}, \qquad U = \{(x_i, y_i)\ \forall i : D(x_i, y_i) = 0\}. \tag{3.2}$$

Using the partition (3.2), formula (3.1) can be modified as follows:

$$B_{k+1}(x, y) = I_k(x, y) + [(\alpha - 1)\, f_U(x, y) + 1]\,[B_k(x, y) - I_k(x, y)], \tag{3.3}$$

where $f_U(x, y)$ is the characteristic function of the set $U$. For a pixel in $U$, $f_U = 1$ and (3.3) reduces to (3.1); for a pixel in $M$, $f_U = 0$ and the background value is left unchanged.

This method allows one to exploit a simple interaction between different image-processing sub-modules. However, it can be usefully exploited only in the case of slow illumination changes. Indeed, a fast illumination change has a high probability of being detected as a changed region: as a consequence, it could generate huge errors in the further estimation process. To this end, a further improvement of the proposed method has been introduced, using two different background-updating techniques, both based on formula (3.3). If no high-speed illumination change is detected, a soft background updating is performed by applying (3.3) as it is. If a sudden illumination change is detected, a heavy background updating is applied only on the zones of the image classified as lighting variations. In the second case, (3.3) is applied by regarding the set $U$ as composed only of those pixels belonging to change-detection regions caused by lighting variations and by setting the background-updating parameter $\alpha = \alpha_H \approx 0$. The change-detection regions due to lighting variations are obtained by means of image-segmentation and region-classification operations. In Fig. 4 the block diagram of the heavy background-updating algorithm is shown.

Fig. 4. Block diagram of the heavy background-updating algorithm.

One can see that the current image I(x, y) and the background image B(x, y) are both segmented. As a result of the segmentation, the two images are
partitioned in a number $R$ of areas:

$$S_B = \{r_{Bk} : k = 1, 2, \ldots, R\}, \qquad r_{Bk} = \{(x, y) : l(x, y) \approx \text{const.}\}, \tag{3.4a}$$

$$S_I = \{r_{Ik} : k = 1, 2, \ldots, R\}, \qquad r_{Ik} = \{(x, y) : l(x, y) \approx \text{const.}\}, \tag{3.4b}$$

where $S_B$ is the segmented background image and $S_I$ is the segmented current image. The two images are made up of the $R$ regions $r_{Bk}$ and $r_{Ik}$, respectively, and $l(x, y)$ is the luminosity of the pixel of 2-D coordinates $(x, y)$, i.e. the grey level of the pixel itself. A pixel is said to belong to a certain region $j$ if the luminosity $l$ of that pixel is close enough to the mean luminosity $\bar{l}(j)$ of the region $j$. A histogram-based pixel-classification algorithm, using a set of $(R-1)$ thresholds for both the current and the background images, is used to estimate the region sets $S_B$ and $S_I$ and their properties.

The subset of the image on which the heavy background updating has to be performed is computed in two steps. The first step performs an intersection operation between different segmented regions, and considers the global change-detection region as obtained by the union of the change-detection regions associated with each background region, $cd_n = cd(r_{Bn})$, by following the rule:

$$\Theta = \bigcup_n (cd_n) = \bigcup_n \left( (r_{In} \cap r_{Bn})^{C} \cap (r_{In} \cup r_{Bn}) \right), \tag{3.5}$$

where $r_{In}$ and $r_{Bn}$ indicate the $n$th region in the current and background images, respectively, whereas the operator $C$ is the complement operation made with respect to the whole background image. In Fig. 5, the above-mentioned intersection operation is graphically explained.

Fig. 5. Segmented region intersection for heavy background updating: explicative picture.

When the region set $\Theta$ has been generated, the system applies a further region-growing [22] step to
separate also the regions in $cd_n$ into spatially connected groups, i.e.:

$$\Theta = \{cd_{ik},\ i = 1, 2, \ldots, n,\ k = 1, 2, \ldots, R\}, \tag{3.6}$$

where $cd_{ik}$ is the $i$th change-detection region with respect to the $k$th background segmented region $r_{Bk}$. Each region $cd_{ik}$ is characterised by a set of features (e.g. area, shape factor, etc.). A decision rule is introduced in order to find whether a region belonging to $\Theta$ is due to an illumination change or to a real change in the scene. In general, a decision rule based on several region features can be used; however, a simple rule is to be preferred. For example, one can assume that, if the scene is not overcrowded, the classification can be performed only on the basis of the area of the segmented region, which is actually the number of pixels inside it, or equivalently its cardinality. A more complicated rule could take into account the shape of the segmented region in order to decide whether it results from a scene illumination change and has to be eliminated. In the people-counting application considered here, the decision rule for the detection of illumination changes in the scene is based on the area of each segmented region and on the shape of the regions. For the specific guarded environment, the shadow projected onto the scene is particularly long and narrow if compared to the regions generated by walking pedestrians or cars. A shadow due to illumination changes can be detected as a thin rectangular region, and the reference image is suddenly adjusted. If an illumination change is detected in the scene, the system enters the "heavy background updating" stage and is able to absorb high-speed variations in the luminosity of the scene; otherwise, a slow background updating is performed to adjust the reference image to slow variations. If we define $H \subseteq \Theta$ as the set of regions where the heavy background updating must be applied, we can redefine the background-updating operation as follows:

$$B_{k+1}(x, y) = \begin{cases} I_k(x, y) + \alpha\,[B_k(x, y) - I_k(x, y)] & \text{if } (x, y) \in U, \\ I_k(x, y) + \alpha_H\,[B_k(x, y) - I_k(x, y)] & \text{if } (x, y) \in H, \end{cases} \tag{3.7}$$

where $\alpha_H \approx 0$ is the heavy-updating coefficient introduced above. When the heavy background-updating algorithm is triggered, the higher-level system modules (i.e. the LLIP change-detection and morphological-filtering sub-modules, and the HLIP and IU modules) are bypassed, as the huge errors related to the estimated number of persons present in the current frame would permanently affect the global people-flux estimation. In this case, the number of persons present in the current frame is assumed to be equal to the one computed for the previous frame. To allow temporal regularisation, the acquisition rate has been set to 1 frame/s in order to avoid missing persons (single or in groups) when people counting is disabled for a few frames during heavy background-updating operations. Fig. 6 shows the results of the segmentation algorithm: in Fig. 6(a) and (b) the segmentations of the background image and of the current acquired image into two regions are reported, respectively. Fig. 6(c) shows the output of the intersection module, where the changes in the scene due to the illumination changes are evident. Note that only the area of interest (the square) is processed by the system. Other non-interesting areas of the image have been masked beforehand.

An example of the results achieved by the heavy background-updating algorithm is shown in the frame sequence reported in Fig. 7(a)-(d), which shows how the heavy background-updating procedure actually works. In Fig. 7(a) the shadow line generated by the luminance variation which occurred in the background scene is clearly noticeable, whereas in Fig. 7(d) the same line is absorbed in the updated background. Only the areas of the background image corresponding to changes due to luminance variations, already shown in Fig. 6(c), are processed and updated. In the examples of Figs. 6 and 7, both the background image and the current image have been segmented into two regions ($R = 2$), as in the specific situation the railway station building naturally partitions the monitored square into two different illumination zones.

Fig. 6. Image segmentation for heavy background-updating algorithm - visual results: (a) segmentation of the background image, (b) segmentation of the current image and (c) intersection of the segmented images after region growing.

Fig. 7. Heavy background-updating sequence (a)-(d).

3.3. Change-detection algorithm

In Section 3.2, a region-based change-detection algorithm has been introduced in order to detect
the image regions where an illumination change detecting mobile regions corresponding to a group
occurred. Such regions are then quickly absorbed of persons to be counted, as it does not provide the
by background updating. Although this algorithm required resolution. For this reason, we had to
has proven to be e!ective and real-time working for consider change detection methods working at pixel
background updating, it cannot be exploited for level. The proposed change-detection algorithm
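As a recap of the background-updating rule (3.7) of Section 3.2, the following minimal Python sketch applies the soft update to the pixels of $U$ and the heavy update to the pixels of $H$. It is only an illustration under stated assumptions: the function name `update_background`, the list-of-lists image layout and the two boolean masks are hypothetical, the coefficients are the values reported in Table 2, and leaving the remaining pixels unchanged is an assumption not stated in the text.

```python
ALPHA_SOFT = 0.99   # soft-updating coefficient (value from Table 2)
ALPHA_HEAVY = 0.05  # heavy-updating coefficient (value from Table 2)


def update_background(background, image, soft_mask, heavy_mask):
    """Return B_{k+1} from B_k (background) and I_k (image), as in (3.7).

    soft_mask[y][x]  is True for pixels in U (soft updating);
    heavy_mask[y][x] is True for pixels in H (heavy updating).
    Pixels in neither set keep their old background value (an assumption).
    """
    new_background = []
    for b_row, i_row, u_row, h_row in zip(background, image, soft_mask, heavy_mask):
        new_row = []
        for b, i, in_u, in_h in zip(b_row, i_row, u_row, h_row):
            if in_h:
                # Heavy updating: the new background is dominated by the image.
                new_row.append(i + ALPHA_HEAVY * (b - i))
            elif in_u:
                # Soft updating: the new background stays close to the old one.
                new_row.append(i + ALPHA_SOFT * (b - i))
            else:
                new_row.append(b)
        new_background.append(new_row)
    return new_background
```

With a background level of 100 and an image level of 200, the soft rule moves the background by one grey level per frame, whereas the heavy rule absorbs almost the whole variation in a single step.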
The proposed change-detection algorithm uses a pixel-based difference technique. The system decides that a pixel has changed if the difference between the grey-level value of that pixel in the current image and that of the corresponding pixel in the background image is higher than a certain threshold, according to the "simple difference" rule

\[
D(x, y) =
\begin{cases}
0 & \text{if } |I(x, y) - B(x, y)| < THR,\\
1 & \text{if } |I(x, y) - B(x, y)| \ge THR,
\end{cases}
\]

where THR is a certain threshold. By analysing real outdoor images, it can be found that a single fixed value of the threshold THR is not enough to give an acceptable behaviour of the module. In most cases, scenes are considered in which a large part of the guarded environment is shadowed by a building or by an environmental feature of the scene.

The proposed change-detection algorithm works by using a modified version of the simple difference algorithm. This kind of problem becomes important in every situation of strong lighting (e.g. a sunny day): a system that uses only a fixed threshold is blind either in the shadowed zone (because of the loss of resolution) or in the sunny area (because of the saturation). The modified change detection module is able to adapt the threshold on the basis of the local luminosity of the image. A block diagram of the module is shown in Fig. 8(a).

Fig. 8. (a) Block diagram of the adaptive change detection module and (b) adaptive change detection threshold settings.

The first step in this direction is an image segmentation that is performed on the background image, by following a procedure similar to the one shown in (3.4a), when the system starts. After this preliminary step, the system uses the segmented image and the current image to calculate the mean of each area: the computed value is considered as the luminosity of the corresponding region. Let us call $FM$ the set of the means on each region ($\mathrm{card}(r_k)$ is the cardinality of the segmented region $k$):

\[
FM = \left\{ fM_k : fM_k = \frac{\sum_{(x, y) \in r_k} I(x, y)}{\mathrm{card}(r_k)},\ k = 1, \ldots, R \right\}.
\]

At this point the non-linear function sketched in Fig. 8(b) is used to find the adapted threshold value for each area and to build the set $TH$ of the $R$ thresholds, each associated with a segmented region $r_k$:

\[
TH = \{ th_k = th(r_k) : k = 1, \ldots, R \}. \tag{3.8}
\]

Finally, change detection is performed with the adapted threshold values, and the sets $U$ and $M$ are computed for the soft updating case. Note that change detection is not applied in the heavy background-updating case. The quantity $(A - B)$ in Fig. 8(b) is a number related to the sun-sensitivity of the system: let us call it SS. If SS is large, the system actually uses only one threshold for every area in the scene and it can be said that it is not sensitive to the sun.
Conversely, if SS is small, a different threshold is used in each area and the change detection is sensitive even when the difference in the lighting of the two zones in the image is very small. Table 2 shows the values used for the parameters of the proposed method.

Table 2
Numerical values of the change detection and background updating parameters

Parameter                       Value
SS                              150
THR                             40
$\alpha_H$ (heavy background)   0.05
$\alpha$ (soft background)      0.99

Fig. 9 shows the current processed image (a), the change detection result obtained with the classical simple difference method (b) and the one obtained with the modified method, subdividing the image into two regions (c). The performance improvement provided by the proposed method is shown by the detection of two groups of persons that are otherwise missed by the simple difference algorithm. This improvement has been achieved by adding a negligible computational load to the whole processing system, thus respecting the real-time constraints imposed by the end-user.

Fig. 9. Example of change detection results: (a) current image, (b) binary change detection image obtained by the simple difference and (c) binary change detection image obtained by the proposed adaptive algorithm.

4. Blob detection, tracking and feature-extraction algorithms

4.1. Blob detection and tracking submodules

The HLIP module receives as input the binary difference image returned by the LLIP module. This image contains only the changed regions, corresponding to groups of moving people or other kinds of moving objects present in the scene. The most important image-processing functions performed by the HLIP module are detection and tracking of mobile regions in the image and feature extraction from each detected region. The HLIP blob-detection and tracking algorithms allow one to analyse the scene to be monitored at different levels of abstraction, i.e.:
(a) Detection and labelling of blobs is expected to provide as output the number of mobile regions present in the image. Such regions are called blobs (i.e., connected sets of changed-pixel regions). A single label is assigned to each blob detected in the frame. Each labelled blob is graphically highlighted by means of a rectangle (see examples of blob detection results in Fig. 10a).
(b) Mobile-blob tracking provides as output a graph (see Fig. 10b) of blob correspondences over time; in the graph each blob is denoted by the same number over time. The extracted graph of correspondences is used to compute the 2D positions of the people present in the scene.
(c) Mobile-feature detection extracts segmented regions present in each detected blob in order to improve the robustness of the tracking process by using information at a higher-level resolution.
(d) Mobile-feature tracking is expected to follow the extracted regions in space and time, as is done for blobs.

More precise details about the algorithms are provided in [24,25].

Fig. 10. Output of the HLIP module: (a) output of the blob detection submodule (labelled blobs highlighted in the current frame by rectangles) and (b) output of the blob tracking submodule (blob graph).

The labelling task denotes each blob by a number that assigns a degree of relationship, in order to allow one to recognise the blob in an image sequence and to define the temporal correspondences between the blob in the current image and the same blob in the other images.
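The detection and labelling step of item (a) can be illustrated by a standard connected-component labelling of the binary difference image. The sketch below is not the algorithm of [24,25]: the function name `label_blobs`, the use of 8-connectivity and the bounding-box format are assumptions made only for illustration.

```python
from collections import deque


def label_blobs(binary):
    """Label 8-connected sets of changed pixels; returns (labels, boxes).

    labels[y][x] is 0 for unchanged pixels or the blob label (1, 2, ...);
    boxes[label] is the enclosing rectangle (x_min, y_min, x_max, y_max).
    """
    height = len(binary)
    width = len(binary[0]) if height else 0
    labels = [[0] * width for _ in range(height)]
    boxes = {}
    next_label = 1
    for y0 in range(height):
        for x0 in range(width):
            if not binary[y0][x0] or labels[y0][x0]:
                continue
            # Breadth-first flood fill of a new blob starting at (x0, y0).
            labels[y0][x0] = next_label
            x_min = x_max = x0
            y_min = y_max = y0
            queue = deque([(x0, y0)])
            while queue:
                x, y = queue.popleft()
                x_min, x_max = min(x_min, x), max(x_max, x)
                y_min, y_max = min(y_min, y), max(y_max, y)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = x + dx, y + dy
                        if (0 <= nx < width and 0 <= ny < height
                                and binary[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = next_label
                            queue.append((nx, ny))
            boxes[next_label] = (x_min, y_min, x_max, y_max)
            next_label += 1
    return labels, boxes
```

The number of blobs in the frame is then `len(boxes)`, and each rectangle in `boxes` corresponds to the kind of overlay shown in Fig. 10a.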
In this paper, if two blobs in the current and previous images show a temporal correlation, they are called "child" and "father", respectively. These identifiers are used by the image-understanding module to provide feedback between two (or more) consecutive neural network results, in order to improve the counting precision by taking into account the history of the temporal evolution of the scene.

4.2. Feature extraction

The following HLIP submodule performs feature extraction from each detected and tracked blob in order to provide the image-understanding module with a feature set that is suitable for the video-surveillance application. A wide set of potentially useful features was initially considered. The choice of the current feature set was made on the basis of the ability to discriminate among different blob classes. The image-understanding module has to assign a class to each detected blob, corresponding to the number of persons present in the blob itself. Some experimental trials were performed to map the feature values versus the number of persons visually counted in each blob (i.e., the real class of the blob). At the end of these trials, the feature set shown in Table 3 was selected for actual computation. Moreover, in Table 3, the computational complexity and the degree of accuracy in blob class discrimination are reported for each feature listed.

Table 3
Choice of the feature set for tourist-counting application

No.  Type                                                                  Computational complexity  Accuracy
1    Edge density (Sobel algorithm [3])                                    Medium                    High
2    Number of maxima of the 1-d profile edge histogram [17]               Medium                    High
3    Second-order Hu-moment [4,9]                                          Medium                    Medium
4    Blob width                                                            Low                       High
5    Blob height                                                           Low                       Medium
6    Blob perimeter (i.e. number of pixels contouring the blob)            Low                       High
7    Blob area (i.e. number of pixels inside the blob)                     Low                       High
8    Blob shape-factor ($S = (\text{blob perimeter})^2 / (4\pi\,(\text{blob area}))$)  Low           Medium

5. The image-understanding algorithms

The image-understanding (IU) module performs the conversion of the features extracted by the high-level image-processing module into the number of people present in each blob detected in the current image. The IU module can be subdivided into submodules, as described in Fig. 11.

The Hough-transform submodule can be regarded as a further step of shape-based filtering (the previous steps have been performed at the background-updating level and by the morphological filter working after change detection). Indeed, it can distinguish whether a given blob contains human forms or object forms (e.g., a van), and it is used to avoid people counting for blobs containing object forms. In such a case the entire IU module is bypassed, and the people counting value for the blob is set to 0.

The chosen neural network (NN) performs the conversion of input features extracted for blobs containing human forms into the number of people present in such blobs. The unknown functional correspondence between feature values and people counting is learnt in an off-line training phase.

The temporary counting regularisation module exploits the information provided by the HLIP module about the temporal correspondences between blobs detected in consecutive frames in order to improve the precision of the neural network's counting, thus overcoming the problems due to the suboptimal training strategy adopted. A linear Kalman filter is then used to avoid residual noisy peaks in the number of persons counted in the current frame. Finally, a simple numerical integration performs the computation of the number of people that entered the monitored scene from an initial observation time ($t_0$) up to the current-frame time ($t$). This is actually the final people counting result (i.e. tourist counting) expected by the end-user. The initial value of the tourist counting (i.e. at time $t_0$) is the number of persons counted in the first frame processed by the system.
Then, the cumulative tourist counting value is incremented by the difference between the number of persons counted in the current frame and the one counted in the previous frame, if such difference is positive (i.e. some new persons might have entered the scene during the processing step). Otherwise, the tourist counting is not updated.

Fig. 11. Block diagram of the image-understanding (IU) module.

5.1. The Hough-based pattern-recognition submodule

By using the selected features pointed out in Section 4.2, it is very difficult to discriminate between persons and objects potentially present in the scene (e.g., small vans and electric coaches running in the railway-station square). The class "object", which corresponds to a 0 value, was introduced as a first attempt. However, it was observed that the neural-network classifier often confused this class with other classes, generally representing large numbers of people. In order to avoid heavy counting errors, a Hough-transform-based pattern-recognition algorithm [1] has been exploited as the first IU submodule.
This transformation uses the information about a blob's contour (determined by the HLIP module) to extract the various lines that make up the blob contour itself. The form of a van is much more regular than the form of a group of persons. Therefore, a thresholding operation on the line density can be performed in order to derive the membership of a blob in a human class or in an object class. In Fig. 12a, a sample frame containing both persons and vehicles is shown. In Fig. 12b, some numerical results are shown, comparing the number of persons counted in the current frame by the image-understanding module equipped with the Hough-based pattern-recognition submodule with the counting obtained by another IU module that did not include it.

Fig. 12. Discrimination of objects from groups of persons: (a) sample frame of an image sequence with groups of persons and objects blended in the scene and (b) graphs of the number of persons present in the current frame obtained by using only the neural network classifier (dashed line), actual number of persons visually counted by the human operator (dash-dotted line), number of persons present in the current frame estimated by the entire image-processing software (solid line).
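The idea of thresholding a line density derived from the blob contour can be sketched with a plain Hough accumulator. This is not the modified Hough transform of [1]: the score defined in `line_regularity`, the angular resolution, the number of peaks considered and the decision threshold of `contains_object` are all hypothetical choices made only to illustrate the principle.

```python
import math


def line_regularity(contour, n_theta=36, top_lines=4):
    """Share of Hough votes collected by the strongest straight lines.

    contour is a list of (x, y) contour pixels. Each pixel votes for the
    lines rho = x*cos(theta) + y*sin(theta) passing through it, with rho
    rounded to the nearest integer. The score is the number of votes in the
    top_lines fullest accumulator cells divided by the number of pixels.
    """
    accumulator = {}
    for x, y in contour:
        for t in range(n_theta):
            theta = math.pi * t / n_theta
            rho = round(x * math.cos(theta) + y * math.sin(theta))
            accumulator[(t, rho)] = accumulator.get((t, rho), 0) + 1
    strongest = sorted(accumulator.values(), reverse=True)[:top_lines]
    return sum(strongest) / len(contour)


def contains_object(contour, threshold=1.0):
    """Label a blob as 'object' when a few lines dominate its contour.

    The threshold value is an illustrative assumption, not a value from the paper.
    """
    return line_regularity(contour) >= threshold


# Usage: a van-like rectangular contour versus a rounded, irregular one.
rectangle = ([(x, 0) for x in range(21)] + [(x, 10) for x in range(21)]
             + [(0, y) for y in range(1, 10)] + [(20, y) for y in range(1, 10)])
rounded = sorted({(round(10 + 8 * math.cos(math.radians(a))),
                   round(10 + 8 * math.sin(math.radians(a))))
                  for a in range(360)})
```

A blob classified as an object would bypass the rest of the IU module and contribute a count of 0, as described in Section 5.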
The estimated values are then compared with the number of persons actually present in the frame, visually counted off-line by a human operator.

The dashed line of Fig. 12b depicts a typical example of the counting error made by the neural-network classifier when one or more vans appear in the scene. In this specific case class 0 is confused with class 7. Therefore, the image-understanding module estimates seven more pedestrians than the ones actually present in the scene, which are depicted by the dash-dotted line. On the contrary, the insertion of the Hough-based pattern recognition submodule allows one to avoid this kind of classification error, thus providing estimated values of the number of persons present in the monitored scene, depicted by the solid line, very close to the actual ones.

5.2. The neural-network submodule

A three-layer multilayer perceptron neural network using a back-propagation learning rule [13] has been chosen as the basic classification algorithm in the context of the considered people-counting application. This kind of neural network takes a long time for the off-line training phase. Nevertheless, when the optimal weights are found, it makes it possible to perform an assigned task in real time; in this respect, its performance is better than that provided by other kinds of state-of-the-art classifiers employed for people counting (e.g., the Bayesian networks considered in [17]).

The network topology consists of 8 neurones in the input layer, 20 neurones in the hidden layer and 11 neurones in the output layer. A suboptimal training strategy "by epochs" [13] has been required due to the impossibility of providing a complete pattern set to the neural-network training algorithm (the above set would have an infinite cardinality in the considered application). Thus, the selected training set is made up of a large number of patterns representative of classes, and tests have been performed using different initialisations for the network. The feature set provided by the HLIP module as input to the neural network has already been described in Section 4.2. Each neural network output class represents the number of people in the current blob, as this number can range between one and 10 (i.e., the number of persons with highest probability of being present inside a blob). Moreover, a special class is used to describe groups made up of more than ten persons. The value of this class makes it possible to minimise the error that may affect some classes of the training set. In order to avoid noisy patterns due to illumination artefacts, in the training phase images are grabbed under optimal illumination conditions directly at the site. Fifteen different image sequences, each one composed of about 60 frames, were used for training. The choice of the training sequences was also driven by the necessity of providing the neural network with an almost equal number of training patterns for all the eleven classes considered. In this sense, we can say that a training set with equiprobable classes was employed.

5.3. The temporary counting regularisation sub-module

The effects of the non-ideal training of the neural network can be noticed in the different counting results related to successive frames where the number of persons is the same. Therefore, it is necessary to resort to a dynamic kind of counting that takes into account the history of the temporal evolution of the scene, in a way similar to the one described in [2]. As mentioned in Section 4.1, the HLIP module provides the information about blob tracking by means of a graph containing each detected and labelled blob tracked in both space and time. The application of the knowledge of a blob's temporary history to the tourist-counting problem can be explained as follows. Up to now, the number of persons present in the $i$th frame, $N_i$, has been computed only on the basis of the neural-network classification results, i.e.:

\[
N_i = \sum_{k=1}^{B_i} C_k^i, \tag{5.1}
\]

where $B_i$ is the number of blobs detected in the $i$th frame, and $C_k^i$ is the classification result provided by the neural network for the $k$th blob in the $i$th frame.
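The 8-20-11 perceptron of Section 5.2 and the per-frame sum (5.1) can be sketched together as follows. This is a minimal forward-pass illustration, not the trained classifier: the sigmoid activation, the dictionary layout of the weights, the mapping of output index $j$ to $j + 1$ persons (with the special "more than ten" class counted as 11) and the helper `random_net` are all assumptions.

```python
import math
import random

N_INPUTS, N_HIDDEN, N_CLASSES = 8, 20, 11  # topology given in Section 5.2


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


def layer(inputs, weights, biases):
    """One fully connected layer with sigmoid activation (assumed)."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def classify_blob(features, net):
    """Forward pass; returns the people count of the winning output neurone.

    Output index j is read as j + 1 persons, so index 10 stands for the
    special 'more than ten' class (counted here as 11, an assumption).
    """
    hidden = layer(features, net["w_hidden"], net["b_hidden"])
    outputs = layer(hidden, net["w_out"], net["b_out"])
    return outputs.index(max(outputs)) + 1


def count_people(blob_features, net):
    """N_i of (5.1): sum of the classification results of all blobs in a frame."""
    return sum(classify_blob(features, net) for features in blob_features)


def random_net(seed=0):
    """Untrained network with random weights, only to make the sketch runnable."""
    rng = random.Random(seed)

    def matrix(rows, cols):
        return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]

    return {
        "w_hidden": matrix(N_HIDDEN, N_INPUTS), "b_hidden": [0.0] * N_HIDDEN,
        "w_out": matrix(N_CLASSES, N_HIDDEN), "b_out": [0.0] * N_CLASSES,
    }
```

In a real system the weights would come from the off-line back-propagation training described above, and the eight inputs would be the (suitably normalised) features of Table 3.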
The temporary history of the blob allows one to derive a prediction of the number of persons present in the $i$th frame, $\hat{N}_i$, on the basis of the classification results concerning the $(i-1)$th frame. Such results are related to both predicted classes derived from the information about previous frames, and current classes derived by the neural network, as shown in the following expressions:

\[
\hat{N}_i = \sum_{k=1}^{B_i} \hat{C}_k^i, \tag{5.2a}
\]

\[
\hat{C}_k^i = f\left(\hat{C}_1^{i-1}, \hat{C}_2^{i-1}, \ldots, \hat{C}_{B_{i-1}}^{i-1}, C_1^{i-1}, \ldots, C_{B_{i-1}}^{i-1}\right). \tag{5.2b}
\]

The prediction function $f$ cannot be made explicit in a simple closed form. It expresses the recursive class inheritance of the blobs belonging to the current frame. Such inheritance depends on:
(a) The blob classification results achieved for the previous frame.
(b) The label assigned to a blob in the current frame and the relationship between such a blob and one or more blobs belonging to the previous frame (i.e., between blob fathers in the previous frame and blob children in the current frame).

Finally, it has been decided to average the prediction result and the current neural-network-based classification result, in order to obtain an estimate of the number of persons present in the current frame. A statistical coefficient equal to 0.5 is used to average the two results, providing them with the same degree of likelihood. The choice of assigning an equal likelihood to the prediction result and the current classification result has been made on the basis of some ad hoc experimental observation trials.

5.4. The linear Kalman filter sub-module

The people-counting result obtained by the classification and pattern-recognition algorithms described in the previous sections can present some noisy peaks, substantially due to sudden and unpredictable lighting artefacts occurring between two consecutive images or due to errors made by the neural network classifier. Such noisy peaks sometimes increased the absolute mean error, involving a permanent offset in the characteristic curve of the people counting. The people counting cumulated in consecutive frames is on average linearly growing. Therefore, if the people counting value varies notably between two or three consecutive images, it is very likely to have been altered by some artefacts or other kinds of classification errors. In order to round off the people-counting peaks, a linear Kalman filter [10] has been used. The Kalman filter allows one to improve the degree of correlation between the measures of the number of persons present in consecutive frames, and hence it can be regarded as a further step in the temporary regularisation of people counting. The use of the Kalman filter makes it possible to reduce the absolute mean error on the number of persons counted in the current frame to below a 10% threshold, regarded as the precision constraint to be fulfilled in the considered application. In Fig. 13, some partial tourist counting results obtained by processing 300 frames are shown in order to point out the usefulness of the introduction of the Kalman filter submodule. The upper curve of Fig. 13 depicts the tourist counting results obtained by the image-understanding module of Fig. 11 without the Kalman filter. The two curves below depict the tourist counting results obtained by the complete IU module of Fig. 11, and the actual number of persons that entered the square during the observation time, respectively. One can see that the tourist counting curve obtained without using the Kalman filter tends to follow in its various segments the pattern of the actual tourist counting curve, however, with some evident offsets introduced by noisy estimation peaks occurring in a few frames. Such a drift can cause a noticeable error on the final counting, which is almost entirely removed by the use of the Kalman filter block.

6. Numerical results achieved by tourist counting final tests

Some extensive tests, under different weather conditions, were performed in order to prove the robustness of the system. The aim of the overall tests was to achieve a counting precision with an absolute mean error (MAE) less than or equal to 10%.

Fig. 13. Tourist-counting results achieved with and without the Kalman filter block.
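The effect compared in Fig. 13 can be illustrated by a scalar Kalman filter applied to the per-frame counts, followed by the positive-difference integration described in Section 5. The sketch below is only a plausible reading of those two steps: the constant-level (random-walk) state model, the two variance parameters and the function names are assumptions, since the filter design is not detailed in the text.

```python
def kalman_smooth(counts, process_var=0.05, measurement_var=4.0):
    """Scalar Kalman filter with a constant-level (random-walk) state model.

    Both variances are illustrative values, not parameters from the paper.
    """
    estimate = float(counts[0])   # initial state: first measurement
    variance = measurement_var    # initial uncertainty
    smoothed = [estimate]
    for measurement in counts[1:]:
        variance += process_var                          # prediction step
        gain = variance / (variance + measurement_var)   # Kalman gain
        estimate += gain * (measurement - estimate)      # correction step
        variance *= (1.0 - gain)
        smoothed.append(estimate)
    return smoothed


def cumulative_tourists(counts):
    """Start from the first-frame count and add only positive frame-to-frame increases."""
    total = counts[0]
    history = [total]
    for previous, current in zip(counts, counts[1:]):
        if current > previous:
            total += current - previous
        history.append(total)
    return history


# Usage: a single noisy peak (10) in an otherwise stable sequence of counts.
raw = [3, 3, 10, 3, 4, 4]
unfiltered_total = cumulative_tourists(raw)
filtered_total = cumulative_tourists(kalman_smooth(raw))
```

Without filtering, the isolated peak is integrated as seven new arrivals and leaves a permanent offset in the cumulative count; after smoothing, the same peak contributes a much smaller increment.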

The MAE was computed by averaging, over the test sequence length ($N$), the normalised absolute error between the tourist number estimated by the video-based people counting system ($\hat{T}_i$) and the tourist number counted by the human operator ($T_i$), who analysed by sight the image sequences stored after the processing step (see (6.1)):

\[
\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \frac{|T_i - \hat{T}_i|}{T_i}. \tag{6.1}
\]

In the following, the results show the robustness of the system to different weather conditions. To this end, three significant test results are reported. Of course, the image sequences used for testing are completely different from the image sequences used for training. The first test is related to an image sequence of 771 frames (acquired on a cloudy day), characterised by a uniform illumination of all the areas present in the observed scene and by negligible variations in the illumination of the scene over time. The second test is related to an image sequence of 241 frames (acquired on a sunny day), characterised by different illumination of the regions in the scene and by slow variations in the illumination of the scene over time. The third test is related to an image sequence of 320 frames (acquired on a variable-weather day), characterised by different illumination of the regions in the scene and by fast variations in the illumination of the scene over time. The graphs of the counting results related to the cloudy day are presented in Fig. 14, and the graphs of the counting results related to the sunny day and to the variable-weather day are displayed in Figs. 15 and 16, respectively.

The three graph patterns show satisfactory accuracy of the tourist counting under all the weather conditions considered. The absolute mean errors on the first two tests (see Figs. 14 and 15) are equal to 10% and 9%, respectively, which fulfils the performance constraints imposed by the end-user requirements. The last test (see Fig. 16) is affected by an absolute mean error equal to 10%, which is very small, considering that this test is related to the worst working situation for the tourist-counting software.

Fig. 14. Tourist-counting results achieved during a cloudy day.

Fig. 15. Tourist-counting results achieved during a sunny day.



Fig. 16. Tourist-counting results achieved during a variable-weather day.

Fig. 17. Tourist-counting results achieved with and without the adaptive change-detection algorithm.
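The error measure (6.1) used for the curves of Figs. 14-17 can be computed as in the following sketch. The function name and the argument order are hypothetical, and the code assumes that every operator count $T_i$ is non-zero, a case that (6.1) does not otherwise cover.

```python
def mean_absolute_error(estimated, counted):
    """MAE of (6.1): mean of |T_i - estimate_i| / T_i over the test sequence.

    estimated holds the system's tourist numbers, counted the operator's ones.
    All operator counts are assumed to be non-zero.
    """
    if len(estimated) != len(counted) or not counted:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(abs(t - t_hat) / t
               for t_hat, t in zip(estimated, counted)) / len(counted)


# Usage: three frames, two of them with a 10% error.
mae = mean_absolute_error([9, 22, 30], [10, 20, 30])
```

Multiplying the returned value by 100 gives the percentage figures quoted in this section.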
Observing the graph in Fig. 16, concerning the number of tourists entering the scene estimated by the software, one can see that when fast light variations occur, the system needs a period of adjustment to tune itself to the scene illumination changes. In such a case, an underestimation of the tourist flow occurring in a few consecutive frames can be accepted (maximum measured error about 25%). The last experiment was performed in order to show the crucial impact on the counting precision of the employment of a low-level change detection module with adaptive threshold. In Fig. 17, tourist counting results are shown for the sunny day case both using a change detection algorithm with adaptive threshold setting, like the one described in Section 3.3 (line graph corresponding to the SS value equal to 150), and using a fixed threshold for each area of the image (line graph corresponding to the SS value equal to 255). The tourist-counting results provided by the change-detection algorithm working with a fixed threshold are well below the required precision constraints. This is a consequence of the missed groups of persons, already shown in Fig. 9b. The real-time requirements are fully met, too, as the measured processing time is about 1 s per frame, achieved by using a PC-based hardware platform with a PENTIUM 400 MHz processor and 64 MB of RAM capacity.

7. Conclusions

The purpose of this work has been to describe some novel image-processing solutions for video-based people counting in the context of a real-time automatic tourist flow monitoring application. The proposed solutions provide reliable technical answers to some critical points to be faced in the actual system implementation, quite common in many outdoor video-based surveillance applications, i.e.:

1. The choice of effective, yet real-time, low-level image-processing algorithms for change detection and background updating that are also able to work in an outdoor environment.
2. The necessity of employing effective real-time pattern recognition techniques for mapping low- and middle-level scene descriptions into a continuous reliable estimation of the number of tourists.
3. The need for addressing points 1 and 2 in a test-field characterised by precise constraints on the camera positions.

The resulting system can be regarded as a "close-to-market" demonstrator of a low-cost actual product for people counting, not restricted to the tourist site monitoring application considered. Other applications may concern crowding estimation on metropolitan trains and in railway stations, classification and counting of persons and vehicles present in proximity to highway toll gateways, counting of visitors in public parks and fun parks, security of indoor offices, etc.

Acknowledgements

The authors wish to thank Carlo Dambra, Franco Oberti, and Giancarlo Rapallo for their valuable assistance in collecting the paper results.

References

[1] O. Chutatape, L. Guo, A modified Hough transform for line detection and its performance, Pattern Recognition 32 (1999) 181–192.
[2] F. Cravino, M. Dellucca, A. Tesei, DKEF system for crowding estimation by a multiple-model approach, Electron. Lett. 30 (5) (March 1994) 390–391.
[3] E.R. Davies, Machine Vision, Academic Press, New York, 1990.
[4] S.A. Dudani, K.J. Breeding, R. McGhee, Aircraft identification by moment invariants, IEEE Trans. Comput. 26 (1) (January 1977) 39–45.
[5] E. Durucan, F. Ziliani, O.N. Gerek, Change detection with automatic reference frame update and key frame detector, 1999 Non Linear Signal and Image Processing Workshop (NSIP99), Antalya, Turkey, 20–23 June 1999, pp. 57–60.
[6] G.L. Foresti, C.S. Regazzoni, A change-detection method for multiple object localization in real scenes, Proceedings of the 1994 International Conference on Industrial Electronics (IECON 1994), Bologna, Italy, 1994, pp. 984–987.
[7] K. Hashimoto, M. Yoshinomoto, S. Matsueda, K. Morinaka, N. Yoshiike, People-counting system using multisensing application, Sensors Actuators - A - Physical A66 (1–3) (April 1998) 50–55.
[8] Y.Z. Hsu, H.H. Nagel, G. Refers, New likelihood test method for change detection in image sequences, Comput. Vision Graphics Image Process. 26 (1984) 73–106.
[9] M.K. Hu, Visual pattern recognition by moment invariant, IEEE Trans. Inform. Theory 8 (1962) 179–187.
[10] R.E. Kalman, A new approach to linear filtering and prediction problems, Trans. ASME-J. Basic Eng. (March 1960) 35–45.
[11] V. Kettnaber, R. Zabih, Counting people from multiple cameras, Proceedings of the 1999 IEEE International Conference on Multimedia Computing and Systems, Florence, Italy, Vol. 2, June 1999, pp. 267–271.
[12] S.C. Liu, C.W. Fu, S. Chang, Statistical change detection with moments under time-varying illumination, IEEE Trans. Image Process. 7 (1998) 1258–1268.
[13] F.L. Luo, R. Ubenhauen, Applied Neural Networks for Signal Processing, Cambridge University Press, Cambridge, UK, 1997.
[14] A. Makarov, Comparison of background extraction based intrusion detection algorithms, Proceedings of the 1996 IEEE International Conference on Image Processing (ICIP96), Lausanne, Switzerland, Vol. I, September 16–19, 1996, pp. 521–524.
[15] A. Mecocci, F. Bartolini, V. Cappellini, Image sequence analysis for counting in real time people getting out of a bus, Signal Processing 35 (2) (January 1994) 105–116.
[16] J.R. Palmer, CATV Systems - Design, philosophy and performance criteria as the basis for specifying equipment components, IEEE Trans. Broadcasting 13 (2) (April 1967) 57–68.
[17] C.S. Regazzoni, A. Tesei, Distributed data fusion for real-time crowding estimation, Signal Processing 53 (August 1996) 47–63.
[18] M. Rossi, A. Bozzoli, Tracking and counting moving people, Proceedings of the 1994 IEEE International Conference on Image Processing (ICIP94), Austin, TX, Vol. III, November 13–16, 1994, pp. 212–216.
[19] C. Sacchi, C.S. Regazzoni, C. Dambra, Use of advanced video surveillance and communication technologies for remote monitoring of protected sites, in: C.S. Regazzoni, G. Fabri, G. Vernazza (Eds.), Advanced Video-Based Surveillance Systems, Kluwer Academic Publishers, Norwell, MA, 1999, pp. 154–164 (Chapter 4).
[20] C. Sacchi, C.S. Regazzoni, C. Dambra, Remote cable-based video-surveillance applications: the AVS-RIO project, Proceedings of the 10th International Conference on Image Analysis and Processing (ICIAP99), Venice, Italy, 27–29 September 1999, pp. 1214–1215.
[21] A.J. Schofield, P.A. Metha, T.J. Stonham, A system for counting people in video images using neural networks to identify the background scene, Pattern Recognition 29 (8) (August 1996) 1421–1428.
[22] J. Serra, Morphological filtering: an overview, Signal Processing 38 (1994) 3–11.
[23] K. Skifstad, R. Jain, Illumination independent change detection for real world sequences, Comput. Vision Graphics Image Process. 46 (1989) 387–399.
[24] A. Teschioni, C. Regazzoni, Performances evaluation strategies for an image processing systems for surveillance applications, in: C.S. Regazzoni, G. Fabri, G. Vernazza (Eds.), Advanced Video-Based Surveillance Systems, Kluwer Academic Publishers, Norwell, MA, 1999, pp. 76–90 (Chapter 2).
[25] A. Tesei, A. Teschioni, C.S. Regazzoni, G. Vernazza, Long memory matching of interacting complex objects from real image sequences, in: V. Cappellini (Ed.), Time Varying Image Processing and Moving Object Recognition, Vol. 4, Elsevier, Amsterdam, 1997, pp. 283–288.
[26] R.P.C. Wolters, Characteristics of the upstream channel noise in CATV networks, IEEE Trans. Broadcasting 42 (4) (December 1996) 328–332.
