Abstract: This paper describes some applications of an incremental implementation of the principal component analysis
(PCA). The algorithm updates the transformation coefficients matrix on-line for each new sample, without
the need to keep all the samples in memory. The algorithm is formally equivalent to the usual batch version,
in the sense that given a sample set the transformation coefficients at the end of the process are the same.
The implications of applying the PCA in real time are discussed with the help of data analysis examples. In
particular we focus on the problem of the continuity of the PCs during an on-line analysis.
Lippi, V. and Ceccarelli, G.
Incremental Principal Component Analysis: Exact Implementation and Continuity Corrections.
DOI: 10.5220/0007743604730480
In Proceedings of the 16th International Conference on Informatics in Control, Automation and Robotics (ICINCO 2019), pages 473-480
ISBN: 978-989-758-380-3
Copyright © 2019 by SCITEPRESS – Science and Technology Publications, Lda. All rights reserved.
algorithm, including the z-score normalization, a step that is not included in previous works presenting a similar approach like (Artač et al., 2002). We decided in light of this to describe the algorithm in detail in this paper. The incremental techniques cited above (Hall et al., 1998; Artač et al., 2002; Hall et al., 2000) are designed to update the reduced set of variables and change its dimensionality when it is convenient for data representation. In the present work, no indication is provided for which subset of variables should be used, i.e. how many principal components to consider. All the components are used during the algorithm to ensure the exact solution. After describing the exact algorithm, some implications of using this incremental analysis are discussed. In particular, we provide an intuitive definition of continuity for the obtained transformation and then we propose a modified version designed to avoid discontinuities.

The concept of continuity is strictly related to the incremental nature of the proposed algorithm: in standard PCA the batch analysis implies that the notion of time does not exist, e.g. the order of the elements in the sample set is not relevant for the batch algorithm. In our treatment we instead want to follow the time evolution of variances and eigenvectors. We are thus led to consider a dynamical evolution.

The paper is organized as follows. In the remaining part of the Introduction we recall the PCA algorithm and introduce the notation used. In Section 2.1 we give a detailed account of the incremental algorithm for an on-line use of PCA. In Section 2.2 we address the problems related to the data reconstruction, in particular those connected with the signal continuity. In Sections 3 and 4 we then present the results of some applications to an industrial data set and draw our conclusions.

1.2 The Principal Component Analysis

The computation for the PCA starts by considering a set of observed data. We suppose we have m sensors which sample some physical observables at a constant rate. After n observations we can construct the matrix

X_n = [x_1; x_2; …; x_n]    (1)

where x_i is a row vector of length m representing the measurements of the ith time step, so that X_n is an n × m real matrix whose columns represent all the values of a given observable.

The next step is to define the sample means x̄_n and standard deviations σ_n with respect to the columns (i.e. for the observables) in the usual way as

x̄_n(j) = (1/n) ∑_{i=1}^{n} X_n(ij)    (2)

σ_n(j) = sqrt( (1/(n−1)) ∑_{i=1}^{n} ( X_n(ij) − x̄_n(j) )^2 )    (3)

where in parentheses we write the matrix and vector indices explicitly. In this way we can define the standardized matrix for the data as

Z_n = [x_1 − x̄_n; x_2 − x̄_n; …; x_n − x̄_n] Σ_n^{−1}    (4)

where Σ_n ≡ diag(σ_n) is an m × m matrix. The covariance matrix Q_n of the data matrix X_n is then defined as

Q_n = (1/(n−1)) Z_n^T Z_n.    (5)

We see that Q_n is for any n a symmetric m × m matrix and it is positive definite.

Finally we make a standard diagonalization so that we can write

Q_n = C_n^{−1} diag(λ_1, λ_2, …, λ_m) C_n    (6)

where the (positive) eigenvalues λ_i are in descending order: λ_i > λ_{i+1}. The transformation matrix C_n is the eigenvector matrix and it is orthogonal, C_n^{−1} = C_n^T. Its rows are the principal components of the matrix Q_n and the value of λ_i represents the variance associated to the ith principal component. Setting P_n = Z_n C_n, we have a time evolution for the values of the PCs until time step n.

We recall that the diagonalization procedure is not uniquely defined: once the order of the eigenvalues is chosen, one can still choose the "sign" of the eigenvector for one-dimensional eigenspaces and a suitable orthonormal basis for degenerate ones (in Section 2.2 we will see some consequences of this fact). We stress that, since only the eigenspace structure is an intrinsic property of the data, the PCs are quantities useful for their interpretation but they are not uniquely defined.
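To make the notation concrete, the construction of eqs. (2)–(6) can be condensed into a short numerical sketch (in Python/NumPy rather than the MATLAB of the released library; `batch_pca` is our illustrative name, not part of the paper's code):

```python
import numpy as np

def batch_pca(X):
    """Batch PCA with z-score normalization, following eqs. (2)-(6).

    X : (n, m) array, one row x_i per time step (eq. 1).
    Returns (C, lam): C has the principal components as rows,
    eigenvalues lam sorted in descending order.
    """
    n, m = X.shape
    xbar = X.mean(axis=0)                 # sample means, eq. (2)
    sigma = X.std(axis=0, ddof=1)         # sample std devs with 1/(n-1), eq. (3)
    Z = (X - xbar) / sigma                # standardized data, eq. (4)
    Q = Z.T @ Z / (n - 1)                 # covariance matrix, eq. (5)
    lam, V = np.linalg.eigh(Q)            # eigh: ascending eigenvalues, columns of V
    order = np.argsort(lam)[::-1]         # descending order as in eq. (6)
    return V[:, order].T, lam[order]      # rows of C are the PCs

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
C, lam = batch_pca(X)
# C is orthogonal, so C^{-1} = C^T as stated in the text
assert np.allclose(C @ C.T, np.eye(5), atol=1e-10)
```

Note that `eigh` returns an orthonormal eigenbasis, but the sign of each row of C is arbitrary: this is exactly the ambiguity discussed at the end of this section.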
2 ON-LINE ANALYSIS

2.1 Incremental Algorithm

The aim of the algorithm is to construct the covariance matrix Q_{n+1} starting from the old matrix Q_n and the new observed data x_{n+1}. To do this, at the beginning of step (n+1), we consider the sums of the observables and their squares after step n:

a_n(j) = ∑_{i=1}^{n} X_n(ij)    (7)

b_n(j) = ∑_{i=1}^{n} X_n(ij)^2    (8)

These sums are updated on-line at every step. From these quantities we can recover the starting means and standard deviations: x̄_n = a_n/n and (n−1)σ_n^2 = b_n − n x̄_n^2. Similarly, the current means and standard deviations are also simply obtained.

The key observation to get an incremental algorithm is the following identity:

Z_{n+1} = [Z_n Σ_n + ∆; y] Σ_{n+1}^{−1}    (9)

where the rows are stacked, y = x_{n+1} − x̄_{n+1} is a row vector and ∆ is an n × m matrix built by repeating n times the row vector δ = x̄_n − x̄_{n+1}. By definition n Q_{n+1} = Z_{n+1}^T Z_{n+1} and, expanding the preceding identity, we get

n Q_{n+1} = Σ_{n+1}^{−1} Σ_n Z_n^T Z_n Σ_n Σ_{n+1}^{−1} + Σ_{n+1}^{−1} Σ_n (Z_n^T ∆) Σ_{n+1}^{−1} + Σ_{n+1}^{−1} (∆^T Z_n) Σ_n Σ_{n+1}^{−1} + Σ_{n+1}^{−1} ∆^T ∆ Σ_{n+1}^{−1} + z^T z    (10)

where z = y Σ_{n+1}^{−1} and we used the fact that the Σs are diagonal.

Recalling that by hypothesis all the columns of the matrix Z_n have zero mean and that each column of the matrix ∆ is constant, we see that the terms in parentheses are zero. Thus

n Q_{n+1} = (n−1) Σ_{n+1}^{−1} Σ_n Q_n Σ_n Σ_{n+1}^{−1} + n Σ_{n+1}^{−1} δ^T δ Σ_{n+1}^{−1} + z^T z    (11)

where δ^T δ, z^T z and Q_n are three m × m matrices. We now see that we can compute Q_{n+1} by making operations only on m × m matrices and with the sole knowledge of Q_n and x_{n+1}.

The computational advantage of this strategy is that we do not need to keep in memory all the sampled data X_{n+1} and, moreover, we do not need to perform the explicit matrix product in eq. (5), which would require a great amount of memory and time for n ≈ 10^5–10^6. Consequently, this algorithm can be fruitfully applied in situations where the number of sensors m is small (e.g. of the order of tens) but the data stream is expected to grow quickly.

The meaning of the normalization procedure depends on the process under analysis and the meaning that is associated to the data within the current study: both centering around the empirical mean and dividing by the empirical variance can be avoided by respectively setting ∆ = 0 or Σ = I.

In practice, one keeps n_start observations and computes Q as given by eq. (5) and the relative C (and hence P_start). Then the updated Qs are used, step by step, to compute the nth values for the evolving PCs in the standard way as p_n = z_n C_n. In this way the last sample is equal for any n to the one that would result from a batch analysis until time step n. Instead, the whole sequence of the p_n values with n_start < n < n_final would not coincide with those from P_final, since the Q matrices change every time a sample is added, and likewise for the C matrices. The most relevant implications of this fact will be considered in the next subsection.

The library for the present implementation of the algorithm is available on the Mathworks website under the name incremental PCA.

2.2 Continuity Issues

We now consider the problem of the continuity of the PCs during the on-line analysis. In a batch analysis, one computes the PCs using all the data at the end of the sampling, obtaining the matrix C_final, and then, by applying this transformation and its inverse, one can pass from the original data set to the set of PC values. Of course, since we are considering sampled data, we cannot speak of continuity in a strict sense. As previously stated, the temporal evolution of the data is not something relevant for the batch PCA. Regardless, we may intuitively expect a sampling rate of at least two times the signal bandwidth (by the sampling theorem), and usually even more, e.g. ten times. We hence expect a limited difference between two samples in proportion to the overall signal amplitude. For sampled data we can then define continuity in an intuitive sense as a condition where the difference between two consecutive samples is smaller than a given threshold. A discontinuity in the original samples may be reflected in the principal components depending on the transformation coefficients; in detail

p_n − p_{n−1} = z_n C_n − z_{n−1} C_{n−1}    (12)

which is equal to

p_n − p_{n−1} = (z_n − z_{n−1}) C_n + z_{n−1} (C_n − C_{n−1}).    (13)

The first term would be the same for the batch procedure (in that case with constant C) and the second term shows how p is changing due to the change in
Figure 1: Top, data-set used in the example. Bottom, variances associated to the PCs of the system.
coefficients C. We can regard this term as the source of discontinuities due to the incremental algorithm.

To understand the problems that could arise, from the point of view of the continuity of the PC values, let us consider the on-line procedure more closely. We start with the matrices Q_start and C_start. At a numerical level the eigenvalues are all different (since the machine precision is at least of order 10^−15), so that we have a set of formally one-dimensional eigenspaces, from which the eigenvectors are taken. Going on in the time sampling, we naturally create m different time series of eigenvectors.

We could expect the difference of two subsequent eigenvectors of a given series to be slowly varying (in the sense of the standard Euclidean norm), since they come from different Cs that are obtained from different Qs which differ only slightly (i.e. for the last x_{n+1}). But this is not fully justified: since the PCs are not uniquely defined, in some cases two subsequent vectors of p_n and p_{n+1} can differ considerably, as shown in the example in Figure 4. There are three ways in which one or more eigenvector series could exhibit a discontinuity (in the general sense discussed above).

• Consider the case of a given eigenvalue associated with two eigenspaces at two subsequent time steps, spanned by the vectors c_n and c_{n+1}. They belong by hypothesis to two close "lines", but the algorithm can choose c_{n+1} in the "wrong" direction. In this case, to preserve the on-line continuity as much as possible, we take the new PC to be −c_{n+1}, i.e. minus the eigenvector given by the diagonalization process at step n+1. The "right" orientation can be identified with simple considerations on the scalar product of c_n with c_{n+1}. Recalling the considerations at the end of Section 1.2, this substitution does not change the meaning of our analysis.

• Consider the case where the differences of a group of ν contiguous eigenvalues are much smaller than the others: we can say that these eigenvalues correspond in fact to a degenerate eigenspace. In this case we can choose an infinite number of sets of ν orthonormal vectors that can be legitimately considered our PCs, but the incremental algorithm can choose, at subsequent time steps, two bases which considerably differ. To overcome this problem, we must apply to the new transformation a change of basis in such a way as not to modify the eigenspace structure and to "minimize" the distance with the old basis. Although the case of a
properly degenerate space on real data is virtually impossible, as the difference between two or more eigenvalues of Q approaches zero, the numerical values of the associated PCs can become discontinuous in a real-time analysis. This by itself does not represent an error in an absolute sense in computing C_n, as the specific C_n is the same as that which would be computed off-line.

• In the two previous cases the discontinuity was due to an ambiguity of the diagonalization matrix. A third source of discontinuity can consist in the temporal evolution of the eigenvalues. Consider two one-dimensional eigenspaces, one associated with a variance that is increasing in time, the other with a variance that is decreasing: there will be a time step n̄ at which the two eigenspaces are degenerate. This is called a "level crossing" and corresponds in the algorithm to an effective swap in the "correct" order of the eigenvectors. To restore continuity, the two components must be swapped.

3 EXAMPLES AND RESULTS

[Figure 2: execution times; legend: Batch (cumulative), Batch (single case), Incremental (cumulative).]

…represent an interesting description of the analyzed process. This is expected for the estimator of the covariance matrix until m ≳ n. While the sample covariance matrix is an unbiased estimator for n → ∞, it is known to converge inefficiently (Smith, 2005).

In order to quantify the efficiency of the algorithm, the computational cost of the proposed incremental solution has been compared with the batch implementation provided by the Matlab built-in function pca on an Intel Core i7-7700HQ CPU running at 2.81 GHz, with the Windows 10 operating system. The results are shown in Figure 2. The time required to execute the incremental algorithm grows linearly with the number of samples, while the batch implementation presents an increase of the execution time associated with the size of the dataset. As reasonably expected, the batch implementation is more efficient than the incremental one when the PCA is computed on the whole dataset, while the incremental implementation is more efficient when samples are added incrementally.
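The continuity corrections of Section 2.2 — taking −c_{n+1} when the orientation is "wrong", and swapping components at a level crossing — can be sketched as follows (Python/NumPy; `continuity_correct` is our illustrative name, and the greedy matching on scalar products is our simplification, adequate when the eigenvalues are well separated):

```python
import numpy as np

def continuity_correct(C_prev, C_new):
    """Align the new PC matrix with the previous one (sketch of Section 2.2).

    Rows of C_prev and C_new are principal components. Two corrections:
    (1) reorder the rows of C_new so that each one matches the closest
        previous PC (handles swaps at level crossings);
    (2) flip the sign of any row whose scalar product with its
        predecessor is negative (handles "wrong" orientations).
    """
    m = C_prev.shape[0]
    overlap = np.abs(C_prev @ C_new.T)     # |<c_prev_i, c_new_j>| for all pairs
    order = np.empty(m, dtype=int)
    used = np.zeros(m, dtype=bool)
    for i in range(m):                     # greedy row matching
        j = int(np.argmax(np.where(used, -1.0, overlap[i])))
        order[i] = j
        used[j] = True
    C = C_new[order]
    signs = np.sign(np.sum(C_prev * C, axis=1))   # orientation of each row
    signs[signs == 0] = 1.0
    return C * signs[:, None]

# Toy check: a permuted, sign-flipped basis is mapped back onto the old one.
C1 = np.eye(3)
C2 = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, -1.0]])
C3 = continuity_correct(C1, C2)
```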
Figure 3: Application of the incremental algorithm to the sample data-set. The uppermost picture shows the 27 principal components computed with the batch algorithm; the second from top shows the incremental PCA computed without continuity check; the third picture from top represents the Frobenius norm of the difference between the covariance matrix computed through the incremental algorithm up to a given sample and the one computed on the whole sample-set. The lowermost picture represents the variances associated to the PCs (eigenvalues of the covariance matrix). The covariance matrix and the variable values are the same for the batch algorithm and the on-line implementation when they are provided with the same samples. The differences in the pictures are due to the fact that the same transformation computed with the batch algorithm is applied to the whole set, while the one computed on-line changes with every sample.
…compared to the PCs computed with the batch algorithm. Nevertheless, they are a good approximation, in that they are still ordered by variance and most of the variance is in the first components (i.e. more than 90% is in the first 5 PCs).

…variance identify the stable part of the analyzed data, and hence the one identifying the process, e.g. the controlled variables in a human movement (Lippi et al., 2011) or the response of a dynamic system known through the input-output samples (Huang, 2001).
Figure 4: The 9th and the 10th PCs computed without (top) and with (bottom) continuity constraints. Note the discontinuities indicated by the arrows.
[Figure: variances per PC, on-line PCA with continuity correction vs. batch PCA; legend: cumulative Var. On-line, cumulative Var. Batch; vertical axis Var[%].]

4.1 Software

The MATLAB software implementing the function and the examples shown in the figures is available at the URL: https://it.mathworks.com/matlabcentral/fileexchange/69844-incremental-principal-component-analysis

ACKNOWLEDGMENTS