Dale E. Seborg
Abstract
This section briefly describes some of the popular compression methods for time-series data. Because the accuracy
of retrieved data depends not only on the method that was
used for compression, but also on the method used for reconstruction, some simple reconstruction techniques that include
zero-order hold and linear interpolation are also discussed
briefly.
2.1
Introduction
Present address: Johnson Controls, Inc., 507 E. Michigan St., Milwaukee, WI 53202. Email: Ashish.Singhal@jci.com
Corresponding author. Email: seborg@engineering.ucsb.edu
2.2
3.1
All of the data compression methods described in the previous section produce lossy compression, i.e., it is not possible to reconstruct the compressed data to exactly match the original data. The accuracy with which compressed data can describe the original uncompressed data depends not only on the compression algorithm, but also on the method of data reconstruction. Many reconstruction methods are available. The simplest is the zero-order hold (ZOH), where the value of a variable is held at the last recorded value until the next recording, so the reconstructed signal is piecewise constant.
Linear interpolation (LIN) is a simple method that can overcome part of this limitation by reconstructing data between recordings. It can provide a more accurate reconstruction when the process is at steady state or when process variables show trends.
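As an illustration of the two reconstruction methods, the following Python sketch rebuilds a signal at arbitrary query times from a set of recorded points. The function name `reconstruct`, its arguments, and the handling of times outside the recorded span (the nearest recorded value is held) are assumptions made for this example and are not taken from the original study.

```python
from bisect import bisect_right


def reconstruct(t_rec, y_rec, t_full, method="lin"):
    """Rebuild a signal at times t_full from recorded points (t_rec, y_rec).

    t_rec must be sorted in increasing order; method is "zoh" or "lin".
    """
    if method not in ("zoh", "lin"):
        raise ValueError("method must be 'zoh' or 'lin'")
    last = len(t_rec) - 1
    out = []
    for t in t_full:
        # k is the index of the last recording made at or before time t
        k = bisect_right(t_rec, t) - 1
        if k < 0:
            # before the first recording: fall back to the first value
            out.append(float(y_rec[0]))
        elif method == "zoh" or k == last:
            # hold the last recorded value
            out.append(float(y_rec[k]))
        else:
            # straight line between the two neighboring recordings
            frac = (t - t_rec[k]) / (t_rec[k + 1] - t_rec[k])
            out.append(y_rec[k] + frac * (y_rec[k + 1] - y_rec[k]))
    return out


# A ramp from 0 to 8 that was recorded only at its two end points.
t_rec, y_rec = [0, 4], [0.0, 8.0]
zoh = reconstruct(t_rec, y_rec, range(5), "zoh")
lin = reconstruct(t_rec, y_rec, range(5), "lin")
```

For this ramp, ZOH returns a single step at the final recording, whereas LIN recovers the intermediate values exactly.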
More sophisticated methods, such as spline interpolation and the expectation-maximization algorithm for data reconstruction, have also been proposed [9, 10]. However, these methods are sensitive to the amount of missing data and do not perform well when a significant amount of data is missing [9, 10].
$$p = \frac{N_1}{N_P} \times 100\% \qquad (1)$$
A second metric, the pattern matching efficiency $\eta$, characterizes how effective the pattern matching technique is in locating similar records in the historical database. It is defined as:
$$\eta = \frac{N_1}{N_{DB}} \times 100\% \qquad (2)$$
Because an effective pattern matching technique should ideally produce large values of both $p$ and $\eta$, the average of the two quantities, $\xi$, is used as a measure of the overall effectiveness of pattern matching:
$$\xi = \frac{p + \eta}{2} \qquad (3)$$
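Equations (1) to (3) can be evaluated as in the short Python sketch below. The interpretation of the counts is an assumption inferred from the symbols: `n1` is the number of records in the candidate pool that are truly similar to the snapshot data, `n_p` is the size of the candidate pool, and `n_db` is the number of records in the historical database that are truly similar. The names `pattern_matching_metrics`, `eta`, and `xi` are chosen for this example only.

```python
def pattern_matching_metrics(n1, n_p, n_db):
    """Return (p, eta, xi) in percent for one pattern matching run."""
    p = 100.0 * n1 / n_p       # Eq. (1): accuracy of the candidate pool
    eta = 100.0 * n1 / n_db    # Eq. (2): pattern matching efficiency
    xi = 0.5 * (p + eta)       # Eq. (3): average of the two metrics
    return p, eta, xi


# Hypothetical counts: 9 correct matches in a pool of 12,
# with 10 truly similar records in the database.
p, eta, xi = pattern_matching_metrics(n1=9, n_p=12, n_db=10)
```

With these hypothetical counts, enlarging the pool can only raise the efficiency but tends to lower the pool accuracy, which is why the two quantities are averaged.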
In order to compare the effect of data compression on pattern matching, a case study was performed for a simulated
chemical reactor. A nonlinear continuous stirred tank reactor
4.1
for a given method and each process variable are proportional to their standard deviations. For example, the OSI PI recording limits were chosen as $3\sigma_i$, while the recording limits for the box-car method were adjusted to produce the same compression ratio as the PI method. Thus, the recording limits for the box-car method were $2.23\sigma_i$.
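The link between a recording limit constant $c$ and the resulting compression ratio can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the algorithm used in the study: a sample is recorded when it differs from the last recorded value by more than $c$ times the standard deviation, the population standard deviation is used, the first and last samples are always kept, and the compression ratio is taken as the number of original samples divided by the number of recorded samples. The name `boxcar_compress` is hypothetical.

```python
from statistics import pstdev


def boxcar_compress(y, c):
    """Return (indices of recorded samples, recording limit) for signal y."""
    limit = c * pstdev(y)      # recording limit proportional to the std. dev.
    kept = [0]                 # always record the first sample
    for k in range(1, len(y)):
        # record when the change since the last recording exceeds the limit
        if abs(y[k] - y[kept[-1]]) > limit:
            kept.append(k)
    if kept[-1] != len(y) - 1:
        kept.append(len(y) - 1)  # keep the end point for reconstruction
    return kept, limit


# A step change of size 10; the population standard deviation is 5.
y = [0.0, 0.0, 0.0, 0.0, 10.0, 10.0, 10.0, 10.0]
kept, limit = boxcar_compress(y, c=1.0)
compression_ratio = len(y) / len(kept)
```

Matching the compression ratio of another method, as was done for the box-car and PI methods, then amounts to adjusting `c` until `compression_ratio` reaches the target value.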
The effectiveness of a compression-reconstruction method was characterized in two ways: (i) reconstruction error, and (ii) degree of similarity between the original data and the reconstructed data. The $S_{PCA}$ and $S_{dist}$ similarity factors were used to quantify the similarity between the original and reconstructed data.
5.1
5.2
(4)
Table 1. Data compression and reconstruction results for the CSTR example for a constant compression ratio.

| Compression method | Recording limit constant (c) | Reconstruction method | CR | MSE |
|---|---|---|---|---|
| Box-Car | 2.2295 | Linear | 14.84 | 5.23 |
| Box-Car | 2.2295 | Zero-order hold | 14.84 | 4.91 |
| Backward-slope | 2.7744 | Linear | 14.83 | 4.09 |
| Backward-slope | 2.7744 | Zero-order hold | 14.83 | 8.83 |
| Combination | 2.2003 | Linear | 14.86 | 5.28 |
| Combination | 2.2003 | Zero-order hold | 14.63 | 7.94 |
| Averaging (over 1.25 min) | NA | Linear | 14.6 | 24.69 |
| Averaging (over 1.25 min) | NA | Zero-order hold | 14.63 | 60.35 |
| Wavelet | 2.2669 | Wavelet | 14.83 | 2.61 |
| PI | 3.0 | PI | 14.83 | 0.33 |

accurate both in terms of reconstruction error and the similarity of the reconstructed and original datasets.
Although the PI algorithm produces a very low MSE, it does not represent the data very well for pattern matching. The wavelet method produces both a low MSE and high similarity factor values. The wavelet transform preserves the essential dynamic features of the signal in the detail coefficients while retaining the correlation structure between the variables in the approximation coefficients. These two features of the wavelet transform produce low MSE and high $S_{PCA}$ values between the original and reconstructed data. These features also minimize mean shifts and result in high $S_{dist}$ values. By contrast, the PI method records data very accurately and produces very low MSE values, but its variable sampling rates disrupt the correlation structure between variables and produce low $S_{PCA}$ values. Variable sampling also affects the mean value of the reconstructed data and produces low $S_{dist}$ values. The detailed results for different operating conditions for the CSTR case study are reported by Singhal [11].
5.3
the entire database was analyzed for one set of snapshot data, the analysis was repeated for a new snapshot dataset. A total of 28 different snapshot datasets, one for each of the 28 operating conditions, were used for pattern matching [11].
Table 3 compares the pattern matching results for historical and snapshot data compressed using different methods. The best pattern matching results were obtained when the data were compressed using the wavelet method. The optimum $N_P$ values were determined by choosing the value of $N_P$ for which $\xi$ had the largest value. Table 3 indicates that pattern matching is adversely affected by data compression when the data are compressed using either the averaging method or the combination of box-car and backward-slope compression methods. By contrast, wavelet-based compression has very little effect on pattern matching because similar results are obtained for both compressed and uncompressed data. Table 4 presents results for the situation when the snapshot data are not compressed while the historical data are compressed using the wavelet method. The $p$, $\eta$, and $\xi$ values in Table 4 are slightly lower than those in Table 3. Thus, if the historical data are compressed, it may be beneficial to compress the snapshot data as well to obtain better pattern matching.
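The selection of an optimum candidate pool size can be illustrated with the Python sketch below. It assumes that the historical records have already been ranked from most to least similar to the snapshot data and that it is known which records are truly similar (as in a simulation study); the pool size is then swept and the size giving the largest average of pool accuracy and pattern matching efficiency is kept. The function name `optimum_pool_size` and its inputs are hypothetical and are not taken from the original study.

```python
def optimum_pool_size(ranked_is_similar, n_db):
    """Sweep the pool size and return (best pool size, best average metric).

    ranked_is_similar: booleans for the ranked records, most similar first;
                       True if the record is truly similar to the snapshot.
    n_db: number of truly similar records in the whole database.
    """
    best_n_p, best_xi = 0, 0.0
    n1 = 0  # truly similar records among the first n_p candidates
    for n_p, hit in enumerate(ranked_is_similar, start=1):
        n1 += bool(hit)
        p = 100.0 * n1 / n_p       # pool accuracy
        eta = 100.0 * n1 / n_db    # pattern matching efficiency
        xi = 0.5 * (p + eta)       # average of the two
        if xi > best_xi:
            best_n_p, best_xi = n_p, xi
    return best_n_p, best_xi


# Hypothetical ranking of five records, three of which are truly similar.
best = optimum_pool_size([True, True, False, True, False], n_db=3)
```

In this hypothetical ranking, a pool of four records captures all three similar records at the cost of one false match, which gives the largest average value.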
Conclusions
Table 2. Effect of different data compression and reconstruction methods on pattern matching for the CSTR example.

| Compression method | Recording limit constant (c) | Reconstruction method | $S_{PCA}$ | $S_{dist}$ | SF |
|---|---|---|---|---|---|
| Box-Car | 2.2295 | Linear | 0.88 | 0.67 | 0.81 |
| Box-Car | 2.2295 | Zero-order hold | 0.87 | 0.83 | 0.86 |
| Backward-slope | 2.7744 | Linear | 0.84 | 0.63 | 0.77 |
| Backward-slope | 2.7744 | Zero-order hold | 0.83 | 0.39 | 0.68 |
| Combination | 2.20025 | Linear | 0.87 | 0.67 | 0.80 |
| Combination | 2.20025 | Zero-order hold | 0.85 | 0.79 | 0.83 |
| Averaging (over 1.25 min) | NA | Linear | 0.92 | 0.99 | 0.94 |
| Averaging (over 1.25 min) | NA | Zero-order hold | 0.93 | 0.97 | 0.94 |
| Wavelet | 2.2669 | Wavelet | 0.95 | >0.99 | 0.97 |
| PI | 3.0 | PI | 0.88 | 0.71 | 0.82 |
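Table 3. Effect of data compression on pattern matching for the CSTR example when both the snapshot and historical data are compressed using the same method.

| Compression method | Similarity factor | Opt. $N_P$ | $p$ (%) | $\eta$ (%) | $\eta_{max}$ (%) | $\xi$ (%) |
|---|---|---|---|---|---|---|
| Original data | $S_{PCA}$ only | 34 | 43 | 90 | 99 | 66 |
| Original data | $S_{dist}$ only | 25 | 41 | 68 | 97 | 54 |
| Original data | SF | 14 | 75 | 72 | 88 | 74 |
| Combination | $S_{PCA}$ only | 41 | 30 | 78 | 99 | 54 |
| Combination | $S_{dist}$ only | 59 | 19 | 75 | 100 | 47 |
| Combination | SF | 15 | 65 | 67 | 91 | 66 |
| Averaging | $S_{PCA}$ only | 21 | 49 | 65 | 95 | 57 |
| Averaging | $S_{dist}$ only | 24 | 40 | 65 | 96 | 53 |
| Averaging | SF | 17 | 64 | 73 | 92 | 68 |
| Wavelet | $S_{PCA}$ only | 34 | 38 | 82 | 99 | 60 |
| Wavelet | $S_{dist}$ only | 52 | 25 | 83 | 100 | 54 |
| Wavelet | SF | 16 | 71 | 76 | 92 | 73 |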
Acknowledgements
The authors thank OSI Software for providing financial
support and the data archiving software PI, and Gregg
LeBlanc at OSI for providing software support during the
research. Financial support from ChevronTexaco Research
and Technology Co. is also acknowledged.
References
(1) Singhal, A. and Seborg, D. E. Pattern Matching in Multivariate Time Series Databases Using a Moving Win-
(3) Kennedy, J. P. Building an Industrial Desktop. Chemical Engr., 1996. 103(1), 82-86.
(4) Bristol, E. H. Swinging Door Trending: Adaptive Trend Recording? In Advances in Instrumentation and Control, volume 45. Instrument Society of America, Research Triangle Park, NC, 1990, 749-754.
(5) Mah, R. S. H.; Tamhane, A. C.; Tung, S. H. and Patel, A. N. Process Trending With Piecewise Linear Smoothing. Comput. Chem. Engr., 1995. 19, 129-137.
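Table 4. Effect of data compression on pattern matching when snapshot data are not compressed and historical data are compressed.

| Compression method | Similarity factor | Opt. $N_P$ | $p$ (%) | $\eta$ (%) | $\eta_{max}$ (%) | $\xi$ (%) |
|---|---|---|---|---|---|---|
| Original data | $S_{PCA}$ only | 34 | 43 | 90 | 99 | 66 |
| Original data | $S_{dist}$ only | 25 | 41 | 68 | 97 | 54 |
| Original data | SF | 14 | 75 | 72 | 88 | 74 |
| Combination | $S_{PCA}$ only | 48 | 26 | 76 | 100 | 51 |
| Combination | $S_{dist}$ only | 40 | 25 | 67 | 99 | 46 |
| Combination | SF | 15 | 59 | 63 | 91 | 61 |
| Averaging | $S_{PCA}$ only | 60 | 23 | 85 | 100 | 54 |
| Averaging | $S_{dist}$ only | 16 | 52 | 57 | 92 | 54 |
| Averaging | SF | 16 | 63 | 70 | 92 | 66 |
| Wavelet | $S_{PCA}$ only | 39 | 31 | 75 | 99 | 53 |
| Wavelet | $S_{dist}$ only | 15 | 50 | 53 | 91 | 52 |
| Wavelet | SF | 14 | 68 | 67 | 88 | 68 |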
(13) Russo, L. P. and Bequette, B. W. Effect of Process Design on the Open-Loop Behavior of a Jacketed Exothermic CSTR. Comput. Chem. Eng., 1996. 20, 417-426.