by
KIN MING KAM

in Partial Fulfillment of the Requirements
for the Degree of
DOCTOR OF PHILOSOPHY

December 2014
Copyright © by KIN MING KAM 2014
I would like to thank my advisors, Dr. Li Zeng and Dr. Shouyi Wang, for their invaluable guidance and support during the course of my study.
I wish to thank Dr. Victoria Chen for her consistent support since the very
beginning of my Ph.D. study, for her interest in my research, and for the time she has devoted to it.
Also, I am grateful to all the teachers who have taught me during my years of study. I would
also like to thank my classmates, with whom I studied, worked on projects, and
shared many precious moments. And I would like to thank the administrative
staff members in the office, who did excellent work during the course of my
study.
Last but not least, I would like to thank my parents and my wife for their love.
ABSTRACT
radiation doses to the tumor with minimal normal tissue exposure by accounting for
current respiratory motion prediction approaches are still not satisfactory in terms
there are three major ingredients of this approach: (1) construct a real-time accumulative
pattern library; (2) find the k nearest-neighbor patterns in the pattern library and apply a two-step
approach to screen out the disturbing patterns and identify the final predictive
patterns; (3) make the final prediction using the bootstrapped mean of the future
values of the selected predictive patterns given a prediction horizon. Based on a study
of respiratory motion traces of 27 patients with lung cancer, the proposed prediction
approach has generated consistently and significantly higher accuracies than the current respiratory motion prediction approaches.
There has been much interest in the beneficial effects of musical training on
cognition. Previous studies have indicated that musical training was related to better
working memory and that these behavioral differences were associated with differences
in neural activity in the brain. However, it was not clear whether musical training
study has been performed, including various univariate and multivariate features,
theory. The advanced feature selection approaches have also been employed to select
the most discriminative EEG and brain activation features between musicians and
was achieved using Proximal Support Vector Machine (PSVM). For working memory,
delay period. For long-term memory, significant differences on EEG patterns between
groups were found both in the pre-stimulus period and the post-stimulus period on
both working memory and long-term memory and that the developed computational
TABLE OF CONTENTS
ACKNOWLEDGEMENTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . iv
ABSTRACT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . v
LIST OF ILLUSTRATIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . . x
LIST OF TABLES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xv
Chapter Page
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
2.2.1 ARIMA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.4.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
Time Series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
Patterns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
State-Of-The-Art Methods . . . . . . . . . . . . . . . . . . . . 76
4. Pattern Recognition and Classification of Multivariate Time Series Signals:
REFERENCES . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
LIST OF ILLUSTRATIONS
Figure Page
ods, (b) multiple nested seasonal periods, and (c) multiple non-nested
1.2 An example of low-count time series which are sample inventory de-
mands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
terval. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.13 Histogram of r/R . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
2.21 ACF and PACF plots of model (0, 0, 0)(1, 1, 1)7 for case 1 . . . . . . . 43
2.22 ACF and PACF plots of model (1, 0, 1)(1, 1, 1)7 for case 1 . . . . . . . 43
2.23 MSE of the two methods: DLM (dark blue) vs. ARIMA for 6 cases
2.24 R estimates of the 6 cases, from left to right. We see that, except for
case 5, the R estimate approaches a stable value when there are more
observations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
3.5 Three best neighbors (solid black lines) of the current segment (solid
3.6 Scatter Plots (left) and Autocorrelation Function (right) of the height
terval. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
3.8 The prediction accuracy for various window length with window ratio
to median interval (R) ranging from 0.3 to 1.5 for prediction horizon
3.9 A 3-D plot of the prediction accuracy for various window length with
window ratio to median interval (R) ranging from 0.3 to 1.5 for predic-
3.12 A zoom-in view of Figure 3.11, using unaligned BNs(Left) and right-
3.13 Scatter plots of the error before tλ vs the error after tλ . Correlation
3.14 An illustration of the error of the best neighbors before and after time
tλk . If the error at the left hand side is large, then error at the right
3.15 A real example of the error of the best neighbors before and after time
tλk . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
3.28 This example shows that even though the two time series have the same
amount of error, the occurrences of the errors can be very different. The
above plot shows that the two patterns match very well in the older
data (left) but do not match well in the newest data. Therefore, for
3.29 The weights of the shorter window (black dotted) and the longer window
(red dotted) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
member with low confidence, High represents remember with high con-
4.5 Group of Channels for Inter- and Intra-hemispheric power band asym-
4.6 Comparison for the EEG signals of 30 channels of musicians (blue line)
4.7 Head plot for musicians and non-musicians at epoch B1 at 100sec with
LIST OF TABLES
Table Page
on 27 patients . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
4.1 Frequency ranges and the corresponding brain signal frequency bands
that stimuli in test session/if it was the 1st or 2nd stimuli. For condi-
uli/correctness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
4.3 The table of the classification accuracy for 46 conditions and 8 epochs
4.4 The table of the classification accuracy for 46 conditions and 8 epochs
with 5-fold cross validation and 10 features selected by mRMR and with
4.5 The table of the classification sensitivity and specificity for 46 conditions
4.6 The table of the classification sensitivity and specificity for 46 conditions
CHAPTER 1
Introduction
1.1 Motivation
a useful classification of the trend and seasonal patterns depending on whether they
are additive or multiplicative [1]. Following the work of Box and Jenkins, some
linear exponential smoothing forecasts were shown to be special cases of the ARIMA model.
In 1985, Snyder proposed a class of innovation state space models and paved the
way for forecasting methods that can be derived statistically. Over the years, many time series
problems have been identified and various methods have been developed to overcome
Time series analysis has attracted much attention in the past three decades.
According to the Google Scholar search engine, until 1990 there were only 67,000 publications
containing the keywords "time series", while in 2000 the number rose to 453,000.
containing the keyword "data". Although this is only a rough text-mining exercise, we
can still see a tremendous volume of research in the context of time series.
According to Google Scholar, there are 1,660,000 results for the keywords "time
series" and "prediction". This constitutes over half of the total research volume in time
series analysis. Time series prediction is a very popular and challenging problem.
Time series analysis comprises methods for analyzing time series data in order
to extract meaningful information from the data. It includes several major areas of study: indexing, clustering, classification, prediction, summarization, anomaly
detection and segmentation [2]. These resemble most data mining areas, which
are very popular in scientific research and industry. These methods are very often
used together. For instance, Rubio proposed a weighted least squares support vector
machine for time series prediction which combines the use of prediction and classifi-
cation [3]. In our study of respiratory motion time series prediction, we apply pattern
Time series prediction is a study that uses a model to predict future values
based on historical data. In service industries, the ability to accurately estimate the
demand is very important for better marketing and cost saving. Preez [4] investigated
tourism demand from four European countries to the Seychelles by using time series
to give patients a better quality of life [5, 6]. In hospitals, management applies time
series prediction to forecast the demand at nurse triage centers [7]. In transportation
Time series prediction can be classified into two categories: stationary time se-
ries prediction and non-stationary time series prediction. ARIMA and many stochas-
tic models, such as dynamic linear models, perform well on stationary data. ARIMA
is one of the most popular methods in time series prediction because of its generic
properties [8]. Autoregressive, moving average and exponential smoothing are spe-
cial cases of the ARIMA framework [1]. However, ARIMA has its own limitations.
Also, these methods generally do not do well on non-stationary data. Therefore, new
industries provide tremendous amounts of data. In other words, this provides a lot
of research opportunities. Large amounts of data are available in service industries.
This dissertation focuses on demand forecasting for service industries
and on respiratory motion time series prediction. These two problems cover both
stationary and semi-periodic time series. Therefore, the solutions provided have great
motion time series prediction, which aims to predict the tumor position during radiotherapy,
is considered, as well as the number of calls received at a nurse triage center
and the loads history of cargo in railroad service. The research seeks to explore fun-
Background Preez [4] points out that accurate forecasts of tourism demand are
essential for efficient planning by the various sectors of the tourism industry, and
product is perishable; e.g., unused plane seats, hotel rooms and hire cars cannot
be stockpiled. Specifically, short-term forecasts can aid decision making in areas such
major problem nationally and occurs when there is a mismatch between the demand
and supply of the resources needed to evaluate, treat, and discharge patients from
the ED. In current practice, bed requests and preparation to receive the patient often
facilities are not desired. Therefore, Peck investigated forecasting methods to predict
the demand.
[10]. Grain shippers need the forecasts to evaluate transportation equipment needs,
establish marketing plans, and formulate strategies for negotiating prices and service
with railroads. Port authorities need forecasts of rail grain transportation for port
Time series prediction involves time series analysis, which decomposes the
properties of the time series and quantifies each individual property. These properties
To decompose the trend of a time series, the traditional method is to study the
by differencing, while other kinds of complex non-stationary properties may need new
methods.
The irregular component describes random and irregular influences in the time
series. This component may be decomposed and described by using statistical anal-
ysis. For instance, in dynamic linear models, the observation and the hidden average
In service industries, many of the time series data have strong periodicity. The
After de-seasonalization, the time series can become stationary and many classical
methods can be applied to that stationary time series. For complex seasonal time
series, there can be multiple seasonal periods, high-frequency seasonality, non-integer
Figure 1.1: Examples of complex seasonality showing (a) non-integer seasonal periods,
(b) multiple nested seasonal periods, and (c) multiple non-nested and non-integer
seasonal periods
The seasonality can be found by using the Fast Fourier transform to analyze
time series patterns [13]. In low-count time series, the counts in any given period are
sufficiently small that it may be unrealistic to forecast them with conventional mod-
els, including ARIMA, based on the normal distribution. Yelland proposed to use
dynamic linear models (DLM) to solve this type of problem. Figure 1.2 shows an example.
Figure 1.2: An example of low-count time series which are sample inventory demands
time series, which may need some experience with the data or a separate analysis to
This research presents a development of the dynamic linear model with appli-
the proposed method is to provide a framework that makes the dynamic linear model
1.2.3 Pattern-Based Online Prediction of Semi-periodic and Nonstationary Time
Series
and to reduce the damage to normal body tissues. To achieve that, the respiratory
motion in radiotherapy has to be accounted for. Currently, there are several methods
1. Motion-encompassing methods
3. Breath-hold methods
ation (during both imaging and treatment delivery) within a particular portion of
the patient's breathing cycle, commonly referred to as the gate. Breath-hold methods
are used to control the tumor position for radiotherapy. For breast cancer, during
inhalation the diaphragm pulls the heart away from the breast, and thus there is
potential reduction of both cardiac and lung toxicity. Forced shallow breathing with
cursions, while still permitting limited normal respiration. Real-time tumor tracking
robotic arm or, alternatively, by aligning the tumor to the beam via couch motion.
(1) identify the tumor position in real time; (2) anticipate the tumor motion to allow
for time delays in the response of the beam-positioning system; (3) reposition the
beam; and (4) adapt the dosimetry to allow for changing lung volume and critical
structure locations during the breathing cycle [5]. In this dissertation, the prediction
of tumor position is studied. One way to predict the position is through the prediction
located at the superior segment of the right lung, marked with a circle in Figure 1.3; respiration is the
dominant source of the tumor motion but other sources such as cardiac motion may
The method proposed in this dissertation is designed for any time series that
shows the characteristics of semi-periodic time series. Other popular examples are
ATM cash demands and geo-data, such as sea level, sea temperature and seismic
activities.
series refer to signals that are virtually periodic yet demonstrate both microscopic
and macroscopic variations. The characteristics of semi-periodic time series are drifting
in mean position, frequency and phase, and the occurrences can be considered
random. Figure 1.4 shows the respiratory motion time series of lung tumor patients.
Another challenge is to fully use the historical data. Most current state-of-the-art
prediction methods only consider local trends and are unable to take the whole
time series into account [15, 16, 17]. This wastes a lot of important information.
posed by using pattern recognition techniques to overcome the issue of individuality
and to fully use all the available respiratory records. The prediction method is
to search for similar patterns from the history and then use the information of these
best matching patterns for prediction. There are two major challenges: 1) to find
the best neighbors which are the most relevant to the prediction problem and 2) to
This dissertation focuses on the methodologies for addressing the two problems
industries. The problems will involve both stationary and nonstationary time series.
model on stationary time series prediction problems for the healthcare and railroad
industries. ARIMA and DLM represent two different ways to explain and model time
series. The mechanisms of how they work and the limitations of the algorithms will
Respiratory motion time series, which are one kind of semi-periodic time series, are
selected for study in this chapter. Variant k-Best-Neighbors is used as the core method
for time series pattern matching. At the end of the chapter, the proposed method
Figure 1.5: Outline of the dissertation
CHAPTER 2
Due to the advance of information technology, there are more and more ways
to collect time series data. For example, consumer devices such as mobile phones
and laptop computers collect data and upload them to the Internet. Sensors such
as GPS and RFID can record positions with time stamps. Machines such as glass
as EEG and ECG record vital signals of patients. Time series data grow not only
horizontally but also vertically, such that more and more big data are available.
Increasing availability of time series data empowers us to obtain more knowledge via
Tasks of time series data mining include indexing, clustering, classification, fore-
casting and anomaly detection [18]. Indexing assigns indices for a query of time series
to represent its similarity to a class. Prediction and certain analysis can be done by
using this similarity information. Clustering separates time series into groups based
on available independent variables. For each group, the time series show similar prop-
erties. Classification classifies time series into some predefined classes. Forecasting
models the underlying system and predicts future values. Anomaly detection finds
unusual patterns in the time series. This study will focus on the forecasting problem of time series data, which is a
Traditional time series data usually have relatively low dimensionality. While
are no longer able to cope with massive data. Also, due to high non-stationarity and
large amount of noises that may be present in some available time series data, tradi-
tional time series analysis tools such as ARIMA methods which assume stationarity
may be no longer suitable for these situations [18]. So it is necessary to find a way
to overcome the limitations of the traditional approaches and uncover complex and
as expertise, experience and intuition is useful when historical data are not avail-
can be further classified into causal and non-causal methods [1, 18]. Causal meth-
ods include Linear Regression, Econometrics Models and Artificial Neural Networks
(ANNs) models, where predictions are made based on data of relevant influential
factors. Non-causal methods include Moving Average [19, 20, 1, 21, 18], Exponential
Smoothing [1, 18], Box-Jenkins [19, 20, 1, 21, 18], State Space [1, 18, 22] and Spectral
Analysis [1, 18]. More details of quantitative methods are given in Figure 1.1.
Quantitative methods usually analyze some characteristics of the time series for
prediction, e.g., trend, seasonality, cycles and randomness. The trend of time series
us if a pattern repeats at a fixed time interval. Cycle is very common in time series
data. The patterns may repeat at varying time intervals. Randomness makes patterns
difficult to identify, and it is desirable to separate the randomness from systematic
patterns [18].
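To make these components concrete, the classical additive decomposition estimates the trend with a centered moving average and the seasonal effects by averaging the detrended values at each seasonal position. The sketch below is an illustrative numpy implementation; the function name and interface are our own, not the dissertation's.

```python
import numpy as np

def decompose_additive(y, period):
    """Classical additive decomposition into trend, seasonal and remainder."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Centered moving average as the trend estimate (valid region only).
    kernel = np.ones(period) / period
    if period % 2 == 0:                      # even period: average two shifted windows
        kernel = np.convolve(kernel, [0.5, 0.5])
    half = len(kernel) // 2
    trend = np.full(n, np.nan)
    trend[half:n - half] = np.convolve(y, kernel, mode="valid")
    # Seasonal component: mean of detrended values at each seasonal position.
    detrended = y - trend
    seasonal = np.array([np.nanmean(detrended[i::period]) for i in range(period)])
    seasonal -= seasonal.mean()              # force seasonal effects to sum to zero
    seasonal_full = np.tile(seasonal, n // period + 1)[:n]
    remainder = y - trend - seasonal_full
    return trend, seasonal_full, remainder
```

The remainder then carries the irregular (random) component discussed above.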
Figure 2.1: Categorization of quantitative forecasting models
Mean squared error (MSE) and its variants such as root mean squared error (RMSE), mean absolute error (MAE)
and mean absolute percentage error (MAPE) are commonly used. Another popular
data set that can be explained by the forecasting model. For model selection prob-
lems, the Akaike information criterion (AIC) is often used, which adds a penalty for model
complexity. The Bayesian information criterion (BIC) and Hannan-Quinn criterion (HQC) are popular alternative
criteria to AIC and are consistent. Also, BIC generally penalizes free parameters
more strongly than AIC does. Cross validation is another model selection method.
In cross validation, data are divided into two sets, with one for training and the other
for validation. Considering the variation in the data, many different training and val-
idating sets will be used. Besides best subset selection, stepwise model selection methods, such
as forward selection and backward selection, are often used to find the best model.
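As an illustration of AIC/BIC-based selection, the sketch below fits an AR(p) model by conditional least squares and computes both criteria from the Gaussian log-likelihood; the function name and the parameter count convention are our own illustrative assumptions.

```python
import numpy as np

def ar_aic_bic(y, p):
    """Fit AR(p) with intercept by conditional least squares; return (AIC, BIC).
    k counts the p AR coefficients, the intercept and the noise variance."""
    y = np.asarray(y, dtype=float)
    n = len(y) - p
    # Lagged design matrix: column j holds y shifted by lag j+1.
    X = np.column_stack([y[p - j - 1:len(y) - j - 1] for j in range(p)])
    X = np.column_stack([np.ones(n), X])
    target = y[p:]
    beta, *_ = np.linalg.lstsq(X, target, rcond=None)
    resid = target - X @ beta
    sigma2 = resid @ resid / n
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    k = p + 2
    return 2 * k - 2 * loglik, k * np.log(n) - 2 * loglik
```

Since BIC's penalty k·ln(n) exceeds AIC's 2k whenever n > e², BIC favors smaller models on all but tiny samples, matching the remark above.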
ARIMA methods are the most popular tools for time series forecasting and have
been applied in many different applications such as tourism forecasting [8, 6] where
Seasonal ARIMA is applied to determine the size of the flows of tourism demand
in Montenegro. Recently, dynamic linear model (DLM) forecasting methods have been applied to
forecasting [6, 23]. In this section, basics and forecasting procedures of these two
methods will first be reviewed, and then the issues in using them in practice will be
discussed.
2.2.1 ARIMA
ARIMA stands for AutoRegressive Integrated Moving Average; it models a time series in these three components. Hence, autoregressive models (e.g., GARCH), moving average models (e.g., SES, EWMA) and random-walk
models with or without trend are special cases of ARIMA models. For an autoregressive model
of order p (i.e., AR(p)), the current value depends on previous values plus the current
error term,
In backshift notation, where B p zt = zt−p , the AR(p) model can be written as

(1 − φ1 B − · · · − φp B p )zt = at (2.2)
or
φ(B)zt = at (2.3)
For moving average of order q (i.e., MA(q)), the current value depends on the
zt − µ = θ(B)at (2.5)
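The AR(p) and MA(q) definitions above can be simulated directly. The sketch below generates an ARMA series under the Box-Jenkins sign convention, z_t = Σ φ_i z_{t−i} + a_t − Σ θ_j a_{t−j}; the burn-in length and function name are our own choices for illustration.

```python
import numpy as np

def simulate_arma(phi, theta, n, sigma=1.0, seed=0):
    """Generate z_t satisfying phi(B) z_t = theta(B) a_t with Gaussian shocks."""
    rng = np.random.default_rng(seed)
    p, q = len(phi), len(theta)
    burn = 200                                # discard start-up transients
    a = rng.normal(0.0, sigma, n + burn)
    z = np.zeros(n + burn)
    for t in range(n + burn):
        ar = sum(phi[i] * z[t - i - 1] for i in range(min(p, t)))
        ma = sum(theta[j] * a[t - j - 1] for j in range(min(q, t)))
        z[t] = ar + a[t] - ma
    return z[burn:]
```

For an MA(1) process the lag-1 autocorrelation is −θ/(1+θ²), and for an AR(1) process the variance is σ²/(1−φ²), which gives simple sanity checks on the simulator.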
This model assumes that the underlying process is stationary which means that
the mean and variance are constant, and the autocovariances depend only on the time
lag. Figure 2.2a shows a typical example of stationary time series which resembles the
Box and Jenkins [20] point out that homogeneous nonstationary sequences like
the data in Figure 2.2b can be transformed into stationary sequences by taking suc-
smoothing, ARIMA is able to model the seasonality of time series. The following
ARIMA models assume that the terms in the time series have linear relationships
and that the residuals follow a normal or t distribution with a constant mean and variance.
• Model Specification
• Model Estimation
• Diagnostic Checking
MODEL SPECIFICATION
The following rules are typically used in building ARIMA models [21]:
• Differencing (I) -
tions are all small and patternless, then the series does not need a higher
encing assumes that the original series has a constant average trend (e.g.
with two orders of total differencing assumes that the original series has a
term (which represents the mean of the series). A model with two orders
• AutoRegressive (AR)
adding an AR term to the model. The lag at which the PACF cuts off is
term to the model. The lag at which the ACF cuts off is the indicated
number of MA terms.
• AR and MA
other's effects, so if a mixed AR-MA model seems to fit the data, also try a
model with one fewer AR term and one fewer MA term, particularly if the
parameter estimates require more than 10 iterations to converge.
• Unit Root
– Rule 9: If there is a unit root in the AR part of the model, i.e., if the sum
– Rule 10: If there is a unit root in the MA part of the model, i.e., if the sum
– Rule 11: If the long-term forecasts appear erratic or unstable, there may
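The sample ACF and PACF that these rules depend on can be computed directly; the sketch below (our own illustrative functions, not the dissertation's code) obtains the PACF from the ACF via the Durbin-Levinson recursion.

```python
import numpy as np

def acf(y, nlags):
    """Sample autocorrelation function up to lag nlags (lag 0 = 1)."""
    y = np.asarray(y, dtype=float) - np.mean(y)
    denom = y @ y
    return np.array([1.0] + [(y[:-k] @ y[k:]) / denom for k in range(1, nlags + 1)])

def pacf(y, nlags):
    """Partial autocorrelations via the Durbin-Levinson recursion on the ACF."""
    r = acf(y, nlags)
    phi = np.zeros((nlags + 1, nlags + 1))
    pac = [1.0]
    for k in range(1, nlags + 1):
        num = r[k] - sum(phi[k - 1, j] * r[k - j] for j in range(1, k))
        den = 1.0 - sum(phi[k - 1, j] * r[j] for j in range(1, k))
        phi[k, k] = num / den
        for j in range(1, k):
            phi[k, j] = phi[k - 1, j] - phi[k, k] * phi[k - 1, k - j]
        pac.append(phi[k, k])
    return np.array(pac)
```

For an AR(p) series the PACF cuts off after lag p, which is exactly the signature the AR rule above looks for.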
MODEL ESTIMATION
at = θ1 at−1 + · · · + θq at−q + zt − φ1 zt−1 − · · · − φp zt−p (2.8)
normal distribution with mean zero and variance σ 2 , we can use maximum likelihood
estimation to estimate θ and φ. The loss function can be derived according to the specified
model.
is that the error terms are independently distributed according to a normal distribution with
mean zero and variance σ 2 . For diagnostic checking, we need to check if the mean of
the residuals is close to zero, check the residual plot to see if the variance is constant
and check the autocorrelation plot to see if there is any violation of the assumption
r_{\hat a}(k) = \frac{\sum_{t=k+1}^{n} (\hat a_t - \bar{\hat a})(\hat a_{t-k} - \bar{\hat a})}{\sum_{t=1}^{n} (\hat a_t - \bar{\hat a})^2} \qquad (2.9)
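The residual autocorrelation of equation (2.9) underlies standard portmanteau diagnostics. The sketch below computes it and a Ljung-Box-style statistic; the latter is our addition, commonly paired with this check, not something the excerpt itself defines.

```python
import numpy as np

def residual_acf(resid, k):
    """Sample autocorrelation of residuals at lag k, as in equation (2.9)."""
    a = np.asarray(resid, dtype=float)
    d = a - a.mean()
    return (d[:-k] @ d[k:]) / (d @ d)

def ljung_box(resid, max_lag):
    """Ljung-Box statistic Q = n(n+2) * sum_k r_k^2 / (n-k); under the null of
    uncorrelated residuals Q is approximately chi-squared with max_lag d.o.f."""
    n = len(resid)
    return n * (n + 2) * sum(residual_acf(resid, k) ** 2 / (n - k)
                             for k in range(1, max_lag + 1))
```

If the fitted model is adequate, the residual autocorrelations should all be small and Q should be unexceptional for a chi-squared distribution with max_lag degrees of freedom.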
the forecast of zn+1 can be obtained by minimum mean square error (MMSE) estimation.
z_n(l) = z_n(r, m) = \beta_{0m}^{(n)} + \beta_{1*}^{(n)} r \qquad (2.12)

The forecast function is described by 7-time-unit levels \beta_{0m}^{(n)} and a coefficient \beta_{1*}^{(n)} for the yearly trend change. Representing in autoregressive form, the forecasts
forecasting may be desired. The mechanism of these two forecasting schemes is illustrated
in Figures 2.3 and 2.4. The steps involved in the 1-step-ahead and k-step-ahead
1. Train a model by using the training data set T = [1, 2, ..., a − 1].
2. Forecast only the next value (one-step forecasting) based on the trained model.
3. Repeat the process by adding the observed data point in training and moving
1. Train a model by using the training data set T = [1, 2, ..., a − 1].
3. Repeat the process by adding the observed data point in training and moving the
forecasting period by 1 step, i.e., T = [1, 2, ..., a] and F = [a + 1, a + 2, ..., a + k].
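The 1-step-ahead rolling scheme above can be sketched as follows, with a simple AR(1) model standing in for the trained model (an assumption made purely for illustration; any refittable model would do).

```python
import numpy as np

def ar1_fit(train):
    """Conditional least-squares AR(1) with intercept: z_t = c + phi*z_{t-1} + a_t."""
    X = np.column_stack([np.ones(len(train) - 1), train[:-1]])
    beta, *_ = np.linalg.lstsq(X, train[1:], rcond=None)
    return beta  # (c, phi)

def rolling_one_step(series, n_train):
    """Train on all data seen so far, forecast only the next value, then slide
    the training window forward by one observed point and repeat."""
    series = np.asarray(series, dtype=float)
    forecasts = []
    for a in range(n_train, len(series)):
        c, phi = ar1_fit(series[:a])          # retrain on the growing prefix
        forecasts.append(c + phi * series[a - 1])
    return np.array(forecasts)
```

Retraining at each step is exactly what makes rolling evaluation honest: every forecast uses only data available at that point in time.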
Issues Of ARIMA Forecasting ARIMA is a general time series analysis tool.
accurate prediction of future values. Due to the generality, ARIMA models have
gained great popularity in time series analysis. However, there are three issues in the
the model specification. As shown in Section 2.2.1, the process typically involves a
find a good model, a trial-and-error strategy may be followed, which increases the
computational time.
historical data are needed to build the ARIMA model. For instance, a model of order
a result, forecasting cannot be done at the beginning of the process, but has to start
process is deterministic, that is, the underlying mean is either a constant or has a
the mean of the differenced time series should appear to be constant. However, this
may not always be satisfied in practice. Due to the existence of random factors,
nonstationary time series are very common, especially data collected in short periods.
For such time series, the forecasting performance of ARIMA models may not be
satisfactory.
The dynamic linear model (DLM) is one of the state space methods for modeling time series data. It is a hierarchical model with
two levels: the mean model which represents the evolution of mean via state space
transition, and the observation model which models the observed values by taking into
account the mean evolution and observational errors. Figure 2.5 shows the structure
Observation model:
yt = µt + vt , vt ∼ N (0, V ) (2.13)
Mean Model:

µt = µt−1 + wt , wt ∼ N (0, RV ) (2.14)

where µt is the underlying mean at time t, vt represents observational error and wt represents mean evolution. Note that
the variance of mean errors is R times of the variance of the observations, where R is
the signal-to-noise ratio, also called drift parameter. This model has three parameters:
the initial mean µ0 , the variance of observations V , and the drift parameter R. Usually
R is assumed to be known, while the other two parameters are unknown and need to
be estimated. This model is a basic DLM with only first order and constant variance
V.
Figure 2.5: Structure of the dynamic linear model
Typically the DLM is estimated using Bayesian methods [22]. The prior speci-
Initial prior:

µ0 ∼ N (m0 , C0 V ) (2.15)

φ ∼ Gamma(n0 /2, d0 /2) (2.16)

Posterior:

(φ|y1 , . . . , yt ) ∼ Gamma(nt /2, dt /2) (2.18)
Updating recurrence relationships:

Ct = (Ct−1 + R)/(Ct−1 + R + 1) (2.19)

nt = nt−1 + 1 (2.21)

dt = dt−1 + (yt − mt−1 )2 /(Ct−1 + R + 1) (2.22)
Forecasting:

ft = mt−1
The initial prior for the mean is a normal distribution, and the prior for the
precision of mean (Φ = 1/V ) is a Gamma distribution. These are the conjugate priors
of the model, that is, the resulting posterior distributions have the same form of the
priors, except that the parameters need to be updated by equations 2.19 - 2.22. In
this framework, the forecast of the observation at time t is equal to the posterior mean at time t − 1, mt−1 .
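The forward updating described by equations (2.19)-(2.22), together with the forecast f_t = m_{t−1}, can be sketched as below. The posterior-mean update itself (equation 2.20) does not appear in this excerpt, so the standard Kalman-style update m_t = m_{t−1} + A_t e_t with adaptive coefficient A_t = (C_{t−1}+R)/(C_{t−1}+R+1) is assumed here.

```python
import numpy as np

def dlm_filter(y, m0, C0, n0, d0, R):
    """Forward updating for the first-order DLM; returns one-step forecasts
    f_t = m_{t-1} plus the final posterior parameters (m, C, n, d)."""
    m, C, n, d = float(m0), float(C0), float(n0), float(d0)
    forecasts = []
    for yt in np.asarray(y, dtype=float):
        forecasts.append(m)          # forecast before seeing y_t: f_t = m_{t-1}
        Q = C + R + 1.0              # scaled one-step forecast variance
        e = yt - m                   # forecast error
        A = (C + R) / Q              # adaptive coefficient
        m = m + A * e                # posterior mean (assumed Kalman-style update)
        C = A                        # eq. (2.19): C_t = (C_{t-1}+R)/(C_{t-1}+R+1)
        n = n + 1.0                  # eq. (2.21)
        d = d + e * e / Q            # eq. (2.22)
    return np.array(forecasts), m, C, n, d
```

With a larger R the coefficient A is larger, so the filter adapts faster to a drifting mean, which is why a mis-specified signal-to-noise ratio degrades forecasting.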
To apply the DLM model, the starting mean µ0 , the variance of observations V
and the signal-to-noise ratio R need to be specified. For µ0 and V , the approach is to estimate them using historical data. For the signal-
Step 2 : Estimate the parameters of the initial priors using the historical data.
m0 = ȳ H (2.24)

C0 = 1 (2.25)

n0 = m (2.26)

d0 = n0 · var(y H ) (2.27)
d0 are the degree of freedom and scale parameter of the Gamma distribution.
Step 3 : The forecast value ft can be obtained by the updating equations (2.16
signal-to-noise ratio R, which is the ratio of the mean variance to the observational
However, there is no obvious way to determine its value, and users have to guess a
value based on their experience. This brings some inconvenience to the application of
this method in practice and may also affect its performance when the signal-to-noise
For Dynamic Linear Models, the forecasting procedure starts with inputting the
initial priors and the signal-to-noise ratio, R. To determine the initial mean, µ0 , we
may use the information in the historical dataset. The variance of observations, V, is
(2.17)∼ (2.19). Finally, we need to specify the value of the signal-to-noise ratio, R,
then all parameters are set and we are able to do forecasting. To make the process
Figure 2.6 illustrates the schema of the new forecasting procedure using the R
estimates. The left side of the diagram shows that the initial mean is estimated by
where nh is the number of historical data points. The right side shows the forecasting
procedure with the updated R estimate. The new data are obtained sequentially
shown as blue squares. Each time a new observation is obtained, we will forecast
the next value by specifying the value of R which is automatically estimated by the
Step 2 : Assume the initial prior, C0 = 1 and use the proposed method to
Note that initially we assume that the variance of the hidden mean equals the observation variance.
The goal of this section is to evaluate the proposed forecasting procedure described in Section 2.3 and to compare the proposed
method to ARIMA and DLM methods described in Section 2.2. The scenario design
specific concerns to be addressed in this study will be given in Section 2.4.2, results
of simulations will be shown in Section 2.4.3, and our findings will be summarized in
Section 2.4.4.
Some preliminary studies are first done to determine the ranges of the param-
eters V , nh , nf , R and rin such that patterns in the forecasting performance, if any,
can be captured. These five parameters are defined as follows in the simulation:
R is the true value of the signal-to-noise ratio, rin is the specified value of R or the R
estimate, Rest , obtained by the proposed method, and V is the observation variance.
For instance, because the historical data is only used for estimating the initial mean,
its range is selected to reflect the effect of the accuracy of mean estimation. Also, small
values of rin are of interest in this study because the signal-to-noise ratio is typically
very small in practice. Five fixed values are considered for each parameter to cover
three levels (small, medium and large). For rin , in order to compare the scenario with
specified r and the scenario with updated r by the proposed method, five fixed values
of r are considered along with the R estimate, Rest . To find out the effect of each
parameter, we only change one parameter and fix the others at typical levels. Details
In the simulation study, 1000 time series are generated under each parameter
setting. For each time series, the data are generated through the following steps:
1. Set the initial mean µ0 to 0.
The first nh data points will be used as historical data, while the rest will be used
in forecasting.
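The data-generation steps can be sketched as a first-order DLM simulator, with observation variance V and mean-evolution variance RV as defined by the model in equations (2.13)-(2.14); the function interface is our own.

```python
import numpy as np

def simulate_dlm(n, V, R, mu0=0.0, seed=0):
    """Generate one first-order DLM series: the hidden mean performs a random
    walk with variance R*V, and observations add noise with variance V."""
    rng = np.random.default_rng(seed)
    # Hidden mean: mu_t = mu_{t-1} + w_t, w_t ~ N(0, R*V).
    mu = mu0 + np.cumsum(rng.normal(0.0, np.sqrt(R * V), n))
    # Observations: y_t = mu_t + v_t, v_t ~ N(0, V).
    y = mu + rng.normal(0.0, np.sqrt(V), n)
    return y, mu
```

Repeating this generator 1000 times per parameter setting, as the study describes, yields the ensembles over which the RMSE and r/R statistics are evaluated.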
2.4.2 Concerns To Address
rately estimate the true R? The ratio of the R estimate to the true R, r/R, is used to make a
fair comparison. That is, r/R will be close to 1 if the estimation is good.
Question 2: What is the performance of ARIMA, DLM and DLM with updating-
R procedure and what are their strength(s) and weakness(es)? The root mean square
error (RMSE) is used to evaluate the prediction error and to compare the performance
which method(s) are the most robust? Methods are always desired to be robust,
on R estimation (only for DLM with updating-R procedure)? Through the study of
the effect of parameters, we will be able to validate our method by evaluating whether
2.4.3 Results
The true R is set to [0.001, 0.01, 0.02, 0.05, 0.08] and the other parameters are set within the ranges of typical levels shown in Section 2.4.1, in order to include a wide range of values while obtaining analyzable time series. 2,500 simulations are done and the R estimate is
obtained in each simulation. Figure 2.7 plots the number of forecasts, nf, versus the mean and variance of the R estimates. From Figure 2.7a, we can see that the mean of the R estimates converges to the true value of R, which means that the proposed R estimator is unbiased. Figure 2.7b shows that the variance of the R estimate becomes smaller as nf increases.
Effect Of Parameters Simulations are done to study the effects of the parameters V, nf and R. Similar to the previous simulation, typical levels of the parameters are set to include a wide range of possible values while obtaining analyzable time series. 2,500 simulations are done and the R estimate is obtained in each simulation. The parameter settings are shown in Section 2.4.1. Mean square error (MSE) is used to measure the error of forecasting because it incorporates both the variance of Y and its bias. The ratio between the estimated R and the true R, i.e., r/R, is used to measure the deviation of the estimated R from the true R; r/R = 1 indicates perfect estimation.
Figure 2.8 shows that higher V gives a larger mean and variance of the error. With the same signal-to-noise ratio, if the observation variance of the time series, V, is larger, the evolution error of the time series will also be larger. DLM models the mean of the time series; in other words, if V is larger, the range of the mean of the time series will also be larger under the same value of R.
In Figure 2.10 and Figure 2.11, the graphs from left to right are the histograms of MSE and r/R, respectively, for increasing nf. V, nh and R are set to typical values
Figure 2.7: (a) Mean and (b) variance of the R estimates versus the number of forecasts nf.
Figure 2.8: Histograms of MSE for increasing V
which are 10, 100 and 0.02 respectively. Figure 2.10 shows that higher nf gives a more precise error distribution, but it does not help to reduce the error itself. So, in our simulation study, the error of forecasting is mainly determined by the range of the time series, i.e., larger fluctuation gives larger error. However, increasing the number of data for forecasting, nf, can reduce the variance of the error. The distribution is skewed when nf is small and becomes more symmetric and precise as nf increases. Figure 2.11 shows that the estimation of R is quite precise when nf = 500. So, the forecasting precision is increased when the estimation precision of R is increased.
Figure 2.11: Histograms of r/R for increasing nf
In Figure 2.12 and Figure 2.13, the graphs from left to right are the histograms of MSE and r/R, respectively, for increasing R. V, nh and nf are set to typical values which are 10, 100 and 500 respectively. Figures 2.12 and 2.13 show that larger R gives slightly larger error but a better R estimate. This is because, for larger R, the signal becomes clearer in contrast with the noise. Therefore, the estimation of R is more precise.
Figure 2.14: Design of experiment and the prediction performance (mean of MSE and
r/R of each set of experiment)
In this set of experiments, time series are generated with the following parameters: initial mean, variance, signal-to-noise ratio, number of historical data points and number of data points to be forecasted.
In order to study the effects of the parameters and to investigate the prediction
performance of the algorithms under various circumstances, five levels of values for
each parameter are carefully chosen. For each parameter, all of its levels will be tested
while other parameters are fixed to typical values so that the effects of each parameter
will be better visualized with more general time series. One-step forecasting is used so that the performances can be clearly shown and compared in a simple forecasting case. To compare the proposed method with the conventional method, two ways are designed to specify the signal-to-noise ratio: using a chosen fixed R and using the updating-R procedure. The prediction follows the trend of the time series even when there is mean drift. In Figure
2.15 (upper panel), black line is the observed values and red line is the forecasts. In
the figure, we can see the mean drift from the observed data. Mean drift is defined as a random evolution of the hidden mean value. In our model, we assume that the drift of the hidden mean follows a normal distribution with mean zero and variance wt (see eq. 2.25). Therefore, the model performs well on time series data even with mean drift.
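For reference, the one-step-ahead forecasts of such a local-level DLM can be computed with a standard Kalman filter; the sketch below is our own minimal implementation under the assumption that V and R are known (names are hypothetical):

```python
import random

def kalman_forecast(ys, V, R, m0=0.0, C0=1e6):
    """One-step-ahead forecasts for the local-level DLM:
    y_t = mu_t + v_t, mu_t = mu_{t-1} + w_t, Var(v_t) = V, Var(w_t) = R * V."""
    W = R * V
    m, C = m0, C0                 # posterior mean and variance of the hidden mean
    forecasts = []
    for y in ys:
        Rt = C + W                # prior variance after the mean drift
        forecasts.append(m)       # the one-step forecast is the prior mean
        Q = Rt + V                # one-step forecast variance
        K = Rt / Q                # Kalman gain
        m += K * (y - m)          # posterior update with the new observation
        C = Rt * (1.0 - K)
    return forecasts
```

Because the filter keeps updating m with each observation, the forecasts follow the drifting mean rather than a fixed level.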
Figure 2.15 (lower panel) shows the R estimate of one simulation with parameters V = 10, nh = 100, nf = 700 and R = 0.02. Again, we see that the R estimate converges
to the true value asymptotically. This means that the proposed method can accurately estimate the true R, and the model reconstructs the time series and predicts future values well. For the next simulation, we follow the experimental design again: five levels of R are carefully chosen to represent various circumstances.
Applying the forecasting models, the mean square errors (MSE) of each method are
calculated and plotted in the same graph. The purpose of this simulation is to see
how well the updating-R method performs over time, compared with the fixed-R method.
From the graph, the performance over time is very clearly visualized. The red
line in Figure 2.16 shows the MSE of the model by using R estimate. The true R is
0.02, which is represented by the green line. The MSE with the R estimate approaches that with the true R. If we set R to 0.05, the performance is slightly worse. If we set R to 0.001, the performance is much worse. That means that if we select a wrong R, the
performance can be significantly affected. And, our method can help to solve this
issue.
The results in Section 2.4.3 address the concerns stated in Section 2.4.2. DLM with the updating-R procedure always performs better than the fixed-R method. The proposed R estimator is unbiased and converges to the true value when the sample size is large. DLM with updating-R is more robust than the conventional DLM method when mean drift exists in the time series. In general, more forecasting data (nf) gives smaller prediction variance and a more accurate R estimate, larger observation variance gives a larger mean and variance of the error, and larger R gives slightly larger error but a more accurate R estimate.
Six real datasets are used in this study to compare the two methods described in Section 2.2 and to validate the effectiveness of the proposed method in Section 2.3.
These data are shown in Figure 2.17. Data 1 to Data 3 are counts of cargo loads in
railroad industries and Data 4 to Data 6 are counts of received calls in a nurse triage
call center. It is noticeable that Data 1 to Data 4 are stationary and cyclical with
about 7 days per cycle, while Data 5 and Data 6 are non-cyclical but may have mean
drifts.
Figure 2.18 shows the time series of Data 1 to Data 6 and their forecasting results. Data 1 to Data 4 are de-seasonalized before being input to the model, so the forecasts capture their cyclical properties well in Figure 2.18. Data 5 and Data 6 do not show any clear seasonal pattern, so there is no straightforward way to eliminate their seasonality. Overall, the forecasts of all datasets follow the trends of the time series. To tell the difference between the two methods, we compare their forecasting errors.
To build ARIMA models for the time series, we need to specify a model first.
To do this, we need to look at the autocorrelation function (ACF) and partial auto-
correlation function (PACF) plots. The model building process of Data 1 is given below as an example.
From both plots, it is obvious that the cycle is 7 days. So we specify the
seasonality as 7. After specifying the seasonality, the ACF and PACF are plotted
again as follows. Now, we can see from ACF plot that there is autocorrelation at
lag 7. Therefore, we specify the SMA order as 1. Looking at the PACF plot, it shows that there is autocorrelation at lags 7 and 14, and maybe more, but this is not important because
Figure 2.18: Forecasting results of the six datasets
practically we do not specify an order higher than 2. For simplicity, we specify the SAR order as 1. Again, ACF and PACF plots are plotted for the new model.
From both ACF and PACF plots, we can see autocorrelation at lag 1. Therefore,
MA and AR terms are both specified as 1. By plotting again the ACF and PACF
plots, we see that the current model (1, 0, 1)(1, 1, 1)7 is much better. However, this procedure does not always give us the best model. Usually, we would try several more models to make sure the best one is chosen. In this case, we would suggest users be aware of overfitting the model. Therefore, we also try (0, 0, 1)(1, 1, 1)7 , (1, 0, 0)(1, 1, 1)7 and (0, 0, 0)(1, 1, 1)7 .
To build a DLM, we just need to follow the method in Section 2.3. Figure 2.23 shows the comparison of the results of the proposed method and the four ARIMA models for the six cases. The solid blue line represents the proposed method and the colored dotted
Figure 2.21: ACF and PACF plots of model (0, 0, 0)(1, 1, 1)7 for case 1
Figure 2.22: ACF and PACF plots of model (1, 0, 1)(1, 1, 1)7 for case 1
lines represent the ARIMA models with different specifications. From Figure 2.23, it is clear that DLM performs better in cases 5 and 6 than in cases 1 to 4. The reason is probably that cases 5 and 6 have mean drift, which can be modeled by DLM but not by ARIMA.
Figure 2.23: MSE of the two methods, DLM (dark blue) vs. ARIMA, for the 6 cases ordered from left to right and then top to bottom
Figure 2.24 shows the R estimates of the six cases. In cases 1 to 4, the R estimate approaches zero, which means the signal-to-noise ratio is very low. Cases 5 and 6 show a much larger R estimate. Recalling Figures 2.12 and 2.13, higher R values give slightly larger error, but if R is much smaller than 0.1, the estimation will be biased and have a large variance. So, cases 5 and 6 may give slightly larger error just due to having a large R, and cases 1 to 4 may estimate their R's inaccurately. Therefore, we should keep these properties in mind when we analyze Figure 2.24 and draw any conclusion.
Figure 2.24: R estimates of the 6 cases, ordered from left to right and then top to bottom. We see that except for case 5, the R estimate approaches a stable value when there are more observations.
From Figure 2.25, the updating-R procedure is always better than the fixed-R method. So, the proposed updating-R procedure is not only convenient, in that it systematically determines R, but also more accurate than the fixed-R method.
Time series forecasting is challenging due to the random patterns in the data. Many methods have been developed to solve this problem and have been successfully applied in industry. However, many issues exist with these methods. For instance, the hidden mean of the time series in service industries may vary from time to time, but ARIMA is not able to model this phenomenon. Moreover, ARIMA needs a large amount of historical data to model a time series, but in some cases this requirement is not satisfied. Also, model specification sometimes imposes much
Figure 2.25: Updating-R vs. Fixed-R (R = [0.001 0.01 0.02 0.05 0.08])
subjective judgment, and a misspecified model will be misleading. Therefore, people have developed the DLM, a special type of state-space method, as an alternative tool for time series forecasting. However, to apply a DLM, the signal-to-noise ratio R has to be specified. Since the true value of R is generally not available, the only way is to guess a value, which is inconvenient and unreliable. In this study, we propose to estimate R automatically in the forecasting procedure. The properties of the proposed R estimator and
the new forecasting procedure with this estimator are studied by simulations. A case
study is also done in which the proposed method is compared with the ARIMA method using six datasets from service industries. It is found that this method outperforms the ARIMA method. There are some open issues in this study, which will be considered in my future work:
1. The variation of the signal-to-noise ratio: Besides the issue of an unknown R in practice, it is possible that the signal-to-noise ratio is not a constant but changes over time. It will be interesting to study the behavior of a changing R and to develop a method to track it over time.
2. Extensions to cope with cyclical features and seasonality: the Form-Free Seasonality Model and the Fourier-form representation of seasonality are candidate extensions, which can be combined with first-order and second-order polynomial trend models. In the Fourier-form representation of seasonality, the time series is broken down into many harmonic components.
CHAPTER 3
Online Prediction of Respiratory Motion Time Series
3.1 Introduction
Respiratory motion is complex, partly because humans are capable of controlling their own breathing [5]. The respiratory motion is mainly regulated by the level of the partial pressure of carbon dioxide: a higher level of the partial pressure of CO2 means more urge to breathe [5]. Besides physical factors, environmental and psychological factors may also contribute to the variation. So, respiratory motion is highly variable; what should be noted is that both inter-individual and intra-individual variations are usually significant. Fortunately, respiratory patterns usually show some statistical properties and
can be used for the prediction of tumor position. Much effort has been devoted to mining these properties and using them for respiratory motion prediction. In consideration of the high inter- and intra-individual variation, an adaptive method that works well for all kinds of patients at all times is desired. The importance of respiratory motion prediction comes from the fact that all devices which involve tracking the tumor position during treatment suffer from system latencies [5, 24]. System latency is the required response time of the whole treatment system.
If a tumor is in the thoracic or upper abdominal region, respiratory motion will
be the dominant factor for tumor movement [25]. Without accounting for the res-
piratory motion, critical misalignment between irradiated field and the target tumor
volume in a treatment fraction may occur during radiotherapy and impose significant radiation doses to normal body tissues. To account for respiratory motion in radiotherapy, several methods have been developed, but some of them may not be available to all patients. Respiratory gating methods were first adopted in Japan in the late 1980s [5]. Following their success, gating has become popular and widely used. In contrast, real-time tumor tracking can utilize the total duty cycle without any interruption.
This method requires the least human participation, which may enhance reliability. Also, time can be saved, which means more service can be provided within
a limited time span. To succeed, this method should be able to do four things: (1)
identify the tumor position in real time; (2) anticipate the tumor motion to allow for
time delays in the response of the beam-positioning system; (3) reposition the beam;
and (4) adapt the dosimetry to allow for changing lung volume and critical structure
locations during the breathing cycle. For the gating method, special caution should be taken by the therapist if the breathing pattern is different from that in the simulation. This problem does not exist in the real-time tracking method as long as the system can do the aforementioned four things. In this chapter, we discuss in detail the second
Table 3.1: A list of latencies of different systems (in ms).

                                      VERO    MLC    MAD    CyberKnife
  Position acquisition                  25    309     30        25
  Position calculation                   2     20      -        15
  Gimbals/MLC/robot control cycle       20     52     45        75
  Other                                  -     38    100         -
  Total                                 47    420    175       115
task: predicting the tumor motion to compensate for the system latency, which ranges from about 47 ms to 420 ms depending on the system (Table 3.1). The current generation of the CyberKnife has a latency of about 115 ms, down from 192.5 ms in the previous version, which is still widely in use [26].
Many prediction methods for respiratory motion have been developed for radiation therapy to compensate for the system latencies. Through this study, we have reviewed some of the latest methods, including Neural Networks (NN) [28], Kernel Density Estimation (KDE) [29], Support Vector Regression prediction (SVRpred) [16, 24], Recursive Least Squares (RLS) [24], Wavelet-based Multiscale Autoregression (wLMS) [15, 30] and Time-Variant Seasonal Autoregression (TVSAR) [37, 27, 17]. Ernst [24] did a survey in 2013 on some of these methods. He concluded that wLMS has the best performance, while SVRpred [16], which is developed based on the Accurate Online Support Vector Regression proposed by Renaud [30], performs better in longer term prediction. Support
vector regression (SVR) has been widely applied to respiratory time series prediction [32, 16, 24, 33, 34, 35, 36]. In all of these current SVR methods, the coefficients are trained either by using the whole time series to capture all possible information or by using a sliding window to capture the recent development. We will show that, through pattern matching, better prediction can be obtained by selecting only similar patterns as inputs of SVR. Ichiji [37, 27, 17] proposed resi-TVSAR in 2013 and reported very good performance of the method. Therefore, we select TVSAR, wLMS and SVRpred to compare with our proposed method. We will also compare our proposed method with Seasonal ARIMA, which is a very popular classic method.
Classic time series models have also been applied to this problem. ARIMA, which is a very popular method, provides a general framework that can model linear and stationary time series or homogeneous non-stationary time series. Seasonal ARIMA (SARIMA) was developed to further cope with constant-period seasonal patterns, and a modified SARIMA [37, 14] was proposed in 2009. The method converts the time-varying periodic component to a constant periodic one by adjusting the time variation. TVSAR goes further by directly taking the varying cycles into account. The following is the detail of the method.
The Nth SAR model of a time series y(t), t = 1, 2, . . . is given as follows [37, 27, 17]:

y(t) = ε(t) + Σ_{n=1}^{N} Φ_n · y(t − n · s)        (3.1)

where Φ_n are the SAR coefficients, s is the period of the target time series y(t), and ε(t) ∼ N(0, σ²) is a Gaussian noise.
Then, the SAR model-based equation for h-sample-ahead prediction is given by replacing t with t + h:

ŷ(t + h|t) = Σ_{n=1}^{N} Φ̂_n · y(t + h − n · s)        (3.2)

We can note that this assumes a constant prediction horizon h and constant lags n · s for all time-varying intervals.
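Equation (3.2) can be sketched directly as follows (0-indexed arrays; the function name is ours):

```python
def sar_predict(ys, t, h, phis, s):
    """h-sample-ahead SAR prediction (eq. 3.2):
    y_hat(t+h | t) = sum_{n=1}^{N} phi_n * y(t + h - n*s).
    Only already-observed values (indices <= t) may be referenced."""
    total = 0.0
    for n, phi in enumerate(phis, start=1):
        idx = t + h - n * s
        assert 0 <= idx <= t, "the prediction must use only observed values"
        total += phi * ys[idx]
    return total
```

For a perfectly periodic series with period s and a first-order model with Φ1 = 1, the prediction is exact, which is the idealized case the model assumes.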
The prediction equation of the Nth TVSAR model for prediction horizon h is given as:

ŷ(t + h|t) = Σ_{n=1}^{N} Φ̂_n · y(t + h − r̂_n(t + h|t))        (3.4)

where r̂_n(t|t) > 0 are called reference intervals, indicating the past observed values
at a corresponding phase to the current value y(t). The reference intervals are the key part of TVSAR: by calculating the correlation with the past data, a pattern matching is performed and the reference intervals are found. In other words, the reference intervals point to the positions in the past which are at the same phase as the current value y(t). An SAR model is then built using the points which are in the past few cycles and are at the corresponding phase. The reference intervals are estimated through the correlation between the past data and the current window. The estimation procedure is as follows:
C(t, k) = Σ_{i=0}^{w−1} (y(t − i) − μ_t)(y(t − k − i) − μ_{t−k}) / (w · σ_t · σ_{t−k})

where μ_t and σ_t are the sample mean and standard deviation of the subset time series [y(t − w + 1), y(t − w + 2), . . . , y(t)] of length w. Figure 3.1 illustrates the estimation procedure [17, 37]. The nth reference interval is estimated by finding the lag k which attains the nth local maximum of the correlation function C(t, k) within a search range,
where the search range is set as half of w around the reference intervals found at the previous time point, and the window length w is adapted based on r̂1(t|t). The initial reference intervals used for the search are set from the estimated cycle length. Nevertheless, TVSAR has several limitations:
1. It does not take the baseline shift and amplitude change into account.
2. It is hard to maintain an effective window size which makes the search of reference intervals reliable.
3. It assumes a fixed prediction horizon, h, for all reference intervals, which obviously does not hold when the cycle lengths vary.
Driven by various muscles and organs, a respiratory motion time series is actually a record of the activity of the chest, which mixes components evolving at different time scales within a definite time horizon. Simply speaking, a respiratory motion time series is a mixture of signals at different frequency bands. Figure 3.2 shows the wavelet decomposition of a respiratory motion time series with 3 levels. Each band has its own pattern. By using wavelet decomposition,
Figure 3.2: Wavelet decomposition of a respiratory motion time series
these bands can be separated and used for prediction. Ernst [24] provided the following closed-form equations for the coefficients of the decomposition:

c_{0,n} = y_n ,    c_{j+1,n} = (c_{j,n−2^j} + c_{j,n}) / 2 ,    W_{j+1,n} = c_{j,n} − c_{j+1,n}
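These recursions are easy to implement; the sketch below (the boundary handling, clamping indices below 0, is our own assumption) also checks the reconstruction property y_n = W_{1,n} + · · · + W_{J,n} + c_{J,n}, which follows by telescoping:

```python
def a_trous(ys, J):
    """A trous wavelet decomposition using the recursion above:
    c_{0,n} = y_n,  c_{j+1,n} = (c_{j,n-2^j} + c_{j,n}) / 2,
    W_{j+1,n} = c_{j,n} - c_{j+1,n}.
    Indices below 0 are clamped to 0 (a boundary choice of ours)."""
    c = [float(y) for y in ys]
    bands = []
    for j in range(J):
        step = 2 ** j
        c_next = [(c[max(n - step, 0)] + c[n]) / 2.0 for n in range(len(c))]
        bands.append([c[n] - c_next[n] for n in range(len(c))])
        c = c_next
    return bands, c        # detail bands W_1..W_J and the smoothed signal c_J
```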
The original signal yn is decomposed into detail signals Wj,n and a smoothed signal cJ,n by passing low-pass and high-pass filters with particular ranges of frequencies, i.e., yn = W1,n + · · · + WJ,n + cJ,n . Also, aj and aJ+1 denote the regression depths of level Wj and the smoothed signal cJ respectively. The multiscale autoregressive (MAR) forecasting can be done by building up an autoregressive (AR) prediction model for each wavelet scale and then summing up all of the predictions:
ŷ^MAR_{n+k} = Σ_{j=1}^{J} w_j^T Ŵ_{n,j} + w_{J+1}^T ĉ_n        (3.8)
The weights of each AR model, wj, are learnt adaptively by least mean squares prediction, using the following notation:

B = (l_{n−k}, . . . , l_{n−k−M+1})^T ,    l_t = (Ŵ_{t,1}^T, . . . , Ŵ_{t,J}^T, ĉ_t^T)^T        (3.11)

w = (w_1^T, . . . , w_{J+1}^T)^T ,    s_n = (y_n, . . . , y_{n−M+1})
To cope with the regularity of the normal equations used to solve for w, Ernst introduced an exponential averaging parameter μ to include possible missing information which
is not included in the current signal window. Finally, wLMS is defined as follows:
ŷ^MAR_{n+k} = Σ_{j=1}^{J} w_{n,j}^T Ŵ_{n,j} + w_{n,J+1}^T ĉ_n        (3.12)

ŵ_n = (w_{n,1}, . . . , w_{n,J+1})^T        (3.13)
Note that wLMS uses only the latest data to build a model. It performs very well for very short-term prediction, but its medium- to long-term prediction ability is unsatisfactory.
Respiratory motion shows quasi-periodicity with variation in mean position, phase and frequency. The occurrence of these changes is due to complex causes and can be considered as random, which means that, for a pattern, a future value has an expected value with a variance. Even though the future value of an individual pattern is random, a collection of similar patterns will give very accurate predictions. Our study of tumor position prediction shows that the average of these responses provides a very accurate and effective prediction of the respiratory motion.
The key idea is to search for similar patterns in the past record and exploit the information of those patterns for the prediction of tumor position. Figure 3.4 shows the general approach of the pattern-based online prediction framework. Instead of only using recent cycles or using the whole time series to train a model, as most methods do, an effective and accurate way
is to look for similar patterns in the past record and analyze their information. Figure 3.6 shows scatter plots and partial autocorrelation plots of the height and the interval of the cycles of a respiratory motion time series versus their 1st lags. It shows that the height and the interval are autocorrelated, which justifies using pattern matching to make predictions, because similar patterns should have stochastically similar responses.
The pattern search is controlled by a similarity threshold and a cutoff value (k). The general approach is shown in Figure 3.4. Before starting prediction, the ratio of window size to cycle length (R) needs to be
Figure 3.5: Three best neighbors (solid black lines) of the current segment (solid blue
line), the dotted lines are their ”future” values
determined by training and validation. In fact, the first step in the flowchart is this training and validation step.
Through validation, a parameter set that gives the most accurate result is selected. After obtaining the optimal window ratio, a pattern library is built using the selected R. Then, we determine the best matching patterns from the pattern library, and the final set of predictive patterns for prediction is decided by statistical and feature analyses. The previous step roughly provides the generally most matching patterns; this step further refines the set of best neighbors (BNs) in order to significantly enhance the prediction performance. After obtaining the BNs, we can then use their information to make a prediction, either by simply taking the average of the "future" values of the BNs or by a regression model, which will be discussed in the results section. Figure 3.5 illustrates the general approach. Three patterns similar to the current segment at time t are found. The solid lines represent
Figure 3.6: Scatter plots (left) and sample partial autocorrelation functions (right) of the height and the interval of respiratory motion versus their 1st lags
the current segment and the BNs. The dotted lines represent the future values of the BNs.
In our study, the respiratory motion data of 27 patients are studied. The length
of the data ranges from about 30 minutes to 60 minutes. The first 60% of the total
data is used to build up a pattern library; the next 20%, but at most 6,000 points,
of the data is used for validation and the remaining for testing. The median of the
peak-to-peak intervals, shown in Figure 3.7a, is used as a baseline for the window sizes of the two-window design. The median is used because of the skewness of the distribution of the intervals, as shown in Figure 3.7b.
Figure 3.7: An illustration of the definition of Interval and a histogram of the interval.
To reasonably determine the final window sizes, we find the optimal ratio of window size to the median of the peak-to-peak intervals, R, to control the window size. The ratio R is determined by validation. The window sizes are then multiplied by the selected ratio, Lj × R for j = 1, 2, where L1 and L2 are the window sizes for the smaller and larger windows respectively. Pattern libraries Bn×Lj are built for each window size, where n is the number of time series segments in the library and Lj is the size of the jth window. In our study, the base ratios for the smaller and larger windows are set to 0.5 and 1.5 of the median cycle length respectively.
In the validation process, one window ratio is picked each time. R-square is used as the selection criterion:
R̂ = arg max_R { 1 − [ Σ_{t=1}^{n} (ŷ(t, R) − y(t))² ] / [ Σ_{t=1}^{n} (y(t) − ȳ)² ] }        (3.16)
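Equation (3.16) amounts to computing an R-square score for the validation predictions produced under each candidate ratio and keeping the best one; a minimal sketch (names are ours, and the predictions themselves would come from the pattern-matching predictor):

```python
def r_square(y_true, y_pred):
    """R^2 of validation predictions (eq. 3.16): 1 - SSE / SST."""
    ybar = sum(y_true) / len(y_true)
    sse = sum((p - y) ** 2 for y, p in zip(y_true, y_pred))
    sst = sum((y - ybar) ** 2 for y in y_true)
    return 1.0 - sse / sst

def select_ratio(y_true, preds_by_ratio):
    """Pick the window ratio R whose validation predictions maximize R^2."""
    return max(preds_by_ratio, key=lambda r: r_square(y_true, preds_by_ratio[r]))
```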
Figure 3.8: The prediction accuracy for various window lengths, with the window ratio to median interval (R) ranging from 0.3 to 1.5, for prediction horizons h = 1, 5, 10, 15, 20, 25, 30, for patient 16
From Figure 3.8, we find that, for patient 16, if only one window is used for prediction, a shorter window size is better than a longer one. For longer prediction horizons, we observe two local maxima, at R = 0.5 and R = 1.2. The 3-D plot of the prediction accuracy for various Rs shows that the prediction accuracy mostly depends on the prediction horizon, but we can observe that the effect
Figure 3.9: A 3-D plot of the prediction accuracy for various window lengths, with the window ratio to median interval (R) ranging from 0.3 to 1.5, for prediction horizons h = 1, 5, 10, 15, 20, 25, 30, for patient 16
of the window ratio to median interval (R) becomes more significant when the prediction horizon is longer.
The window ratio R that maximizes the R-square is selected for prediction. Table 3.3 presents the results of using an adaptive ratio; it shows that an adaptive ratio can improve the prediction accuracy. However, for now, only 3 ratios (0.75, 1, 1.25) are considered in the experiments, and one ratio is used for all windows each time. In the future, we will study how to optimize the window ratio for each window in order to obtain the optimal result from the two-window design.
Phase I: Searching For The Best Matched Patterns Figure 3.10 illustrates phase I of the VBN approach. In phase I, the best matched patterns are discovered by searching the pattern libraries of the two window sizes, using the current segment to look for patterns with the highest similarity measures. The means of the candidates must be removed because we are only interested in their patterns. The regular VBN patterns have the problem of not considering signal shifting. Thus, this leads to
inaccurate prediction due to the shifted errors, as shown in the left graphs of Figures 3.11 and 3.12. To achieve accurate prediction, we propose to align the patterns at their rightmost point during the VBN searching process. As shown in Figures 3.11 and 3.12, the right alignment puts higher weight on the right end and helps to obtain best neighbors that match better at the right end, which we have found to be more important for prediction accuracy.
The similarity measure is defined as

S_n = 1 − [ Σ_{i=1}^{Lw} (u_n(i) − u_0(i))² ] / [ Σ_{i=1}^{Lw} (u_0(i) − ū_0)² ]        (3.18)

where u_n is the segment of the nth candidate in the pattern library, u_0 is the current segment, ū_0 is the mean of the current segment, and Lw is the length of the segments.
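A minimal sketch of this similarity computation and a greedy top-k neighbor selection that skips candidates adjacent to an already chosen BN (all names are ours):

```python
def similarity(u_cand, u_cur):
    """S_n of eq. (3.18): 1 minus the SSE between the segments,
    normalized by the variability of the current segment."""
    m0 = sum(u_cur) / len(u_cur)
    sse = sum((a - b) ** 2 for a, b in zip(u_cand, u_cur))
    return 1.0 - sse / sum((b - m0) ** 2 for b in u_cur)

def best_neighbors(ys, Lw, k, exclusion):
    """Rank every past length-Lw segment against the current (last) one
    and greedily keep the k most similar, skipping candidates whose end
    points lie within `exclusion` samples of a selected BN."""
    current = ys[-Lw:]
    scored = sorted(((similarity(ys[t - Lw + 1: t + 1], current), t)
                     for t in range(Lw - 1, len(ys) - Lw)), reverse=True)
    chosen = []
    for s, t in scored:
        if all(abs(t - tc) > exclusion for tc in chosen):
            chosen.append(t)
        if len(chosen) == k:
            break
    return chosen
```

On a periodic signal, the selected end points fall at the same phase as the current segment, one per cycle.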
Figures 3.11 and 3.12 show close views of typical examples of best neighbors found by comparing raw patterns and right-aligned patterns with the current segment. First of all, from Figures 3.11 and 3.12 we see that using different alignments gives us different best neighbors. Even though the best neighbors found from raw patterns may show high overall similarity, the right-aligned best neighbors show a better match at the right side, which is the closest point to the point that is going to be predicted. From Figures 3.11 and 3.12, we see that the right-aligned BNs give better prediction results.
We iteratively obtain the BNs from the top of the ranked list until at least k BNs are obtained and the next Sn is smaller than the threshold θ. One important thing that has to be
Figure 3.11: Online prediction of a patient's respiratory data using unaligned BNs (left) and right-aligned BNs (right). Below are the best neighbors, marked with vertical lines in the time series.
Figure 3.12: A zoom-in view of Figure 3.11, using unaligned BNs (left) and right-aligned BNs (right). We can see that the right-aligned BNs are obviously better than the unaligned BNs.
done is to remove the candidates which are adjacent to a selected BN, in order to avoid selecting nearly identical overlapping segments. Bk denotes the library after the kth best neighbor is selected; tλk denotes the time at the end of the kth best neighbor λ; and m denotes a small distance such that the candidates within this range are excluded. In the respiratory motion prediction study, we choose this distance to be one-fifth of the median peak-to-peak interval.
Phase II: Pattern Matching Using A Larger Window In phase I, by using a short window, the best neighbors are the patterns best matched in the short-term range. That is an important step to attain a high accuracy in short-term prediction: the better the matching in the most recent data, the more closely the future trend of the best neighbors will be followed. A short window alone, however, cannot guarantee a correct matching of the respiratory phase.
The next step is to consider the pattern matching over a longer horizon for the best neighbors obtained from the previous step, in order to guarantee a correct matching of the phase of the respiratory motion. In other words, we need to look closely to have a clear picture of the short-term trend at the current moment, and then look at a bigger picture to figure out at what phase of a cycle the respiratory motion is.
The window sizes of the shorter and the longer windows are two parameters whose values are selected in the validation process. In validation, the shorter and the longer windows are initially set to 0.5 and 1.5 of the median cycle length, and the window sizes are then multiplied by several ratios, R. The best ratio is selected for each patient individually.
In phase II, to search for the best neighbors among the best neighbors obtained in phase I, we repeat the process of phase I, except that, this time, the longer window is used.
After finalizing the set of BNs, the prediction is made using the "future" information of the BNs. The simplest effective method is to take the average of their "future" values, as in equation 3.25. For some cases, when the best neighbors do not match very well, we may instead consider support vector regression (SVR), which will be discussed in the results section.
Phase III: Best Neighbors Removal Using Statistical Analysis Figure 3.13 shows the scatter plots of the errors of short segments before and after tλk for the first eight best neighbors of patient 23. Figure 3.15 is a close view of the best neighbors around tλk ; the blue solid line is the current segment. This example shows that a higher error on the left side implies a higher error on the right side. The correlation is obvious in this example. Figure 3.14 is a drawing that clearly illustrates this phenomenon.
The errors of the mismatch of a short segment of length l just before and just after time t are significantly correlated, where t is the current time point of the current segment and the corresponding time point of each candidate. To further refine the set of candidates, we suggest removing those candidates with a larger mismatch error over a few points just before time t. The sum of squared errors of the match of a
Figure 3.13: Scatter plots of the error before tλ vs the error after tλ . Correlation
between the errors is observed.
Figure 3.14: An illustration of the error of the best neighbors before and after time
tλk. If the error on the left-hand side is large, then the error on the right-hand side is
also likely to be large.
Next, since the distributions of the errors are skewed and the skewness varies among indi-
viduals, we remove the candidates whose error exceeds the median by more than one and
a half times the spread; such a candidate is an outlier among the other candidates, as
shown in Figure 3.16. This is the last step of Phase III.
Figure 3.17 shows two examples of best neighbors without outliers. In these examples,
among the best neighbors, the expected prediction values h samples ahead are assumed
to be similar.
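This screening step can be sketched as follows. The cutoff below uses the median plus 1.5 times the interquartile range; the exact spread statistic and multiplier are assumptions for illustration, since the text only states "median plus one and a half".

```python
import numpy as np

def screen_best_neighbors(errors, multiplier=1.5):
    """Keep only best neighbors whose pre-t matching error is not an outlier.

    `errors` holds each candidate's sum of squared errors over the few
    points just before time t.  Candidates whose error exceeds
    median + multiplier * IQR are discarded (the IQR as spread statistic
    is an assumption here).
    """
    errors = np.asarray(errors, dtype=float)
    q1, med, q3 = np.percentile(errors, [25, 50, 75])
    cutoff = med + multiplier * (q3 - q1)
    keep = errors <= cutoff          # boolean mask of retained candidates
    return np.flatnonzero(keep)      # indices of the surviving neighbors

# Example: the last candidate's error is far larger than the rest.
idx = screen_best_neighbors([0.10, 0.12, 0.09, 0.11, 0.95])
```

Because the cutoff is built from the median and a quantile range, it adapts to the skewed, patient-specific error distributions mentioned above.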
Figure 3.16: An example of an outlier in the best neighbors
where K denotes the number of similar patterns, y(tk + h) denotes the value at
time tk + h of the kth BN, ε(t + h) denotes the error of predicting the value at time
t + h, and Θk denotes the coefficient of the referenced value of the kth BN. ε(t + h)
includes the random error and the modeling error.
Taking the average of the referenced values, i.e. the samples which are h samples
ahead of all BNs, for prediction, the proposed model equation can be written as:
    \hat{y}(t+h) = \sum_{k=1}^{K} \hat{\Theta}_k \, \hat{y}(t_k + h) \qquad (3.25)
We set Θk = 1/K to use the mean of the future values of the referenced patterns
for prediction.
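Equation 3.25 with Θk = 1/K amounts to a few lines of code; the neighbor end-indices below are hypothetical stand-ins for the output of the pattern search.

```python
import numpy as np

def knn_mean_prediction(series, neighbor_ends, h):
    """Predict y(t+h) as the mean of the values h samples ahead of the
    best neighbors (Equation 3.25 with Theta_k = 1/K for every k).

    `neighbor_ends` holds the indices t_k where each best neighbor's
    matched segment ends; `h` is the prediction horizon in samples.
    """
    futures = [series[tk + h] for tk in neighbor_ends]
    return float(np.mean(futures))

# Illustrative use on a synthetic sine trace (assumed data, not a patient):
t = np.arange(0, 20, 0.1)
y = np.sin(t)
ends = [50, 113, 176]          # hypothetical indices from the pattern search
pred = knn_mean_prediction(y, ends, h=5)
```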
Bootstrap aggregation, also called bagging, is an appropriate way to control and check
the stability of the results, and it is asymptotically more accurate than the standard
intervals obtained using sample variance and assumptions of normality. By careful choice
of the size of the resamples, bagging can lead to substantial improvements in the
performance of the kNN method. Adèr et al. recommend the bootstrap procedure for
situations where the theoretical distribution of a statistic is complicated or unknown, or
the sample size is too small for straightforward statistical inference. Bootstrapping then
helps to control and check the stability of the results by resampling:
    \hat{y}(t+h) = \frac{1}{M} \sum_{m=1}^{M} \left( \frac{1}{N} \sum_{n=1}^{N} y(t_{mn} + h) \right) \qquad (3.26)

where M is the number of bootstrap resamples and N is the resample size.
If the referenced values are approximately normally distributed, we may directly use
the simple average and the standard interval; in this case, the bootstrapped average
and confidence interval are asymptotically consistent with the simple average and
standard interval. We use the Kolmogorov–Smirnov test to check for normality of the
referenced values of the nearest neighbors. The null hypothesis states that the
population is normally distributed.
The test results are shown for prediction horizon h = 15; zero means failing to reject
the null hypothesis, while one means rejecting it.
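A sketch of the bagged prediction of Equation 3.26, resampling the best neighbors' future values with replacement; the resample count and the 95% percentile interval are illustrative choices, not values fixed by the text.

```python
import numpy as np

def bootstrap_prediction(futures, n_resamples=200, seed=0):
    """Bagged point prediction and percentile interval from the best
    neighbors' future values (cf. Equation 3.26): draw M resamples with
    replacement and average their means.
    """
    rng = np.random.default_rng(seed)
    futures = np.asarray(futures, dtype=float)
    n = len(futures)
    # mean of each bootstrap resample
    means = np.array([rng.choice(futures, size=n, replace=True).mean()
                      for _ in range(n_resamples)])
    point = means.mean()                        # bagged point prediction
    lo, hi = np.percentile(means, [2.5, 97.5])  # 95% percentile interval
    return point, (lo, hi)

point, ci = bootstrap_prediction([1.0, 1.2, 0.9, 1.1, 1.05])
# the bagged mean stays close to the simple average of the futures
```

The percentile interval requires no normality assumption, which is exactly why the bootstrap is preferred when the Kolmogorov-Smirnov test rejects normality.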
Prediction Using Support Vector Regression  Best neighbors can only be very
similar to, but rarely exactly identical to, the current segment. Support vector regres-
sion (SVR) provides a way to bridge the remaining gap between the current segment
and the best neighbors. In general, SVR is able to enhance the prediction slightly. The
advantages of support vector regression are the nonlinearity of the regression line, the
ability to handle high-dimensional inputs, and its robustness to outliers; for these
reasons we adopt it here.
Figure 3.19 illustrates a simple example of SVR. The middle line is the regression
line, and the upper and lower lines pass through the support vectors. The insensitive
zone gives a finer regression line, and the slack variable, ξ, allows outliers to be excluded.
By choosing a kernel function Φ and using the obtained best neighbors for training,
we obtain the following SVR function, whose weights w can be found by optimization
algorithms:
y(t + h) = wT µt (3.27)
subject to ξi, ξi* ≥ 0, i = 1, . . . , L.
Note that the equations satisfy the KKT conditions, so we can introduce Lagrange
multipliers. From the saddle-point condition, the partial derivatives of L with respect
to the primal variables vanish. Substituting equations (3.30) into equation (3.29)
yields the dual optimization problem:
    maximize   -\frac{1}{2} \sum_{i,j=1}^{l} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*) \langle x_i, x_j \rangle - \varepsilon \sum_{i=1}^{l} (\alpha_i + \alpha_i^*) + \sum_{i=1}^{l} y_i (\alpha_i - \alpha_i^*) \qquad (3.31)
    subject to \sum_{i=1}^{l} (\alpha_i - \alpha_i^*) = 0 \ \text{and} \ \alpha_i, \alpha_i^* \in [0, C]
By solving equation 3.31, we obtain the regression function (equation 3.27), into
which we feed the current segment and the best neighbors to make the prediction.
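The full ε-SVR dual above requires a quadratic-programming solver. As a lighter-weight sketch of the same idea (fitting a kernel regressor on the best-neighbor segments and evaluating it at the current segment), the following uses kernel ridge regression with an RBF kernel as a stand-in for ε-SVR; it is not the dissertation's exact method, and the segments and targets are hypothetical.

```python
import numpy as np

def rbf(A, B, gamma):
    """Gaussian (RBF) kernel matrix between row-segments of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def fit_predict_kernel(neighbors, targets, query, gamma=1.0, lam=1e-3):
    """Kernel ridge regression as a stand-in for epsilon-SVR: train on the
    best-neighbor segments (inputs) and their future values (targets),
    then predict for the current segment.  Unlike SVR it has a closed-form
    solution, which keeps the sketch short."""
    X = np.asarray(neighbors, float)
    y = np.asarray(targets, float)
    K = rbf(X, X, gamma)
    alpha = np.linalg.solve(K + lam * np.eye(len(y)), y)  # dual weights
    k_query = rbf(np.asarray([query], float), X, gamma)
    return float(k_query @ alpha)

# Hypothetical best-neighbor segments and their "future" values:
X = [[0.0, 0.1, 0.3], [0.0, 0.1, 0.31], [0.5, 0.4, 0.2]]
y = [0.5, 0.52, 0.1]
pred = fit_predict_kernel(X, y, query=[0.0, 0.1, 0.3])
# the query equals the first neighbor, so the prediction lands near 0.5
```

Like SVR, the prediction is a kernel-weighted combination of the training segments; swapping in a QP-based ε-SVR solver would only change how the dual weights are obtained.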
Figure 3.19: Illustration of support vector regression with the insensitive parameter ε
and the slack variable ξ.
3.3.4 Comparison of the Prediction Performance of RPKM and Some State-Of-
The-Art Methods
The following compares the prediction performance of RPKM with state-of-the-art
methods. wLMS and Support Vector Regression prediction (SVRpred) were concluded
to be the best methods in a survey conducted by Ernst et al. [15, 30, 16, 24] in 2013.
TVSAR is a method developed by Ichiji et al. [27], also published in 2013; its authors
likewise claim state-of-the-art performance. In addition, seasonal ARIMA is added to
the comparison.
The abdominal displacement of 27 lung and liver cancer patients was collected with
the Real-time Position Management™ (RPM) (Varian Inc., Santa Clara, CA) infrared
camera and reflective marker block system during their PET/CT examinations. The
time series
The use of the data was approved by the appropriate Institutional Review Board
in compliance with the Health Information Privacy and Portability Act [38].
The sampling rate of the respiratory traces was 30 Hz throughout the duration of
data collection. Of the data, 60% is used for training, 20% for testing, and the
remaining 20% for validation.
TVSAR and wLMS do not need training; prediction starts directly. For the
experiments with RPKM and RPKS, the threshold for obtaining the best neighbors
was tuned in validation. For RPKS and SVRpred, we consider 2^{-12}, 2^{-11}, . . . , 2^{12}
for the kernel parameter, γ; 0, 0.01, 0.02, . . . , 0.1 for the insensitive zone, ε; and
max(|ȳ + 3σy|, |ȳ − 3σy|) for the regularization parameter, C.
Prediction Performance of RPKM and the State-of-the-Art Methods  Table 3.2
shows the prediction performances of RPKM, res TVSAR, wLMS, SVRpred, and
SARIMA, and Figure 3.20 shows the box plots of the prediction performance of the
proposed methods and the current state-of-the-art methods. Even though we only
consider 3 ratios and the ratio is fixed for both windows, we can still see a small
improvement from using adaptive windows. By optimizing the window size for each
patient individually, we expect the performance to be further improved.
Among the state-of-the-art methods, wLMS performs very well in short-term
prediction, and res TVSAR outperforms wLMS for long-term prediction.
Finally, it is clear that RPKM and RPKS significantly outperform all the other
methods. Also, based on the results, RPKS is slightly better than
RPKM.
Table 3.2: The prediction performance metrics, mean and standard deviation of R-
squares, of the proposed methods and the state-of-the-art of respiratory motion pre-
diction methods on 27 patients
Prediction horizon 1 5 10 15 20 25 30
RPKM mean 0.998 0.976 0.918 0.831 0.728 0.620 0.523
std 0.001 0.018 0.052 0.095 0.141 0.179 0.206
RPKS mean 0.998 0.978 0.920 0.836 0.732 0.624 0.523
std 0.002 0.018 0.053 0.093 0.132 0.167 0.196
res TVSAR mean 0.964 0.834 0.684 0.462 0.229 0.013 -0.146
std 0.088 0.378 0.393 0.436 0.487 0.462 0.454
wLMS mean 0.996 0.880 0.648 0.386 0.131 -0.083 -0.233
std 0.005 0.322 0.487 0.527 0.535 0.526 0.520
SVRpred mean 0.908 0.738 0.639 0.347 0.029 -0.154 -0.323
std 0.044 0.075 0.099 0.164 0.324 0.323 0.359
SARIMA mean 0.979 0.846 0.608 0.231 -0.053 -0.292 -0.414
std 0.019 0.127 0.281 0.469 0.466 0.475 0.479
[Box-plot panels: (a) a close view of prediction horizon h = 1; (b) h = 5; (c) h = 10;
(d) h = 15; and further panels for h = 20 to h = 30. Each panel compares RPKM,
RPKS, res TVSAR, wLMS, SVRpred, and SARIMA.]
Figure 3.20: Prediction performance of RPKM, RPKS and the state-of-the-art meth-
ods for prediction horizons h=20 to h=30
Prediction Performance of RPKM With and Without Adaptive Ratio  Table 3.3
shows the prediction performances of RPKM with and without the adaptive ratio,
and Figure 3.21 shows the corresponding box plots. Based on the results, the adaptive
ratio yields a small but consistent improvement.
Table 3.3: The prediction performance metrics, mean and standard deviation of R-
squares, of the proposed approaches with and without adaptive ratio on 27 patients
Prediction horizon 1 5 10 15 20 25 30
RPKM mean 0.998 0.976 0.918 0.831 0.728 0.620 0.523
std 0.001 0.018 0.052 0.095 0.141 0.179 0.206
RPKM(without adaptive ratio) mean 0.998 0.976 0.916 0.827 0.721 0.612 0.517
std 0.001 0.019 0.054 0.096 0.142 0.180 0.206
Directly using raw data for pattern matching works as long as the signal is clean
with little noise. However, the quality of medical devices varies from one to another,
and some systems may have more noise than others. It is therefore desirable to find a
robust method that can cope with noisier data in order to achieve consistent
performance.
Besides, using raw data to build up the pattern libraries consumes a lot of space.
The higher the sampling rate, the finer the signal that can be obtained, but also the
larger the library. Sparseness is a very popular topic in data mining, and using a
reduced representation of the original signal can usually speed up processing. The
orthogonal-polynomial-based variant of best-neighbors time series prediction is named
OPPRED, which
follows the same structure as RPKM, as shown in Figure 3.22, except that the data is
represented by an orthogonal polynomial (OP) approximation. This approximation was
originally intended for time series segmentation, but it shows nice properties for our
prediction problem.
We approximate the data with basis functions fk (k = 0, . . . , K). Note that we do not
claim that f(x) is a linear function in x. More concretely, we assume that f is a linear
combination of K + 1 (linear or nonlinear) basis functions.
We may write the values of the K + 1 basis functions for the N + 1 points in an
(N + 1) × (K + 1) matrix F.
If we combine the N + 1 samples of the overall time series into a vector y, the
least-squares problem is to minimize kFw − yk², with k · k being the Euclidean norm.
Its solution wLS can be found by setting the derivative to zero, with h·|·i being the
standard inner product in a real-valued vector space. Then,

    \frac{\partial \|Fw - y\|^2}{\partial w} = 2F^{T}Fw - 2F^{T}y \qquad (3.36)
provided that the matrix F^T F is regular. Then, with the pseudo-inverse
F^+ = (F^T F)^{-1} F^T of F, we have

    w_{LS} = F^{+} y \qquad (3.38)

The pseudo-inverse fulfills two properties (two of the four so-called Penrose criteria):
first, (AA^+)^T = AA^+; second, AA^+A = A and, consequently, AA^+AA^+ = AA^+.
Thus, the residuum becomes

    \|F w_{LS} - y\|^2 = y^{T} y - w_{LS}^{T} F^{T} F w_{LS}

where F^T y = (F^T F)F^+ y = (F^T F)w_{LS} and (FF^+)^T FF^+ = FF^+. With the
term (average) squared error, we refer to the residuum divided by the number of
observed samples:

    \sigma_{LS}^{2} = \frac{1}{N+1} \left( y^{T} y - w_{LS}^{T} F^{T} F w_{LS} \right) \qquad (3.40)
With 3.40, we can determine the squared error once we have obtained the least-
squares solution.
Now, assume that the selected K + 1 basis functions are orthogonal with respect
to the inner product, i.e., \langle f_{k_1} | f_{k_2} \rangle = 0 for basis functions f_{k_1}
and f_{k_2} with k_1 \neq k_2. This is the case for special kinds of polynomials
(see Section 3.2), for wavelet families, or for the sinusoidal functions used in discrete
Fourier analysis.
In that case, F^T F is a diagonal matrix, which can be inverted if the elements on the
diagonal, the squared norms of the basis functions, are nonzero. This can be assumed
here. Substituting into Equation 3.38, we get

    w_k = \sum_{n=0}^{N} \frac{y_n}{\|f_k\|^2} f_k(x_n), \quad k = 0, \ldots, K \qquad (3.42)

That is, the least-squares solution can be written as a linear combination of the
training samples (cf. the dual representations of classifiers which are common in the
kernel literature).
With this result for w_{LS}, with Equation 3.40, and with the definition
w_k = \sum_{n=0}^{N} \frac{y_n}{\|f_k\|^2} f_k(x_n) for k = 0, \ldots, K (the elements of
the solution vector w_{LS}), the squared error \sigma_{LS}^2 now becomes

    \sigma_{LS}^{2} = \frac{1}{N+1} \left( \sum_{n=0}^{N} y_n^2 - \sum_{k=0}^{K} w_k^2 \|f_k\|^2 \right) \qquad (3.43)
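Under the stated orthogonality assumption, Equations 3.42 and 3.43 can be checked numerically. The sketch below uses a discrete cosine basis as a stand-in for the orthogonal polynomials (it is likewise orthogonal over equally spaced points), with synthetic data.

```python
import numpy as np

# Least squares with an orthogonal basis: F^T F is diagonal, so each
# coefficient is computed independently (Equation 3.42) and the squared
# error follows Equation 3.43 without any matrix inversion.
N = 63                                   # N + 1 = 64 sample points
x = np.arange(N + 1)
K = 3
# cosine basis, orthogonal over the discrete points x = 0..N
F = np.stack([np.cos(np.pi * k * (x + 0.5) / (N + 1)) for k in range(K + 1)],
             axis=1)

rng = np.random.default_rng(1)
y = 2.0 * F[:, 0] - 0.7 * F[:, 2] + 0.05 * rng.standard_normal(N + 1)

norms2 = (F ** 2).sum(axis=0)                 # ||f_k||^2, diagonal of F^T F
w = (F * y[:, None]).sum(axis=0) / norms2     # Equation 3.42

# Equation 3.43: squared error from inner products only
sse_343 = (y @ y - (w ** 2 * norms2).sum()) / (N + 1)
# direct check against the explicit residual
sse_direct = ((y - F @ w) ** 2).mean()
```

The two error computations agree to floating-point precision, and the recovered coefficients match the generating ones up to the injected noise.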
The orthogonal polynomials in the vector space P(R, R) of real polynomials on R
fulfill the three-term recurrence relation

    p_{k+1}(x) = (x - a_k)\, p_k(x) - b_k\, p_{k-1}(x), \qquad p_0(x) = 1 \qquad (3.45)

with the domain truncated to the discrete points [0, 1, . . . , L]. In our study, we use
Legendre orthogonal polynomials, as shown in Figure 3.23, with

    a_k = \frac{L}{2}, \qquad (3.47)
    b_k = \frac{k^2\left((L+1)^2 - k^2\right)}{4(4k^2 - 1)} \qquad (3.48)
This provides a fast update procedure for generating the pattern libraries.
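The recurrence can be sketched as below, assuming the standard monic Gram (discrete Legendre) form with the coefficients of Equations 3.47 and 3.48; the code also checks the discrete orthogonality and the order-nesting property described next.

```python
import numpy as np

def gram_polynomials(L, K):
    """Discrete Legendre (Gram) polynomials p_0..p_K on x = 0..L via the
    three-term recurrence p_{k+1} = (x - a_k) p_k - b_k p_{k-1} with
    a_k = L/2 and b_k = k^2 ((L+1)^2 - k^2) / (4 (4k^2 - 1))."""
    x = np.arange(L + 1, dtype=float)
    P = np.zeros((K + 1, L + 1))
    P[0] = 1.0
    if K >= 1:
        P[1] = x - L / 2.0
    for k in range(1, K):
        bk = k**2 * ((L + 1)**2 - k**2) / (4.0 * (4 * k**2 - 1))
        P[k + 1] = (x - L / 2.0) * P[k] - bk * P[k - 1]
    return P

P = gram_polynomials(L=20, K=5)
G = P @ P.T                      # Gram matrix: diagonal iff the p_k are orthogonal
offdiag = G - np.diag(np.diag(G))

# Order nesting: the order-3 coefficients are a prefix of the order-5 ones.
y = np.sin(np.arange(21) / 3.0)
w5 = (P @ y) / np.diag(G)        # coefficients of the order-5 approximation
w3 = (P[:4] @ y) / np.diag(G)[:4]
```

Because each coefficient is an independent projection, fitting order 5 once yields the fits of all lower orders for free, which is exactly the property exploited below.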
Due to the orthogonality of the OPs, the coefficients of the OPs are independent
of each other. Referring to equation 3.42, we can observe that the coefficients of the
corresponding OPs do not change even when the order goes up. If the order goes down, the
Figure 3.23: Legendre Polynomials
coefficients of the OPs whose orders are higher than the chosen order simply become
zero.
In other words, only one approximation has to be computed in order to obtain the
approximations of that order and of all lower orders. For instance, if we obtain the
approximation of order 20, we also obtain the approximations of orders 19, 18, and so on.
We still need to determine the best order of approximation, because a higher order
of approximation does not necessarily give the best approximation. One example is
shown in Figure 3.24; its coefficients are shown in Table 3.4.
Figure 3.24: An example of OPs approximation such that the approximation of lower
order (order 18) is better than higher order (order 20)
The advantages of the OP approximation are as follows:
Efficiently determining the best order of OP  A higher order does not necessarily
give a better approximation. Having done only one approximation provides all
approximations of lower orders, and this property makes determining the best order
efficient. In contrast, for other representations, each candidate order would require a
separate fit.
Sparse data representation  Currently, the sampling rate of our data is 30 Hz and
a typical respiratory cycle takes about 6 seconds, so there can be about 180 samples in
one cycle. Using the orthogonal approximation with order 20, only 21 coefficients
are needed to represent such a segment, and with order 20 we are usually able to
obtain very good approximations in terms of R-squares.
Fast updating and reconstruction  When the window length, N, is fixed, the or-
thogonal polynomials are fixed and can be easily calculated using the three-term
recurrence relation shown in equation 3.47. Due to the orthogonality properties of the
OPs, we obtain a closed-form equation for the coefficients, shown in equation 3.42.
Signal smoothing  Least-squares error is used for the approximation. Usually the
model cannot fit the time series perfectly, so trade-offs have to be made in the
approximation; the resulting smoothing can also benefit downstream tasks such as
classification.
The similarity metric is based on the coefficient of determination,

    S_n = 1 - \frac{SSE}{SS_{tot}} \qquad (3.49)

where the c_{jk}, j, k \in \{1, \ldots, K\}, are constants when the orthogonal polynomials,
f_k, and n are fixed. Therefore, the similarity metric S_n(w) only depends on the
coefficients w:

    S_n(w) = 1 - \frac{SSE(w)}{SS_{tot}} \qquad (3.52)
3.4.2 Prediction Results of RPKM and OPPRED
Table 3.5 shows that the performances of RPKM and OPPRED are very close.
From Figure 3.25b to Figure 3.25e, RPKM and OPPRED significantly outperform
all the other methods at all prediction horizons. Except for SVRpred, seasonal
ARIMA does not do well compared to the other methods, which are dedicated to the
respiratory motion prediction problem.
[Box-plot panels: (a) prediction horizon h = 1; (b) a close view for h = 1; (c) h = 5;
(d) h = 10; and further panels up to h = 30. Each panel compares RPKM, OPPRED,
res TVSAR, wLMS, SVRpred, and SARIMA.]
Figure 3.25: Prediction performance of RPKM, OPPRED, res TVSAR, wLMS, SVR-
pred and SARIMA
Two Examples of Prediction of the Proposed Methods  Figures 3.26 and 3.27
show two prediction examples to visually demonstrate how well the methods do. The
solid blue line represents the observation, while the lines labeled 'op_mean' and
'raw_mean' represent the OPPRED and raw-data (RPKM) predictions, respectively.
The relative importance of different parts of a time series segment varies; Figure 3.28
illustrates this. On the local scale of a respiratory motion time series, the latest data is
more important than the older data; recall, as an example, the correlation of the errors
before and after time tλ of the best neighbors. Referring back to Figure 3.15, we know
that the error close to and before the referenced time is correlated with the error after
the referenced time. As we desire this kind of flexibility in our respiratory motion
prediction problem, we introduce weights into the distance measure.
Figure 3.28: This example shows that even when two time series have the same total
error, the occurrences of the errors can be very different. The upper plot shows two
patterns that match very well in the older data (left) but do not match well in the
newest data. Therefore, for prediction, we would prefer the lower one.
Many distance functions have been developed throughout the history of time series
research.
Definition 3.1 Lp − norm : Given two time series R and S of the same length N,
    L_p\text{-norm}(R, S) = \sqrt[p]{\sum_{i=1}^{N} (r_i - s_i)^p} \qquad (3.53)
Every candidate is compared to a fixed reference time series, which is the current
pattern in our method.
So, by adding weights, the distance underlying our similarity metric in equation 3.52
can be generalized as below:

    L_p\text{-norm}(R, S, W) = \sqrt[p]{\sum_{i=1}^{N} w_i (r_i - s_i)^p} \qquad (3.54)
where wi is the weight for the distance of the pair of the ith samples.
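Equation 3.54 in code, with a small example showing why down-weighting old samples prefers candidates that match the newest data (cf. Figure 3.28); the weight values are illustrative. Absolute differences are used so the root stays real for odd p.

```python
import numpy as np

def weighted_lp(r, s, w, p=2):
    """Weighted Lp distance of Equation 3.54 between equal-length series."""
    r, s, w = map(np.asarray, (r, s, w))
    return float((w * np.abs(r - s) ** p).sum() ** (1.0 / p))

# Two candidates with the same unweighted distance to the reference:
ref          = np.array([0.0, 0.0, 0.0, 0.0])
old_mismatch = np.array([0.5, 0.0, 0.0, 0.0])   # error only in the oldest sample
new_mismatch = np.array([0.0, 0.0, 0.0, 0.5])   # error only in the newest sample
w = np.array([0.2, 0.6, 1.0, 1.0])              # emphasize recent samples

d_old = weighted_lp(ref, old_mismatch, w)
d_new = weighted_lp(ref, new_mismatch, w)
# d_old < d_new: the candidate mismatching only in the oldest data wins
```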
In standard least squares, all parts of a time series are equally important, and the
regression approximates the whole time series with a minimum overall error. However,
flexibility can be given to the orthogonal approximation, for instance by demanding a
better fit in the later data than in the older data. To achieve this, weights, b, are
added to the least-squares objective. Then,
    \frac{\partial}{\partial w} (Fw - y)^{T} b\, (Fw - y) = 2F^{T} b F w - 2F^{T} b y \qquad (3.56)

where b is the diagonal matrix of the weights. Setting 2F^T bFw − 2F^T by = 0, we have
The coefficients w_{LS} can then be written as:

    w_{LS} = \left( \sum_{n=0}^{N} \frac{y_n \sqrt{b_n}}{\|f_0\|^2} f_0(x_n), \; \ldots, \; \sum_{n=0}^{N} \frac{y_n \sqrt{b_n}}{\|f_K\|^2} f_K(x_n) \right)^{T}
Under the current framework, the best neighbors are found by a two-step pattern
search. An alternative is to assign weights to the two windows and combine their
similarity values using those weights. This variant can give a very similar result to the
proposed multiple-step pattern searching method, and it is more integrated mathe-
matically.
It is desirable to keep most of the data accurate and to relax only the oldest data.
Figure 3.29 demonstrates the weights considered in our study. In the validation
process, we can try multiple sets of weights and select the one giving the best
performance.
During pattern matching of time series, we may want to emphasize the importance of
some parts of the series. Therefore, we propose weighted time series pattern matching.
Similar work has been done by Jeong [40], who proposed weighted dynamic time
warping (WDTW); dynamic time warping (DTW) is a kind of distance measure for
time series. Similarly, for weighted time series pattern matching, weights are added to
the time series to penalize dissimilarity in the different parts of the segment.
Figure 3.29: The weights of the shorter window (black dotted) and the longer window
(red dotted)
For time series prediction, the latest data is intuitively more important than the older
data.
A simulation study was done to validate whether OPPRED is robust to noisy data.
In this study, artificially generated noise similar to that in Figure 3.31 is added to the
real respiratory motion data of the first 4 of the 27 patients. One setting generates
short sporadic noise, while another generates relatively longer and sparser noise, as
shown in Figure 3.31.
Table 3.6 shows the mean and standard deviation of the prediction performance
(R-squares) of RPKM and OPPRED on noise-added time series data with prediction
horizons ranging from 1 to 30. It shows that when the time series is noisy, OPPRED
performs a little better than RPKM. Together with the data sparsity of OPPRED,
this makes OPPRED a suitable algorithm for respiratory motion time series
prediction.
Table 3.6: The prediction performance metrics, mean and standard deviation of R-
squares, of the proposed approaches on first 4 patients noise-added respiratory motion
time series.
Prediction horizon 1 5 10 15 20 25 30
RPKM(noise) mean 0.975 0.946 0.876 0.783 0.674 0.565 0.479
std 0.010 0.025 0.062 0.104 0.154 0.191 0.216
OPPRED(noise) mean 0.973 0.947 0.879 0.791 0.686 0.578 0.487
std 0.011 0.026 0.061 0.101 0.146 0.180 0.206
during treatment. Accurate respiratory motion prediction can minimize the damage
to normal tissue.
Pattern matching can effectively utilize the existing information in the data:
similar patterns demonstrate similar trends in the response. The pattern recognition
[Box-plot panels comparing RPKM(with noise1) and OPPRED(with noise1) for
prediction horizons h = 1 to h = 30.]
Figure 3.31: An example of noise-added time series in the simulation study. Simulated
noise is added to respiratory time series data of a patient
process is enhanced by combining it with statistical and feature analysis, which helps
to obtain better-matched patterns and to remove undesired patterns. The experimental
results show that the prediction of the proposed method is very accurate and robust
across different kinds of patients. For h = 5, most of the patients attain an R-squared
of 0.95. This method should contribute greatly to tumor position prediction and
thereby help cancer patients to enhance their quality of life.
Respiratory motion time series are autoregressive, so the values in the next cycle may
be determined by previous cycles; this theoretically supports our method, which uses
similar patterns for prediction. OPPRED is more robust to noise and drifting than
RPKM. Since the proposed method is designed for data of this kind, it shows potential
for other applications that demonstrate similar characteristics.
3.5.1 Future Studies
The two-window design is intended to consider both short- and long-range patterns of
the time series, and the weighted orthogonal approximations show great potential to
improve the similarity measurement of the time series. The design provides a nice
control mechanism, but it is sensitive to the window length and is not perfect; in the
future, we may figure out a better scheme. The two-window design can in fact be seen
as a special case of an n-window design with zero weights on the other windows.
Theoretically, the windows can be infinitely long. However, the patterns found by
longer windows are obviously less important than those found by shorter ones. For
instance, in respiratory motion, we already know that the patterns of half a cycle are
more predictive than the patterns of one cycle.
Optimizing Parameters  Intuitively, not every part of a time series segment has
the same importance in obtaining the most effective approximations or the best
matching patterns.
Generally, our method has four avenues for improvement: the first is to improve
the pattern matching process; the second is to improve the way the obtained best
matching patterns are used for prediction; the third is to optimize the parameters of
the methods, such as the optimal sizes of the two windows and the weights in pattern
matching and orthogonal approximation; and the final one is to optimize the
parameters involved in our algorithm, such as the window lengths.
CHAPTER 4
4.1 Introduction
There has been much interest in the beneficial effects of musical training on
cognition. Previous studies have indicated that musical training was related to better
working memory and that these behavioral differences were associated with differences
in neural activity in the brain. However, it was not clear whether musical training is
related to these functional changes. This question was investigated by using working
memory and long-term memory tasks with verbal and pictorial stimuli. A
comprehensive EEG pattern study has been performed, including various univariate
and multivariate features, time-frequency analysis, power-spectra analysis, and
deterministic chaotic theory.
selection approaches have also been employed to select the most discriminative EEG
and brain activation features between musicians and non-musicians. High classifica-
tion accuracy (more than 95%) in memory judgments was achieved using Proximal
Support Vector Machine (PSVM). For working memory, it showed significant differ-
ences between musicians versus non-musicians during the delay period. For long-term
memory, significant differences on EEG patterns between groups were found both in
the pre-stimulus period and the post-stimulus period on recognition. These results
indicate that musicians' memory advantage occurs in both working memory and long-
term memory, and that the developed computational framework using advanced data
analytics is effective for such studies.
4.2 Methodology
Participants  36 participants were initially recruited for the study. Four partici-
pants were excluded for having negative d′ values on the long-term memory task, two
were excluded for failing to follow directions, and one was excluded as a behavioral
outlier (more than 3 SDs from the mean long-term memory performance). In total,
29 subjects were included in the analyses: 14 (5 female) were professional musicians,
and 15 (8 female) had no musical training. Informed consent was obtained from all
participants.
Each participant completed a study session and, a few minutes later, a test session
involving words and pictures as stimuli. Stimuli were presented visually
on a computer and all responses were made using the keyboard. During the study
session, participants were presented with pairs of stimuli, one at a time. Each study
trial began with a fixation cross (250 ms), the first stimulus (1000 ms), a blank screen
(5000 ms), the second stimulus (2500 ms or until a response), and finally a blank
screen (1000 ms). Upon presentation of the second stimulus, participants made a
judgment of whether the second stimulus was the same as the first (Figure 4.1a).
A few minutes following the study session, participants' memory was tested.
During this test session, stimuli presented during study were presented again along
with new stimuli that had not been studied. Further, we only tested participants'
memory on stimuli that had been presented only once. Therefore, only stimuli
presented on trials that were different during the study session (i.e. trials on which
the second stimulus was different from the first) were presented during test. Each
test trial began with a fixation (250 ms), followed by a stimulus (3000 ms or until
a response), and then a blank screen (1250 ms). Upon presentation of the stimulus,
participants made a memory judgment which included a rating of how confident they
were in their memory (Figure 4.1b). They were allowed to make three responses:
remember with low confidence, remember with high confidence, or new.
Word and picture stimuli were blocked for both study and test phases, such
that each participant was presented with a block of word trials followed by a block of
picture trials (or vice versa). Whether participants were presented with words or
pictures first was counterbalanced.
Types of Stimuli Participants were presented with pictures of complex scenes and
words. During the study session, participants completed 96 trials of pictures (32
same, 64 different) and 96 trials of words (32 same, 64 different). Given that each
trial contained two stimulus presentations, participants studied a total of 128 pictures
and 128 words from different trials. These stimuli were used to test long-term memory
during the test session. During the long-term memory task, participants completed
192 trials of pictures (128 studied, 64 new) and 192 trials of words (128 studied, 64
new).
Figure 4.1: (a) Study session. (b) Test session.
EEG data EEG data were collected during both study and test sessions using
the Brain Vision ActiChamp 32 channel system and recorded using the Pycorder
software. Electrode positions followed the 10-20 system and included Fz, Cz, Pz, Oz,
Fp1, Fp2, F3, F4, F7, F8, Fc1, Fc2, Fc5, Fc6, Ft9, Ft10, T7, T8, C3, C4, Cp1, Cp2,
Cp5, Cp6, Tp9, Tp10, P3, P4, P7, P8, O1, and O2 (Figure 4.2). During recording,
data were sampled at 1000 Hz and filtered between 0.01 and 100 Hz. Offline, data
were high-pass filtered with a 0.1 Hz Butterworth filter, downsampled to 256 Hz,
and referenced to the average of the mastoids (TP9 and TP10). Post-stimulus ERPs
with a 1000 ms duration were extracted and baseline-corrected with respect to a
200 ms prestimulus baseline. Visual inspection was then used to remove epochs that
contained artifacts.
4.2.2 Artifact Removal
Brain signals often contain significant artifacts that lead to major problems in
signal analysis when the activity due to artifacts has a higher amplitude than that
due to neural sources. The common sources of artifacts include eye movements,
muscle activity, and heartbeat. Independent component analysis (ICA) has been
successfully applied for artifact removal in many studies. The basic idea is to
decompose the brain data into independent components, determine the artifacted
components using pattern and source localization analysis, and reconstruct the brain
signals by excluding those artifacted components. However, linking components to
artifact sources (e.g., eye blinking, muscle movements) remains largely subjective.
We therefore employed an ICA-based algorithm, called ADJUST [42], for signal
artifact removal. ADJUST identifies the components of artifacts automatically, and
these artifacts can be removed from the data without affecting the activity of neural
sources [42]. The data analysis in the following sections is based on the artifact-
cleaned data.
Four groups of feature extraction techniques were employed to capture signal
characteristics that may be relevant for assessing memory workload: signal power,
statistical, morphological, and wavelet features. For a data epoch with n channels,
we first extracted features from the signal at each channel, and then concatenated
the features of all n channels to construct the feature vector of the data epoch.
Signal Power Features: Adopting the signal features used in previous work
[43], we computed the signal power for each channel in every nonoverlapping 2-Hz
interval from 4-40 Hz. The 18 power features provide finer signal power spectrum
information than the commonly used brain signal frequency bands, such as theta,
alpha, and beta. Further features are associated with the power asymmetry of the
EEG signals, since it is unknown whether power asymmetry differs between the groups.
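The 18 band-power features can be sketched as below; the periodogram estimator and the epoch length are assumptions, since the text does not fix the exact estimator.

```python
import numpy as np

def band_power_features(sig, fs=256.0, lo=4, hi=40, width=2):
    """Signal power in nonoverlapping 2-Hz bins from 4-40 Hz (18 features
    per channel), computed from a plain periodogram (an assumed estimator).
    """
    n = len(sig)
    psd = np.abs(np.fft.rfft(sig)) ** 2 / n        # rough periodogram
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    feats = []
    for f0 in range(lo, hi, width):
        mask = (freqs >= f0) & (freqs < f0 + width)
        feats.append(psd[mask].sum())              # power inside the bin
    return np.array(feats)

# A pure 10 Hz tone puts its power into the 10-12 Hz bin (index 3).
fs = 256.0
t = np.arange(0, 2.0, 1.0 / fs)
feats = band_power_features(np.sin(2 * np.pi * 10 * t), fs=fs)
```

Concatenating this 18-vector across the 32 channels gives the per-epoch power portion of the feature vector described above.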
Statistical Features: The four most widely used statistical measures, mean, vari-
ance, skewness, and kurtosis, were computed. In particular, the mean is the averaged
signal amplitude, and the variance measures the signal variability around the mean.
The higher-order statistic skewness quantifies the extent to which
(a) Topography of the eye blink event. (b) Topography of the vertical eye movement
event. (c) Topography of the horizontal eye movement event.
Figure 4.5: Groups of channels for inter- and intra-hemispheric power band asymme-
try. For inter-hemispheric power band asymmetry, the value is calculated from pairs
of channels with the same color in opposite hemispheres. For intra-hemispheric power
band asymmetry, the value is calculated from pairs of channels with different colors
within the same hemisphere.
the distribution leans to one side of the mean, and kurtosis measures the ‘peakedness’
of the distribution.
Morphological Features: These features were selected based on their usefulness in
our previous studies of brain signals [45, 46]. A brief description of each is given below.
• Curve Length: also known as 'line length', first proposed by Olsen et
al. [47]. Curve length is the sum of the distances between successive points, given
by

    \sum_{i=1}^{m-1} |x_{i+1} - x_i| \qquad (4.1)

Since the curve length increases as the signal magnitude or frequency increases, it
has been used in many brain signal studies, such as epileptic seizure detection [48]
and stimulation studies.
• Average Nonlinear Energy: nonlinear energy was first proposed by Kaiser [50].
It has been found that the nonlinear energy is sensitive to spectral changes.
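The two morphological features above can be sketched as follows, assuming a 1-D signal per channel (the choice to average Kaiser's nonlinear energy over all interior samples is an assumption; a windowed average may have been used instead):

```python
import numpy as np

def curve_length(x):
    """Sum of distances between successive samples (Eq. 4.1)."""
    x = np.asarray(x, dtype=float)
    return np.sum(np.abs(np.diff(x)))

def average_nonlinear_energy(x):
    """Mean of Kaiser's nonlinear energy operator,
    psi(x_i) = x_i^2 - x_{i-1} * x_{i+1}, over the interior samples."""
    x = np.asarray(x, dtype=float)
    psi = x[1:-1] ** 2 - x[:-2] * x[2:]
    return psi.mean()
```

Because the nonlinear energy operator rises with both amplitude and instantaneous frequency, it reacts to spectral changes that plain signal power misses.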
Wavelet Features: The continuous wavelet transform (CWT) of a signal X(t) is defined as
C(a, b) = \frac{1}{\sqrt{a}} \int_{\mathbb{R}} X(t)\, \Psi\!\left(\frac{t-b}{a}\right) dt,   (4.4)
where Ψ is the mother wavelet, C(a, b) are the WT coefficients of the signal X(t), a
is the scale parameter, and b is the shifting parameter. The discrete wavelet transform
(DWT) restricts these parameters to a = 2^j and b = k 2^j for all (j, k) ∈ Z^2, given the
decomposition level j. Since the CWT explores every possible scale a and shift b, it is
generally far more computationally expensive than the DWT, which still provides
sufficient information in both the time and frequency domains. At each level of
decomposition, the DWT works as a set of band-pass filters to divide a signal into two
bands called approximation and detail signals.
The approximations (A) are the low frequency components of the signal, and the
details (D) are the high-frequency components. Among different wavelet families, we
selected our mother wavelet due to its orthogonality property and efficient filter implementation [54]. A 4-level
DWT decomposition was applied to the collected signals, which have a sampling rate of
128 Hz. Table 4.1 lists the decomposed signals A4, D4, D3, D2, and D1, which roughly
correspond to the commonly recognized brain signal frequency bands delta, theta,
alpha, beta, and gamma, respectively.
After the 4-level DWT decomposition, a set of wavelet coefficients was obtained
for each channel. From these coefficients we computed a popular measure called wavelet
entropy (WE), which indicates the degree of multi-frequency signal order/disorder in the
signals [55]. To obtain wavelet entropy, the first
step is to calculate relative wavelet energy for each decomposition level as follows
p_j = \frac{E_j}{E_{tot}} = \frac{E_j}{\sum_{j=1}^{n} E_j},   (4.5)
where j is the resolution level, n is the number of selected resolution levels for analysis,
and E_j is the wavelet energy at level j, calculated by summing the squared values of the
wavelet coefficients at that level. The relative wavelet energy p_j can be considered as the
power density of the signal at level j. Following the Shannon entropy [56] for analyzing
and comparing probability distributions, the WE is defined similarly by
WE = -\sum_{j=1}^{n} p_j \ln(p_j),   (4.6)
where pj is the relative wavelet energy of resolution level j. The wavelet entropy
offers a suitable tool for characterizing the order/disorder of signals powers in the
five brain signal frequency bands (delta, theta, alpha, beta and gamma) during the
n-back task. For example, if the relative wavelet energy at one resolution level (e.g., the
alpha band) dominates the others, with its p_j close to one and all other relative
wavelet energies close to zero, then the wavelet entropy will be a very small
value near zero. On the other hand, if the relative wavelet energies are almost equal for
all resolution levels, then the WE will reach its maximum value, ln(n).
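The computation of Eqs. (4.5)-(4.6) can be sketched as follows. This is a numpy sketch that uses a Haar DWT purely for brevity; the chapter's actual wavelet is a different orthogonal family, and the function names are illustrative:

```python
import math
import numpy as np

def haar_dwt_levels(x, levels=4):
    """Multi-level Haar DWT (an illustrative stand-in for the wavelet used
    in the chapter). Returns [D1, D2, ..., D_levels, A_levels]."""
    x = np.asarray(x, dtype=float)
    details = []
    approx = x
    for _ in range(levels):
        a = (approx[0::2] + approx[1::2]) / math.sqrt(2)  # low-pass half
        d = (approx[0::2] - approx[1::2]) / math.sqrt(2)  # high-pass half
        details.append(d)
        approx = a
    return details + [approx]

def wavelet_entropy(coeff_levels):
    """Relative wavelet energies p_j (Eq. 4.5) and WE = -sum p_j ln p_j (Eq. 4.6)."""
    energies = np.array([np.sum(c ** 2) for c in coeff_levels])
    p = energies / energies.sum()
    p = p[p > 0]                     # take 0 * ln(0) as 0
    return -np.sum(p * np.log(p))
```

A constant (purely low-frequency) signal puts all its energy in the final approximation, so its WE is zero, while equal energy across all n levels yields the maximum ln(n), matching the discussion above.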
4.2.4 Feature Vector Classification Using Proximal Support Vector Machine (PSVM)
The experiment involved four task difficulty levels (0-, 1-, 2-, 3-back). A popular binary classification
technique, the support vector machine (SVM), was employed to investigate the data
separability at different mental workload levels. SVM techniques have been successfully
applied in many classification
Table 4.1: Frequency ranges and the corresponding brain signal frequency bands of
the five levels of signals by discrete wavelet decomposition.
Decomposed Level Frequency Range (Hz) Approximate Band
D1 32-64 Gamma
D2 16-32 Beta
D3 8-16 Alpha
D4 4-8 Theta
A4 0-4 Delta
problems [57, 58, 59, 60, 61]. The fundamental problem of SVM is to build an optimal
decision boundary to separate two categories of data. Let Y denote an n × k dimensional
feature vector for a multi-channel data session at a certain difficulty level, where n is
the number of signal channels and k is the number of features per channel. To
classify data with two workload levels, let l denote the sample class label, where l = 1
denotes one workload level and l = −1 denotes the other.
Assume we have p sessions of level one denoted by S1 = {(Y1 , l1 ), (Y2 , l2 ), ..., (Yp , lp )},
and q sessions of level two denoted by S2 = {(Yp+1 , lp+1 ), (Yp+2 , lp+2 ), ..., (Yp+q , lp+q )}.
There exist infinitely many hyperplanes in R^{n×k} that separate the two data groups. Based on
statistical learning theory (SLT), an SVM selects the hyperplane that maximizes its
distance from the closest sample points. This distance is referred to as the margin.
The standard SVM formulation that maximizes the margin and minimizes the
training error is
\min_{\omega, \xi, b} \left\{ \tfrac{1}{2}\|\omega\|^2 + C \sum_{i=1}^{p+q} \xi_i : D(Y^T \omega + be) \ge e - \xi \right\},   (4.7)
where ω is the weight vector, and the slack variables ξ are introduced to measure the
degree of misclassification during training. The penalty cost C is used to control the
tradeoff between a large margin and a small prediction error penalty. D is a diagonal
matrix of the class labels, and e is a column vector with every entry equal to one. The
first term of the objective function in (4.7) maximizes the margin of separation 2/\|\omega\|,
and the second term measures how much emphasis is given to the training errors.
Since standard SVM classifiers usually require a large amount of computation
time for training, the Proximal SVM (PSVM) algorithm was introduced by Mangasarian
and Wild [62] as a fast alternative to the standard SVM formulation. PSVM assigns
points to the closer of two parallel 'proximal' planes, around which the points of each
class are clustered and which are pushed as far apart as possible by the term
(\|\omega\|^2 + b^2) in the above objective function. It has been shown that PSVM has
classification performance comparable to that of standard SVM classifiers but can be an
order of magnitude faster [62]. Therefore, we adopted PSVM in this study. Classification
follows a procedure consisting of training and testing phases. During the training phase, a
classifier is trained to achieve the optimal separation on the training data set. Then,
in the testing phase, the trained classifier is used to classify new samples with unknown
class labels. Cross-validation is a widely used method for classifier evaluation when the
sample size is small. It is capable of providing an almost unbiased estimate of the
generalization ability of a classifier [63]. For the 29 subjects, the total
number of data samples (trials) for sessions A and B are 128 and 386, respectively. We
designed a 5-fold cross-validation method to train and evaluate the SVM classifier.
To examine the EEG responses under various events, we separate the data into 5 and 3
epochs for sessions A and B, respectively, as shown in Figure 4.1. Based on the event
markers of the EEG data, we further define 23 conditions for each session. The following
table lists all of the conditions.
Table 4.2: A list of all comparison conditions of the experiments. For comparison con-
ditions 4 to 11, the naming structure is stimulus/ground truth/response. For conditions
12 to 23, the naming structure is stimulus/response to that stimulus in the test session/
whether it was the 1st or 2nd stimulus. For conditions 35 to 46, it is stimulus/confidence
level of having seen the stimulus/correctness.
Group A Group B
condition event condition event
1 all samples 24 all samples
2 picture 25 picture
3 word 26 word
4 picture - same - same 27 picture - long term Low
5 picture - same - diff 28 picture - long term High
6 picture - diff - diff 29 picture - long term New
7 picture - diff - same 30 word - long term Low
8 word - same - same 31 word - long term High
9 word - same - diff 32 word - long term New
10 word - diff - diff 33 picture - correct
11 word - diff - same 34 word - correct
12 picture - long term Low - stim1 35 picture - low confidence - correct
13 picture - long term High - stim1 36 picture - high confidence -correct
14 picture - long term New - stim1 37 picture - new - correct
15 word - long term Low - stim1 38 picture - low confidence - wrong
16 word - long term High - stim1 39 picture - high confidence -wrong
17 word - long term New - stim1 40 picture - new - wrong
18 picture - long term Low - stim2 41 word - low confidence - correct
19 picture - long term High - stim2 42 word - high confidence -correct
20 picture - long term New - stim2 43 word - new - correct
21 word - long term Low - stim2 44 word - low confidence - wrong
22 word - long term High - stim2 45 word - high confidence -wrong
23 word - long term New - stim2 46 word - new - wrong
For each comparison group, we divided the corresponding data samples into 5
non-overlapping subsets. Each time we held one subset out and trained the PSVM
classifier on the data samples of the remaining subsets. The samples of the held-out
subset were then used for testing. Repeating this procedure for each subset, the averaged
prediction accuracy over the 5 folds was used to indicate the degree of separability of the
EEG signals of the two groups in the high-dimensional feature space.
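The PSVM trained in each fold can be sketched in closed form. This is a simplified numpy version of a linear proximal SVM, not the authors' exact implementation; the parameter `nu` here plays the role of the misclassification penalty:

```python
import numpy as np

def psvm_train(A, labels, nu=1.0):
    """Linear proximal SVM: minimize
    0.5*(||w||^2 + b^2) + 0.5*nu*||xi||^2  with  xi = e - D(Aw - e*b),
    which reduces to solving one linear system instead of a quadratic program."""
    m, k = A.shape
    e = np.ones(m)
    H = np.hstack([A, -e[:, None]])      # stacked [A, -e]
    rhs = H.T @ (labels * e)             # H^T D e, with D = diag(labels)
    M = np.eye(k + 1) / nu + H.T @ H
    z = np.linalg.solve(M, rhs)          # z = [w; b]
    return z[:-1], z[-1]

def psvm_predict(A, w, b):
    """Classify by the sign of the separating function x^T w - b."""
    return np.sign(A @ w - b)
```

The single `linalg.solve` call is what makes PSVM an order of magnitude faster than iterative SVM training, consistent with the motivation above.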
The basic idea of mRMR is to select the features most relevant to the
class labels while minimizing redundancy among the selected features. The mRMR
criterion is based on mutual information. For two features X and Y, p(X) and p(Y) are
the marginal probability functions, and p(X, Y) is the joint probability distribution,
while I(X, Y) is the amount of mutual information between X and Y.
The mRMR method aims to minimize the redundancy (Rd) while maximizing the
relevance (Re) among the features. Rd and Re are given by the following definitions:
Rd = \frac{1}{|S|^2} \sum_{i,j \in S} I(i, j),   (4.10)

Re = \frac{1}{|S|} \sum_{i \in S} I(h, i),   (4.11)
where S is the set of features, h is the target class label, and I(i, j) is the mutual
information between features i and j. The feature selection criterion combining
the above two constraints is the mRMR criterion, whose objective function for feature
selection is

\phi(Re, Rd) = Re - Rd.   (4.12)

An optimal subset of features is one that maximizes the above mRMR objective
function.
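The greedy selection implied by this objective can be sketched as follows. This is an illustrative numpy version that assumes discrete (or discretized) feature values so mutual information can be estimated from a joint histogram; the dissertation's actual implementation may differ:

```python
import numpy as np

def mutual_info(x, y):
    """I(X;Y) in nats for discrete-valued vectors, from the joint histogram."""
    xv, xi = np.unique(x, return_inverse=True)
    yv, yi = np.unique(y, return_inverse=True)
    joint = np.zeros((len(xv), len(yv)))
    for a, b in zip(xi, yi):
        joint[a, b] += 1
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz]))

def mrmr_select(X, y, k):
    """Greedy mRMR: pick the most relevant feature first, then repeatedly add
    the feature maximizing relevance minus mean redundancy (Eq. 4.12)."""
    n_feat = X.shape[1]
    relevance = [mutual_info(X[:, j], y) for j in range(n_feat)]
    selected = [int(np.argmax(relevance))]
    while len(selected) < k:
        best, best_score = None, -np.inf
        for j in range(n_feat):
            if j in selected:
                continue
            redundancy = np.mean([mutual_info(X[:, j], X[:, s]) for s in selected])
            score = relevance[j] - redundancy
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected
```

The greedy step is the standard incremental approximation of the mRMR objective: it never re-examines earlier choices, which keeps selecting 10 features from a high-dimensional feature vector cheap.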
Table 4.3 shows the classification accuracy for the 46 conditions and 8 epochs with
5-fold cross validation and 10 features selected by mRMR, without any ICA
artifact removal. The classification accuracies mostly range from 60% to 85%. Some
conditions exceed 90%, such as condition 27 at epoch B2. The highest accuracy, 94.59%,
occurs at condition 30 and epoch B1.
Table 4.4 shows the classification accuracy for the 46 conditions and 8 epochs with
5-fold cross validation and 10 features selected by mRMR, with ICA artifact
removal. The classification accuracies mostly range from 70% to 85%. Some con-
ditions go as high as 90%, such as condition 2 at epoch A4, condition 14 at epoch A3,
condition 15 at epoch A4, and more. The highest accuracy, 97.30%, occurs at
condition 20 and epoch A4. The results are generally better than those obtained
directly from the raw data.
To sum up, of the two classification settings we tried, we found that 5-fold
cross validation with 10 features selected by mRMR and ICA artifact removal gives
the better results, and epoch A4 generally gives the better classification results. Looking at
Table 4.3: The table of the classification accuracy for 46 conditions and 8 epochs
with 5-fold cross validation and 10 features selected by mRMR and without any ICA
artifacts removal.
Epoch Epoch
condition A1 A2 A3 A4 A5 condition B1 B2 B3
1 51.35 78.38 83.78 81.08 70.27 24 59.46 78.38 64.86
2 51.35 70.27 64.86 72.97 67.57 25 86.49 81.08 54.05
3 81.08 70.27 67.57 86.49 83.78 26 67.57 72.97 72.97
4 - - - - - 27 72.97 91.89 70.27
5 - - - - - 28 64.86 81.08 81.08
6 64.86 78.38 62.16 75.68 72.97 29 70.27 89.19 59.46
7 80.00 68.57 88.57 74.29 77.14 30 94.59 86.49 89.19
8 - - - - - 31 67.57 83.78 72.97
9 - - - - - 32 70.27 81.08 72.97
10 75.00 80.56 77.78 86.11 77.78 33 72.97 81.08 56.76
11 - - - - - 34 59.46 78.38 70.27
12 67.57 64.86 78.38 78.38 70.27 35 78.38 67.57 62.16
13 59.46 67.57 75.68 78.38 75.68 36 72.97 83.78 75.68
14 62.16 78.38 70.27 75.68 89.19 37 75.68 86.49 51.35
15 56.76 78.38 75.68 81.08 78.38 38 62.16 70.27 56.76
16 81.08 81.08 83.78 86.49 78.38 39 74.29 57.14 65.71
17 72.97 64.86 75.68 72.97 67.57 40 78.38 81.08 81.08
18 59.46 81.08 62.16 81.08 72.97 41 72.97 72.97 72.97
19 62.16 64.86 72.97 72.97 78.38 42 72.97 75.68 72.97
20 83.78 59.46 51.35 67.57 75.68 43 81.08 81.08 62.16
21 86.49 81.08 91.89 70.27 81.08 44 75.00 63.89 63.89
22 64.86 86.49 64.86 89.19 51.35 45 62.16 70.27 70.27
23 56.76 70.27 78.38 75.68 81.08 46 83.78 72.97 72.97
the selected features of the highest-accuracy setting, we may find the major differences
between the two groups. For epoch A4 and condition 20, the classifier selected features
such as F1, F8, F14, and F18 extensively, which include mean, variance, skewness,
kurtosis, relative band power, and wavelet entropy.
Figure 4.6 shows the comparison of the EEG signals of 30 channels between musicians
and non-musicians at epoch B1 under condition 30. In this case, the PSVM classifier
reaches 97.30% classification accuracy. From the plots, we can also observe the
Table 4.4: The table of the classification accuracy for 46 conditions and 8 epochs with
5-fold cross validation and 10 features selected by mRMR and with ICA artifacts
removal
Epoch Epoch
condition A1 A2 A3 A4 A5 condition B1 B2 B3
1 86.49 64.86 72.97 81.08 62.16 24 56.76 67.57 62.16
2 78.38 78.38 75.68 91.89 67.57 25 48.65 78.38 67.57
3 72.97 75.68 78.38 70.27 62.16 26 86.49 64.86 62.16
4 - - - - - 27 59.46 83.78 78.38
5 - - - - - 28 78.38 78.38 70.27
6 83.78 81.08 86.49 81.08 72.97 29 81.08 72.97 56.76
7 71.43 68.57 65.71 82.86 77.14 30 97.30 75.68 64.86
8 - - - - - 31 59.46 81.08 64.86
9 - - - - - 32 81.08 72.97 81.08
10 75.00 69.44 72.22 66.67 50.00 33 72.97 78.38 56.76
11 - - - - - 34 54.05 89.19 81.08
12 75.68 78.38 62.16 72.97 75.68 35 78.38 86.49 72.97
13 64.86 64.86 78.38 89.19 70.27 36 83.78 67.57 64.86
14 81.08 72.97 91.89 62.16 83.78 37 75.68 78.38 70.27
15 89.19 67.57 59.46 91.89 81.08 38 86.49 91.89 54.05
16 78.38 78.38 83.78 70.27 70.27 39 71.43 71.43 68.57
17 81.08 75.68 67.57 72.97 75.68 40 78.38 86.49 62.16
18 75.68 72.97 75.68 78.38 81.08 41 75.68 83.78 70.27
19 75.68 78.38 83.78 83.78 56.76 42 78.38 59.46 54.05
20 72.97 72.97 67.57 97.30 67.57 43 86.49 78.38 75.68
21 78.38 67.57 78.38 75.68 70.27 44 72.22 91.67 72.22
22 62.16 67.57 78.38 81.08 72.97 45 78.38 83.78 67.57
23 81.08 70.27 81.08 59.46 81.08 46 62.16 67.57 67.57
significant differences between the two groups. Figure 4.7 shows that musicians and
non-musicians also differ in their head plots. In conclusion, the method satisfactorily
predicts the class of subjects. The highest success rate is 97.30%, which occurs at
condition 30 and epoch B1.
Because different events may have different responses, the sessions are separated,
based on the event markers, into several small parts for detailed analysis.
Figure 4.6: Comparison for the EEG signals of 30 channels of musicians (blue line)
and non-musicians (red line) at epoch B1 and condition 30.
Figure 4.7: Head plots for musicians and non-musicians at epoch B1 at a latency of
100 ms, with ICA-based artifact removal.
Table 4.5: The table of the classification sensitivity and specificity for 46 conditions
and 8 epochs with 5-fold cross validation and 10 features selected by mRMR and without
any ICA artifacts removal
Epoch Epoch
A1 A2 A3 A4 A5 B1 B2 B3
cond. sen spec sen spec sen spec sen spec sen spec cond. sen spec sen spec sen spec
1 0.63 0.39 0.63 0.94 0.89 0.78 0.95 0.67 0.84 0.56 24 0.74 0.44 0.84 0.72 0.68 0.61
2 0.58 0.44 0.58 0.83 0.68 0.61 0.79 0.67 0.74 0.61 25 0.79 0.94 0.84 0.78 0.58 0.50
3 0.89 0.72 0.79 0.61 0.63 0.72 0.84 0.89 0.89 0.78 26 0.68 0.67 0.74 0.72 0.79 0.67
4 - - - - - - - - - - 27 0.84 0.61 0.95 0.89 0.84 0.56
5 - - - - - - - - - - 28 0.84 0.44 0.79 0.83 0.79 0.83
6 0.68 0.61 0.74 0.83 0.53 0.72 0.68 0.83 0.63 0.83 29 0.68 0.72 0.95 0.83 0.58 0.61
7 0.89 0.69 0.84 0.50 1.00 0.75 0.74 0.75 0.84 0.69 30 0.89 1.00 0.89 0.83 0.89 0.89
8 - - - - - - - - - - 31 0.63 0.72 0.89 0.78 0.74 0.72
9 - - - - - - - - - - 32 0.79 0.61 0.89 0.72 0.74 0.72
10 0.79 0.71 0.84 0.76 0.84 0.71 0.84 0.88 0.84 0.71 33 0.79 0.67 0.95 0.67 0.32 0.83
11 - - - - - - - - - - 34 0.58 0.61 0.89 0.67 0.74 0.67
12 0.68 0.67 0.63 0.67 0.79 0.78 0.79 0.78 0.79 0.61 35 0.89 0.67 0.79 0.56 0.58 0.67
13 0.47 0.72 0.68 0.67 0.84 0.67 0.79 0.78 0.84 0.67 36 0.68 0.78 0.95 0.72 0.63 0.89
14 0.58 0.67 0.95 0.61 0.74 0.67 0.74 0.78 0.84 0.94 37 0.79 0.72 0.89 0.83 0.53 0.50
15 0.58 0.56 0.74 0.83 0.74 0.78 0.74 0.89 0.74 0.83 38 0.74 0.50 0.58 0.83 0.63 0.50
16 0.84 0.78 0.84 0.78 0.84 0.83 0.79 0.94 0.84 0.72 39 0.58 0.94 0.53 0.63 0.74 0.56
17 0.74 0.72 0.74 0.56 0.79 0.72 0.68 0.78 0.74 0.61 40 0.89 0.67 0.79 0.83 0.89 0.72
18 0.63 0.56 0.79 0.83 0.63 0.61 0.68 0.94 0.68 0.78 41 0.68 0.78 0.74 0.72 0.79 0.67
19 0.74 0.50 0.74 0.56 0.79 0.67 0.74 0.72 0.79 0.78 42 0.84 0.61 0.84 0.67 0.74 0.72
20 0.95 0.72 0.53 0.67 0.58 0.44 0.79 0.56 0.79 0.72 43 0.84 0.78 0.89 0.72 0.63 0.61
21 0.84 0.89 0.95 0.67 1.00 0.83 0.68 0.72 0.84 0.78 44 0.78 0.72 0.61 0.67 0.61 0.67
22 0.63 0.67 0.89 0.83 0.84 0.44 0.89 0.89 0.53 0.50 45 0.79 0.44 0.79 0.61 0.79 0.61
23 0.53 0.61 0.68 0.72 0.89 0.67 0.74 0.78 0.84 0.78 46 0.95 0.72 0.79 0.67 0.79 0.67
There are only two classes - musicians and non-musicians - in this prediction process.
Univariate features are extracted from the 30 channels of EEG signals.
In the future, we will consider outlier removal techniques on epochs. Bad
data exist in every EEG recording. There are many possible causes, such as muscle
Table 4.6: The table of the classification sensitivity and specificity for 46 conditions
and 8 epochs with 5-fold cross validation and 10 features selected by mRMR and with
ICA artifacts removal
Epoch Epoch
A1 A2 A3 A4 A5 B1 B2 B3
cond. sen spec sen spec sen spec sen spec sen spec cond. sen spec sen spec sen spec
1 0.84 0.89 0.63 0.67 0.84 0.61 0.84 0.78 0.53 0.72 24 0.53 0.61 0.68 0.67 0.74 0.50
2 0.74 0.83 0.79 0.78 0.89 0.61 0.95 0.89 0.84 0.50 25 0.58 0.39 0.84 0.72 0.53 0.83
3 0.79 0.67 0.79 0.72 0.79 0.78 0.74 0.67 0.58 0.67 26 0.84 0.89 0.84 0.44 0.74 0.50
4 - - - - - - - - - - 27 0.68 0.50 0.89 0.78 0.84 0.72
5 - - - - - - - - - - 28 0.74 0.83 0.84 0.72 0.79 0.61
6 0.74 0.94 0.84 0.78 0.89 0.83 0.89 0.72 0.68 0.78 29 0.79 0.83 0.84 0.61 0.58 0.56
7 0.68 0.75 0.79 0.56 0.74 0.56 0.84 0.81 0.84 0.69 30 0.95 1.00 0.89 0.61 0.63 0.67
8 - - - - - - - - - - 31 0.84 0.33 1.00 0.61 0.84 0.44
9 - - - - - - - - - - 32 0.84 0.78 0.79 0.67 0.79 0.83
10 0.89 0.59 0.74 0.65 0.84 0.59 0.74 0.59 0.42 0.59 33 0.84 0.61 0.89 0.67 0.79 0.33
11 - - - - - - - - - - 34 0.53 0.80 0.79 1.00 0.79 0.83
12 0.79 0.72 0.95 0.61 0.58 0.67 0.89 0.56 0.79 0.72 35 0.84 0.72 0.84 0.89 0.74 0.72
13 0.63 0.67 0.63 0.67 0.74 0.83 0.95 0.83 0.74 0.67 36 0.89 0.78 0.74 0.61 0.58 0.72
14 0.95 0.67 0.68 0.78 0.95 0.89 0.63 0.61 0.74 0.94 37 0.68 0.83 0.79 0.78 0.74 0.67
15 0.95 0.83 0.68 0.67 0.42 0.78 0.95 0.89 0.89 0.72 38 1.00 0.72 0.95 0.89 0.58 0.50
16 0.84 0.72 0.89 0.67 0.89 0.78 0.84 0.56 0.74 0.67 39 0.63 0.81 0.74 0.69 0.68 0.69
17 0.79 0.83 0.84 0.67 0.89 0.44 0.79 0.67 0.68 0.83 40 0.95 0.61 0.95 0.78 0.68 0.56
18 0.68 0.83 0.74 0.72 0.79 0.72 0.74 0.83 0.79 0.83 41 0.79 0.72 0.84 0.83 0.63 0.78
19 0.63 0.89 0.74 0.83 0.84 0.83 0.89 0.78 0.68 0.44 42 0.84 0.72 0.68 0.50 0.58 0.50
20 0.74 0.72 0.74 0.72 0.68 0.67 0.95 1.00 0.63 0.72 43 0.84 0.89 0.84 0.72 0.63 0.89
21 0.74 0.83 0.74 0.61 0.84 0.72 0.68 0.83 0.58 0.83 44 0.72 0.72 1.00 0.83 0.67 0.78
22 0.84 0.39 0.74 0.61 0.79 0.78 0.84 0.78 0.74 0.72 45 0.74 0.83 0.84 0.83 0.89 0.44
23 0.84 0.78 0.68 0.72 0.84 0.78 0.74 0.44 0.74 0.89 46 0.63 0.61 0.68 0.67 0.84 0.50
CHAPTER 5
This dissertation focuses on methodologies for addressing two time series prediction
problems arising in the healthcare and railroad industries. The problems involve both
stationary and nonstationary time series. We first studied the stationary time series
prediction problem, for which ARIMA and DLM represent two different ways to explain
and model time series.
The Dynamic Linear Model (DLM), a special type of state-space method, has been
developed as an alternative tool for time series forecasting. However, to apply the DLM,
the signal-to-noise ratio R has to be specified. Since the true value of R is generally not
available, the only way is to guess a value, which is inconvenient and unreliable. We
therefore proposed to estimate R automatically in the forecasting procedure. The
properties of the proposed R estimator and the new forecasting procedure with this
estimator are studied by simulations.
Second, we developed a periodic time series prediction framework and applied it to
respiratory motion time series, which supports accurate irradiation during treatment.
Accurate respiratory motion prediction can minimize the damage to normal body tissues
and important human organs.
Pattern matching can effectively utilize the existing information in the data:
similar patterns demonstrate similar trends in the response. The pattern recognition
process is enhanced by combining it with statistical and feature analysis, which helps
to obtain better matched patterns and remove undesired ones. The experimental
results show that the prediction of the proposed method is very accurate and robust
across different kinds of patients. We compared the proposed novel pattern-based
method to the current state-of-the-art methods and found that the proposed method
outperforms all of them. It should greatly contribute to tumor position prediction and
thereby help improve cancer patients' quality of life.
The EEG signals are first cleaned by ICA-based artifact removal and outlier epoch
rejection. Then, features of the EEG signals are extracted using an extensive feature set.
PSVM is computationally efficient and its performance is usually satisfactory, so we use
PSVM as the classifier. To sum up, the method satisfactorily predicts the class of
subjects, with a highest success rate of 97.30%. A key challenge is to remove artifacts
while retaining the useful information. Much work has been done on this problem, but
not much of it gives satisfactory results. In our study, we apply ICA to decompose the
signal into ICs and then remove those considered to be artifact components before
reconstructing the signal. From the results in Chapter 4, we are able to show that our
artifact removal approach is effective. Bad data exist in every EEG recording. There are
many possible causes, such as muscle
movement or distraction of the participant during the experiment. The performance is
expected to improve further once such bad epochs are handled.
REFERENCES
[1] J. G. DeGooijer and R. J. Hyndman, “25 years of time series forecasting,” In-
“Mining time series data,” Data Mining and Knowledge Discovery Handbook, pp.
1049–1077, 2010.
[3] G. Rubio, H. Pomares, I. Rojas, and L. J. Herrera, “A heuristic method for pa-
[4] J. d. Preez and S. F. Witt, “Univariate versus multivariate time series forecast-
report of aapm task group 76a),” Medical physics, vol. 33, no. 10, pp. 3874–3900,
2006.
[7] S. Wang, “Construct an optimal triage prediction model: A case study of the
[8] Y. Chang and M. Liao, “A seasonal ARIMA model of tourism forecasting: The
case of Taiwan,” Asia Pacific Journal
[11] E. Walter, “Models with trend,” Applied Econometric Time Series (Second ed.).
[13] P. Yelland, “Bayesian forecasting for low-count time series using state-
new motion management method for lung tumor tracking radiation therapy,”
[16] F. Ernst and A. Schweikard, “Forecasting respiratory motion with accurate online
[17] K. Ichiji, N. Homma, M. Sakai, Y. Narita, Y. Takai, and X. Zhang, “A time-
medicine, 2013.
[18] S. Wang, “Online monitoring and prediction of complex time series events from
nonstationary time series data,” Ph.D. dissertation, The State University of New
Jersey, 2012.
[19] B. Abraham and J. Ledolter, “Statistical methods for forecasting,” Hoboken, NJ:
[20] G. E. P. Box and G. M. Jenkins, “Time series analysis: forecasting and control,”
San Francisco: Holden-Day.
[22] M. West and J. Harrison, “Bayesian forecasting and dynamic models,” New York,
[23] X. Fei, Y. Zhange, K. Liu, and M. Guo, “Bayesian dynamic linear model with
switching for real-time short-term freeway travel time prediction with license
plate recognition data.” Journal of Transportation Engineering, vol. 139, no. 11,
p. 1058, 2013.
[25] D. Ruan, “Image guided respiratory motion analysis: time series and image
Springer, 2011.
[27] K. Ichiji, N. Homma, M. Sakai, M. Abe, N. Sugita, and M. Yoshizawa, A Respi-
75–90.
[28] A. Krauss, A. Nill, and U. Oelfke, “The comparative performance of four res-
motion,” Physics in medicine and biology, vol. 55, no. 5, p. 1311, 2010.
[31] Y. Chen, B. Yang, and J. Dong, “Time-series prediction using a local linear
449–465, 2006.
Journal of Imaging Systems and Technology, vol. 24, no. 1, pp. 8–15, 2014.
vitamin d status using support vector regression,” PloS one, vol. 8, no. 11, p.
e79970, 2013.
filters and support vector regression,” Physics in medicine and biology, vol. 54,
[36] Y. Bao, T. Xiong, and Z. Hu, “Multi-step-ahead time series prediction using
493, 2014.
Society (EMBC), 2012 Annual International Conference of the IEEE, vol. 8, pp.
6028–6031, 2012.
ing for time series classification,” 2011.
[41] R. Croft and R. Barry, “Removal of ocular artifact from the eeg: A review.”
eeg artifact detector based on the joint use of spatial and temporal features.”
[43] D. Grimes, D. Tan, S. Hudson, P. Shenoy, and R. Rao, “Feasibility and prag-
Proceedings of the SIGCHI Conference on Human Factors in Computing Sys-
ing,” Behavioral and Brain Functions, vol. 10, no. 1, p. 12, 2014.
Man, and Cybernetics, Part A: Systems and Humans, vol. 41, no. 6, pp. 1199–
1212, 2011.
[46] S. Wong, G. Baltuch, J. Jaggi, and S. Danish, “Functional localization and vi-
during dbs surgery with unsupervised machine learning,” Journal of Neural En-
[48] R. Esteller, J. Echauz, T. Cheng, B. Litt, and B. Pless, “An efficient feature
[49] R. Esteller, J. Echauz, and T. Tcheng, “Comparison of line length feature before
[50] J. Kaiser, “On a simple algorithm to calculate the energy of a signal,” Proceedings
ory and applications in science, engineering, medicine, and finance,” Taylor and
Francis, 2002.
[54] A. Subasi, “Eeg signal classification using wavelet feature extraction and a mix-
ture of expert model,” Expert Systems with Applications, vol. 32, no. 4, pp.
1084–1093, 2007.
E. Basar, “Wavelet entropy: a new tool for analysis of short duration brain
electrical signals,” Journal of Neuroscience Methods, vol. 105, no. 1, pp. 65–75,
2001.
[57] B. Blankertz, G. Curio, and K. Muller, “Classifying single trial eeg: towards
systems, volume 17, chapter methods towards invasive human brain computer
[59] A. Rakotomamonjy, V. Guigue, G. Mallet, and V. Alvarado, “Ensemble of svms
competition 2003-data set iib: support vector machines for the p300 speller
1073–1076, 2004.
nonlinear, and feature selection methods for eeg signal classification,” IEEE
[64] H. Peng, F. Long, and C. Ding, “Feature selection based on mutual informa-
Analysis and Machine Intelligence, IEEE Transactions on, vol. 27, no. 8, pp.
1226–1238, 2005.
BIOGRAPHICAL STATEMENT
Jerry K.M. Kam joined the Department of Industrial & Manufacturing System
Engineering at UTA in the Fall of 2010. He received his B.S. degree in Industrial
He was co-advised by Prof. Li Zeng and Prof. Shouyi Wang during his PhD study. Currently,
he is working with Prof. Wang on research problems in the field of time series data
mining including respiratory motion time series prediction, time series segmentation