
Clusterwise Linear Regression

By Eman Ismail
Introduction
• Multiple regression is frequently used in many fields.

Problem
• DeSarbo and Cron (1988) noted that there are many applications where the estimation of a single regression line is not adequate or is even misleading.
➢ They illustrated this with the following synthetic data set, shown in Figure (1).

Figure (1): An example of the inadequacy of a single regression line for heterogeneous data
• If a single regression model were estimated, the model would be:

yᵢ = 0 + 0·xᵢ,  i = 1, …, 14,  where R² = 0. (1)

Figure 2

• If the observations were first clustered and two separate regression models were estimated, the models would be:

Cluster 1: yᵢ = 1 + 2xᵢ,  i = 1, …, 7, (2)

Cluster 2: yᵢ = −1 − 2xᵢ,  i = 8, …, 14, (3)

with a combined R² = 1.

Figure 3
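The contrast above can be checked numerically. The sketch below uses a hypothetical 14-point data set built to satisfy equations (2) and (3) exactly (the original DeSarbo and Cron data are not reproduced in the slides, so both the x values and the `ols_r2` helper are illustrative):

```python
import numpy as np

# Hypothetical stand-in for the DeSarbo and Cron (1988) synthetic data:
# cluster I follows y = 1 + 2x, cluster II follows y = -1 - 2x (7 points each).
x = np.tile(np.arange(7.0), 2)                      # x_1, ..., x_14
y = np.concatenate([1 + 2 * x[:7], -1 - 2 * x[7:]])

def ols_r2(x, y):
    """Fit y = b0 + b1*x by least squares and return (b0, b1, R^2)."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    r2 = 1 - (resid @ resid) / (((y - y.mean()) ** 2).sum())
    return beta[0], beta[1], r2

# One pooled regression: intercept and slope collapse to 0, R^2 = 0.
print(ols_r2(x, y))

# Separate regressions per (known) cluster: each fits its line perfectly.
print(ols_r2(x[:7], y[:7]))      # recovers y = 1 + 2x
print(ols_r2(x[7:], y[7:]))      # recovers y = -1 - 2x
```

Because the two clusters are mirror images about y = 0, the pooled fit has zero slope and explains nothing, while each per-cluster fit has R² = 1, matching equations (1) to (3).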
Solution

The plot of the data set should be studied first to check for such structure prior to estimation.

The Clusterwise Linear Regression (CLR) model simultaneously divides the observations into a number of homogeneous clusters and estimates a regression model for each cluster by optimizing one single objective function.
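The slides define CLR through one joint optimization; in practice it is often approximated by an alternating heuristic in the spirit of Späth's clusterwise regression algorithm: fit one line per cluster, reassign each point to its best-fitting line, and repeat. The sketch below (function name, toy data, and the two-cluster restriction are all assumptions, not from the slides) shows the idea:

```python
import numpy as np

def clusterwise_ls(x, y, labels, n_iter=100):
    """Alternating heuristic for two-cluster CLR: (a) fit one least-squares
    line per cluster, (b) reassign each point to the line with the smaller
    squared error. This approximates the joint CLR objective; it converges
    to a local optimum that depends on the initial labels."""
    X = np.column_stack([np.ones_like(x), x])
    labels = np.asarray(labels).copy()
    for _ in range(n_iter):
        betas = []
        for k in (0, 1):
            mask = labels == k
            if mask.sum() < 2:            # guard against an emptied cluster
                betas.append(np.zeros(2))
                continue
            beta, *_ = np.linalg.lstsq(X[mask], y[mask], rcond=None)
            betas.append(beta)
        errs = np.stack([(y - X @ b) ** 2 for b in betas])
        new_labels = errs.argmin(axis=0)
        if np.array_equal(new_labels, labels):   # assignments stable: done
            break
        labels = new_labels
    sse = errs[labels, np.arange(len(y))].sum()
    return labels, betas, sse

# toy heterogeneous data (as in the introduction), random initial split
rng = np.random.default_rng(1)
x = np.tile(np.arange(7.0), 2)
y = np.concatenate([1 + 2 * x[:7], -1 - 2 * x[7:]])
labels, betas, sse = clusterwise_ls(x, y, rng.integers(0, 2, size=14))
print(labels, sse)
```

Because only a local optimum is guaranteed, such heuristics are usually run from several random starts; the mathematical-programming formulations below instead state the problem as a single optimization.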
• CLR allows the regression coefficients to vary across observations in different clusters.
• The importance of clusterwise regression (CR) can be summarized as follows:
• Lau et al. (1999) stated that the analysis of a real data set always involves the simultaneous application of several related statistical methods. CR is a technique that applies two statistical methods at the same time, cluster analysis and regression analysis, by optimizing one single objective function.
• CR is a better approach than the two-stage method, which applies cluster analysis (stage 1) and then regression analysis on the resulting clusters (stage 2), because, as DeSarbo and Edwards (1996) noted, cluster analysis does not form groups on the basis of the interrelations between X and y.
• CR represents the true structure present in the data and improves the overall goodness of fit, as clarified in DeSarbo and Cron (1988) and Shao (2004).
Model 1: NLP model
➢ Lau et al. (1999) formulated the CLR using the mathematical programming (MP) approach.

Find the values of Cᵢ₁, Cᵢ₂, α₀, β₀, αⱼ, βⱼ, εᵢ₁ and εᵢ₂ (i = 1, …, n; j = 1, …, J) which:

minimize Σᵢ₌₁ⁿ (Cᵢ₁ εᵢ₁² + Cᵢ₂ εᵢ₂²) (4)

subject to

yᵢ = α₀ + Σⱼ₌₁ᴶ αⱼ xᵢⱼ + εᵢ₁,  i = 1, …, n, (5)

yᵢ = β₀ + Σⱼ₌₁ᴶ βⱼ xᵢⱼ + εᵢ₂,  i = 1, …, n, (6)

Cᵢ₁ + Cᵢ₂ = 1,  i = 1, …, n, (7)

Cᵢ₁, Cᵢ₂ ≥ 0;  εᵢ₁, εᵢ₂, α₀, αⱼ, β₀, βⱼ unrestricted,  i = 1, …, n, j = 1, …, J. (8)
• This model is defined by n observations, J explanatory variables xᵢⱼ, and one response variable yᵢ.
• The indices run over the explanatory variables (j ∈ {1, …, J}) and the observations (i ∈ {1, …, n}).
• The model assumes that the sample of n observations is divided into two mutually exclusive segments, or clusters (I and II).
• The decision variables are: the regression coefficients of cluster I and cluster II, (α₀, αⱼ) and (β₀, βⱼ), respectively; the deviations of observation i from the regression lines of cluster I and cluster II, εᵢ₁ and εᵢ₂, respectively; and the membership variables Cᵢ₁ and Cᵢ₂, which indicate whether observation i belongs to cluster I or cluster II.
• If observation i belongs to cluster I then Cᵢ₁ = 1, otherwise Cᵢ₁ = 0; likewise, if observation i belongs to cluster II then Cᵢ₂ = 1, otherwise Cᵢ₂ = 0.
• The objective function (4) minimizes the total sum of squared errors.
• Constraints (5) and (6) define and estimate the regression functions of cluster I and cluster II, respectively.
• Constraint (7) ensures that observation i is a member of either cluster I or cluster II, but not both.
• We do not restrict Cᵢ₁ and Cᵢ₂ to be binary: optimizing subject to constraint (7) and Cᵢ₁, Cᵢ₂ ≥ 0 forces each of them to be either zero or one.
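The last remark can be checked directly: with the coefficients held fixed, substituting Cᵢ₂ = 1 − Cᵢ₁ makes each observation's contribution to (4) linear in Cᵢ₁ over [0, 1], so the minimum sits at an endpoint. A minimal sketch (the residual values and the helper name are made up for illustration):

```python
# With the coefficients fixed, observation i contributes
#   C_i1 * e1**2 + C_i2 * e2**2   with   C_i1 + C_i2 = 1,  C >= 0.
# Substituting C_i2 = 1 - C_i1 gives a function linear in C_i1,
# so the optimum is at C_i1 = 1 if e1**2 < e2**2, else C_i1 = 0.
def optimal_membership(e1, e2):
    """Optimal (C_i1, C_i2) for one observation given fixed residuals."""
    return (1.0, 0.0) if e1 ** 2 < e2 ** 2 else (0.0, 1.0)

# Sweep C_i1 over [0, 1]: no interior point beats the endpoint.
e1, e2 = 0.5, 2.0
vals = [c * e1 ** 2 + (1 - c) * e2 ** 2 for c in (i / 100 for i in range(101))]
assert min(vals) == vals[-1]       # minimum at C_i1 = 1, since e1^2 < e2^2
```

This is why the continuous relaxation in (8) loses nothing relative to an explicit binary restriction.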
Model 2: NLP model
• If the distribution of the error term is not known, most researchers appeal to robustness by minimizing the sum of absolute errors instead of the sum of squared errors in (4).

Find the values of Cᵢ₁, Cᵢ₂, α₀, β₀, αⱼ, βⱼ, εᵢ₁⁺, εᵢ₁⁻, εᵢ₂⁺ and εᵢ₂⁻ (i = 1, …, n; j = 1, …, J) which:

minimize Σᵢ₌₁ⁿ (Cᵢ₁(εᵢ₁⁺ + εᵢ₁⁻) + Cᵢ₂(εᵢ₂⁺ + εᵢ₂⁻)) (9)

subject to

yᵢ = α₀ + Σⱼ₌₁ᴶ αⱼ xᵢⱼ + εᵢ₁⁺ − εᵢ₁⁻,  i = 1, …, n, (10)

yᵢ = β₀ + Σⱼ₌₁ᴶ βⱼ xᵢⱼ + εᵢ₂⁺ − εᵢ₂⁻,  i = 1, …, n, (11)

Cᵢ₁ + Cᵢ₂ = 1,  i = 1, …, n, (12)

Cᵢ₁, Cᵢ₂, εᵢ₁⁺, εᵢ₁⁻, εᵢ₂⁺, εᵢ₂⁻ ≥ 0;  α₀, β₀, αⱼ, βⱼ unrestricted,  i = 1, …, n, j = 1, …, J. (13)
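For a fixed cluster membership, the positive/negative error split in (9) to (11) reduces each cluster's fit to an ordinary least-absolute-deviations (LAD) regression, which is a linear program. A sketch using scipy.optimize.linprog (the `lad_fit` helper and the toy data are illustrative assumptions, not from the slides):

```python
import numpy as np
from scipy.optimize import linprog

def lad_fit(X, y):
    """Least-absolute-deviations fit of y = b0 + X @ b via the LP split
    e = e_plus - e_minus with e_plus, e_minus >= 0, i.e. one cluster of
    the Model 2 formulation with the memberships held fixed."""
    n, p = X.shape
    # variables: [b0, b1..bp, e_plus (n), e_minus (n)]
    A_eq = np.hstack([np.ones((n, 1)), X, np.eye(n), -np.eye(n)])
    c = np.concatenate([np.zeros(p + 1), np.ones(2 * n)])  # sum of e+ + e-
    bounds = [(None, None)] * (p + 1) + [(0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    assert res.success
    return res.x[: p + 1]            # (b0, b1, ..., bp)

# y = 1 + 2x with one gross outlier: the absolute-error criterion
# recovers the majority line, illustrating the robustness argument above.
x = np.arange(8.0)
y = 1 + 2 * x
y[-1] += 50                          # contaminate one observation
b = lad_fit(x.reshape(-1, 1), y)
print(np.round(b, 6))                # close to [1. 2.]
```

At an LP optimum at most one of εᵢ⁺, εᵢ⁻ is nonzero for each i, so their sum equals |εᵢ|, which is exactly why the split linearizes the absolute value.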
