
A Data-Driven Model for Software Reliability Prediction

Author: Jung-Hua Lo
IEEE International Conference on Granular Computing (2012)

Presented by Young Taek Kim
KAIST SE Lab.
9/4/2013
Contents

Introduction
Background
Overall Approach
Detailed Process
Experimental Results
Conclusion
Discussion


SW Reliability Prediction

Definition of SW Reliability
• Probability of failure-free operation of a software product in a specified environment for a specified time.

SRM (Software Reliability Model)
• To estimate how reliable the software is now.
• To predict the reliability in the future.

Two categories of SRMs
• Analytical models: NHPP (Non-Homogeneous Poisson Process) SRMs
• Data-driven models: ARIMA, SVM


Data-Driven Model

Limitations of analytical models
• Software behavior changes during the testing phase.
• The assumption that "all faults are independent & equally detectable" is violated by real datasets.

Advantages of data-driven models
• Far fewer impractical assumptions: the model is developed directly from collected failure data.
• Abstractions and generalizations of the SW failure process are easy to make, via regression or time-series analysis.


Motivation

Problems
• An actual SW failure data set is rarely purely linear or purely nonlinear.
• No single model is suitable for all situations.

Proposed solution
• A hybrid strategy combining a linear and a nonlinear prediction model:
  - ARIMA model: good performance in predicting linear data
  - SVM model: successfully applied to nonlinear data


Stationarity

Statistical properties (mean, variance, covariance, etc.) are all constant over time:
(1) E(y_t) = μ_y for all t
(2) Var(y_t) = E[(y_t − μ_y)²] = σ_y² for all t
(3) Cov(y_t, y_{t−k}) = γ_k for all t

[Figure: a non-stationary series (μ₁, σ₁², γ₁ ≠ μ₂, σ₂², γ₂) is transformed by differencing into a stationary series (μ₁, σ₁², γ₁ = μ₂, σ₂², γ₂)]


ACF (Autocorrelation Function)

The correlation between observations at different distances apart (lag k):

r_k = Σ_{t=k+1}^{n} (y_t − ȳ)(y_{t−k} − ȳ) / Σ_{t=1}^{n} (y_t − ȳ)²,  where ȳ = (Σ_{t=1}^{n} y_t) / n

A code sketch for computing r_k follows below.
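As a minimal illustration (not from the original slides; NumPy and the helper name `acf` are assumptions), the sample ACF formula above translates directly to code:

```python
import numpy as np

def acf(y: np.ndarray, max_lag: int) -> np.ndarray:
    """Sample autocorrelation r_k for k = 1..max_lag, per the formula above."""
    y_bar = y.mean()
    denom = np.sum((y - y_bar) ** 2)  # sum_{t=1}^{n} (y_t - y_bar)^2
    return np.array([
        np.sum((y[k:] - y_bar) * (y[:-k] - y_bar)) / denom  # sum over t = k+1..n
        for k in range(1, max_lag + 1)
    ])

# Example: the ACF of an AR(1)-like series decays roughly geometrically.
rng = np.random.default_rng(0)
y = np.zeros(200)
for t in range(1, 200):
    y[t] = 0.7 * y[t - 1] + rng.normal()
print(acf(y, max_lag=5))
```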


PACF (Partial ACF)

The degree of association between y_t and y_{t−k} when the effects of the intermediate time lags 1, 2, 3, …, k−1 are removed. It is computed recursively from the autocorrelations:

r_kk = r_1                                                           if k = 1
r_kk = (r_k − Σ_{j=1}^{k−1} r_{k−1,j} r_{k−j}) / (1 − Σ_{j=1}^{k−1} r_{k−1,j} r_j)   if k = 2, 3, …

where r_kj = r_{k−1,j} − r_kk · r_{k−1,k−j} for j = 1, 2, …, k−1.


Removing Non-stationarity

Differencing
• Differenced series: Δy_t = y_t − y_{t−1}
• Repeated until the ACF/PACF of the differenced series look stationary (a sketch follows below).
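A minimal differencing sketch, assuming NumPy (the data and the choice d = 1 are illustrative):

```python
import numpy as np

def difference(y: np.ndarray, d: int = 1) -> np.ndarray:
    """Apply first-order differencing d times: y_t -> y_t - y_{t-1}."""
    for _ in range(d):
        y = np.diff(y)  # each pass shortens the series by one point
    return y

# A series with a linear trend is non-stationary; one difference removes the trend.
t = np.arange(100, dtype=float)
y = 2.0 * t + np.random.default_rng(1).normal(size=100)
dy = difference(y, d=1)
print(dy.mean())  # the differenced series fluctuates around the slope (~2.0)
```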


3 Prediction Models for Stationary Data

AR (Auto Regressive) Model
• Uses past values in the forecast
• AR(p): y_t = α_1 y_{t−1} + α_2 y_{t−2} + ⋯ + α_p y_{t−p} + ε_t

MA (Moving Average) Model
• Uses past residuals (random events) in the forecast
• MA(q): y_t = ε_t + β_1 ε_{t−1} + ⋯ + β_q ε_{t−q}

ARMA (Auto Regressive & Moving Average) Model
• Combination of AR & MA
• ARMA(p, q): y_t = α_1 y_{t−1} + ⋯ + α_p y_{t−p} + ε_t + β_1 ε_{t−1} + ⋯ + β_q ε_{t−q}
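For intuition, a simulation sketch of the three processes, assuming statsmodels (not cited in the slides); note that statsmodels expects lag polynomials, so AR coefficients are negated:

```python
from statsmodels.tsa.arima_process import ArmaProcess

# Lag-polynomial form: ar = [1, -alpha_1, ...], ma = [1, beta_1, ...]
ar1  = ArmaProcess(ar=[1, -0.7], ma=[1])        # AR(1):  y_t = 0.7*y_{t-1} + e_t
ma1  = ArmaProcess(ar=[1], ma=[1, 0.4])         # MA(1):  y_t = e_t + 0.4*e_{t-1}
arma = ArmaProcess(ar=[1, -0.7], ma=[1, 0.4])   # ARMA(1, 1): both terms

for name, proc in [("AR(1)", ar1), ("MA(1)", ma1), ("ARMA(1,1)", arma)]:
    y = proc.generate_sample(nsample=300)
    print(name, "stationary:", proc.isstationary, "sample var:", round(y.var(), 2))
```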


AR (Auto Regressive) Model (1/2)

AR(p)
• y_t = α_1 y_{t−1} + α_2 y_{t−2} + ⋯ + α_p y_{t−p} + ε_t
  - α_i: autocorrelation coefficient
  - ε_t: error at time t

Selection of a model
• ACF decreasing exponentially
  - Directly: 0 < α < 1
  - Oscillating pattern: −1 < α < 0
• PACF identifies the order of the AR model
  - Cut off at lag 1 → AR(1)

[Figures: ACF for an AR(1) data series decays exponentially; PACF cuts off after lag 1 (both with 5% significance limits)]


MA (Moving Average) Model (1/2)

MA(q)
• y_t = ε_t + β_1 ε_{t−1} + ⋯ + β_q ε_{t−q}
  - β_i: MA parameter
  - ε_t: error at time t

Example: 3-period moving average of yearly sales, e.g. MA(3) for 2003 = (1000 + 1500 + 1250) / 3 = 1250

  Year   Sales(B$)   MA(3)
  2000   1000        -
  2001   1500        -
  2002   1250        -
  2003    900        1250
  2004   1600        1217
  2005    950        1250
  2006   1650        1150
  2007   1750        1400
  2008   1200        1450
  2009   2000        1533
  2010   2100        1650
  2011    -          1767

[Figure: line chart of Sales(B$) and MA(3), 2000-2011]


MA (Moving Average) Model (2/2)

Selection of a model
• ACF identifies the order of the MA model
  - Cut off at lag 1 → MA(1)
• PACF decreasing exponentially
  - Directly: 0 < a < 1
  - Oscillating pattern: −1 < a < 0

[Figures: ACF for an MA(1) data series cuts off after lag 1; PACF decays exponentially (both with 5% significance limits)]


ARMA Model

ARMA(p, q) = AR(p) + MA(q)
• y_t = α_1 y_{t−1} + α_2 y_{t−2} + ⋯ + α_p y_{t−p} + ε_t + β_1 ε_{t−1} + ⋯ + β_q ε_{t−q}

Procedure for model identification
• The ACF/PACF patterns serve as a guideline to determine p and q for ARMA (see the identification table in the ARIMA Process slide).


ARIMA Model

Auto Regressive Integrated Moving Average (Box and Jenkins, 1970)
• A linear model for forecasting time-series data: future values are a linear function of several past observations.
• ARIMA(p, d, q)
  - Auto Regression of order p
  - Integrated differencing of order d (extends the model to non-stationary time series)
  - Moving Average of order q
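A minimal fitting sketch, assuming statsmodels; the order (1, 1, 1) and the synthetic data are illustrative choices, not from the paper:

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Illustrative data: a noisy upward drift, so one difference (d = 1) is natural.
rng = np.random.default_rng(2)
y = np.cumsum(rng.normal(loc=1.0, scale=2.0, size=120))

res = ARIMA(y, order=(1, 1, 1)).fit()   # p = 1, d = 1, q = 1 (illustrative)
print(res.params)                       # estimated AR, MA, and variance parameters
print("next 5 forecasts:", res.forecast(steps=5))
```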


SVM (Support Vector Machine)

Proposed by Vladimir N. Vapnik (1995)
An algorithm (or recipe) for maximizing a particular mathematical function with respect to a given collection of data.

4 key concepts:
• Separating hyperplane
• Maximum-margin hyperplane
• Soft margin
• Kernel function


Separating Hyperplane

f(x, w, b) = sign(w·x + b)
• Points with w·x + b > 0 are labeled +1; points with w·x + b < 0 are labeled −1.
• The separating hyperplane (= classifier) is the boundary w·x + b = 0.

[Figure: two classes of points separated by a hyperplane]


Maximum Margin

f(x, w, b) = sign(w·x + b)
• M = margin width between the two classes.
• Support vectors are the data points that the margin pushes up against.
• Only the support vectors are used to specify the separating hyperplane!

[Figure: maximum-margin hyperplane, with support vectors x⁺ and x⁻ lying on the margin boundaries]

Kernel Function (1/2)

Nonlinear SVMs
• Datasets that are linearly separable with some noise work out great.
• But what are we going to do if the dataset is just too hard to separate?
• How about mapping the data to a higher-dimensional space, e.g. x → (x, x²)?

[Figure: 1-D points that are not separable on the x axis become separable after mapping to (x, x²)]


Kernel Function (2/2)

Nonlinear SVMs: feature spaces
• General idea: the original input space can always be mapped to some higher-dimensional feature space where the training set is linearly separable: Φ: x → φ(x)
• Definition of a kernel function: a function that corresponds to an inner product in some expanded feature space.
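A minimal sketch of a kernel SVM used for regression (the role it plays in this paper), assuming scikit-learn, which the slides do not mention; all hyperparameter values are illustrative:

```python
import numpy as np
from sklearn.svm import SVR

# Illustrative nonlinear 1-D regression problem.
rng = np.random.default_rng(3)
X = np.sort(rng.uniform(0, 6, size=80)).reshape(-1, 1)
y = np.sin(X).ravel() + 0.1 * rng.normal(size=80)

# The RBF kernel maps inputs implicitly into a high-dimensional feature space;
# C, epsilon, gamma are the hyperparameters the paper tunes with a GA.
svr = SVR(kernel="rbf", C=10.0, epsilon=0.05, gamma=0.5).fit(X, y)
print("train R^2:", round(svr.score(X, y), 3))
```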


Genetic Algorithm

Search & optimization technique
• By J. Holland, 1975
• Based on Darwin's principle of natural selection

Basic operations (a code sketch follows below)
• Selection
• Crossover
• Mutation

[Flowchart: create initial random population (potential solutions) → evaluate fitness of each individual → if an optimal or "good" solution is found, END; otherwise apply selection (killing unfit individuals), crossover, and mutation, then re-evaluate]
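A compact GA sketch for tuning the SVR parameters (C, ε, gamma), the role the GA plays in this paper; this is an assumption-laden illustration (population size, parameter ranges, and mutation scale are invented), not the paper's exact procedure:

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(4)

def fitness(genes, X_tr, y_tr, X_va, y_va):
    """Higher is better: negative validation MSE of an SVR with these genes."""
    C, eps, gamma = genes
    model = SVR(kernel="rbf", C=C, epsilon=eps, gamma=gamma).fit(X_tr, y_tr)
    return -np.mean((model.predict(X_va) - y_va) ** 2)

def evolve(X_tr, y_tr, X_va, y_va, pop_size=20, generations=15):
    # Genes (C, epsilon, gamma), sampled log-uniformly in plausible ranges.
    pop = np.exp(rng.uniform(np.log([0.1, 1e-3, 1e-2]),
                             np.log([100.0, 1.0, 10.0]),
                             size=(pop_size, 3)))
    for _ in range(generations):
        fit = np.array([fitness(g, X_tr, y_tr, X_va, y_va) for g in pop])
        parents = pop[np.argsort(fit)[-pop_size // 2:]]        # selection
        kids = []
        while len(kids) < pop_size - len(parents):
            a, b = parents[rng.integers(len(parents), size=2)]
            child = np.where(rng.random(3) < 0.5, a, b)        # crossover
            child = child * np.exp(rng.normal(0, 0.2, size=3)) # mutation
            kids.append(child)
        pop = np.vstack([parents, kids])
    return max(pop, key=lambda g: fitness(g, X_tr, y_tr, X_va, y_va))
```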


Overall Approach (1/2)

[Flowchart: two branches run over the same failure data set.
• ARIMA branch (linear forecasting): model identification → parameter estimation → model checking on the residuals; if the model is not satisfactory, return to identification; the trained ARIMA model produces the linear forecast.
• SVM branch (nonlinear forecasting): a random initial population of chromosomes encodes the initial SVM parameters → train the SVM model on the nonlinear residuals → fitness evaluation → if the stop criteria are not met, apply genetic operations and retrain; the trained SVM model produces the nonlinear forecast.
The two forecasts are summed to give the software reliability prediction.]


Overall Approach (2/2)

X_t = L_t + N_t
• X_t: time series data
• L_t: linear part of the time series
• N_t: nonlinear part of the time series

After ARIMA model processing, we obtain L̂_t and ε_t:
• L̂_t: predicted value of the ARIMA model
• ε_t: residual at time t from the linear model, ε_t = X_t − L̂_t

Finally, the residuals ε_t are modeled by the SVM model tuned with a GA (Genetic Algorithm); an end-to-end sketch follows below.
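An end-to-end sketch of the hybrid scheme, assuming statsmodels and scikit-learn; the ARIMA order, lag length, and SVR settings are illustrative assumptions (the paper tunes the SVM part with a GA instead of fixing it):

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from sklearn.svm import SVR

def hybrid_forecast(x, order=(1, 1, 1), n_lags=4):
    """One-step-ahead forecast: ARIMA supplies the linear part L_t,
    an SVR on the ARIMA residuals supplies the nonlinear part N_t."""
    arima = ARIMA(x, order=order).fit()
    l_hat = arima.predict(start=0, end=len(x) - 1)    # in-sample linear fit
    resid = x - l_hat                                 # eps_t = X_t - L_hat_t

    # Lagged residual features: predict eps_t from eps_{t-1} .. eps_{t-n_lags}.
    X_feat = np.column_stack([resid[i:len(resid) - n_lags + i]
                              for i in range(n_lags)])
    y_tgt = resid[n_lags:]
    svr = SVR(kernel="rbf", C=10.0, epsilon=0.01).fit(X_feat, y_tgt)

    next_linear = arima.forecast(steps=1)[0]                     # L_hat
    next_resid = svr.predict(resid[-n_lags:].reshape(1, -1))[0]  # N_hat
    return next_linear + next_resid

# Example on a synthetic cumulative-failure-like series:
x = np.cumsum(np.abs(np.random.default_rng(5).normal(5, 2, size=60)))
print("predicted next cumulative failure count:", round(hybrid_forecast(x), 1))
```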


ARIMA Process (1/2)

[Flowchart: Data Set → Model Identification → Parameter Estimation → model checking; if not satisfied, return to identification; otherwise → SW Reliability Prediction]

Model Identification
• Stationarize the input data: differencing (determines d), with ACF/PACF checking.
• Determine the values of p and q from the ACF/PACF patterns:

         MA(q)          AR(p)          ARMA(p, q)
  ACF    Cuts after q   Tails off      Tails off
  PACF   Tails off      Cuts after p   Tails off

Parameter Estimation
• MLE (Maximum Likelihood Estimation): find the parameter set θ₁, θ₂, …, θₖ that maximizes L(θ₁, θ₂, …, θₖ) = f(x₁, x₂, …, x_N; θ₁, θ₂, …, θₖ)


ARIMA Process (2/2)

[Flowchart: Data Set → Model Identification → Parameter Estimation → model checking → SW Reliability Prediction]

Model checking: residual randomness
• The residuals of a well-fitted model will be random and follow the normal distribution.
• Check the residuals' ACF and PACF; if they are not white noise, return to model identification (a test sketch follows below).
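One common way to operationalize this residual-randomness check is a Ljung-Box test on the fitted model's residuals; the slides only mention ACF/PACF inspection, so using Ljung-Box here is an assumption:

```python
from statsmodels.stats.diagnostic import acorr_ljungbox

# `res` is a fitted ARIMA results object (see the ARIMA fitting sketch earlier).
# Large p-values => no evidence of leftover autocorrelation in the residuals.
print(acorr_ljungbox(res.resid, lags=[10]))
```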


SVM Process (1/2)

[Flowchart: random initial population (chromosome 1 … chromosome N, the initial parameters) → train SVM model on the nonlinear residuals → fitness evaluation → stop criteria? If no, apply genetic operations and retrain; if yes, the trained SVM model performs the nonlinear forecasting]

• Due to the randomness of the input data, the initial population is selected at random.
  - The chromosomes encode the SVM parameters, e.g. C, ε, σ.
• The data set is divided into two parts: training & testing data.


SVM Process (2/2)

• The higher the fitness value, the higher the survivability: high-fitness candidate chromosomes are retained and combined to produce new offspring.
• GA is applied to the SVM parameter search because:
  - there is no theoretical method for determining a kernel function and its parameters, and
  - there is no a priori knowledge for setting the penalty parameter C.
• Applied GA operations: crossover and mutation (see the GA sketch earlier).



Experimental Results (1/2)

Collected data: cumulative number of failures, x_i, at time t_i
• Data Set (DS-1)
  - RADC (Rome Air Development Center) project reported by Musa
  - 21 weeks of testing, 136 observed failures

Output: predicted value x̂_{i+1}, computed from (x_1, x_2, …, x_i); an evaluation-loop sketch follows below.

[Figures: goodness-of-fit curves and relative-error curves for DS-1]
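A rolling one-step-ahead evaluation sketch matching this setup (the warm-up length and the relative-error definition RE = (x̂ − x)/x are assumptions, as the slides do not state them):

```python
import numpy as np

def rolling_one_step(x, predict_next):
    """Predict x_{i+1} from (x_1, ..., x_i) for each i, as in the experiments.
    `predict_next` is any forecaster, e.g. hybrid_forecast from earlier."""
    preds, rel_err = [], []
    for i in range(10, len(x) - 1):                 # 10-point warm-up (assumption)
        p = predict_next(x[: i + 1])
        preds.append(p)
        rel_err.append((p - x[i + 1]) / x[i + 1])   # relative-error curve value
    return np.array(preds), np.array(rel_err)
```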



Experimental Results (2/2)

Collected data: cumulative number of failures, x_i, at time t_i
• Data Set (DS-2)
  - 28 weeks of SW testing, 234 observed failures

Output: predicted value x̂_{i+1}, computed from (x_1, x_2, …, x_i)

[Figures: goodness-of-fit curves and relative-error curves for DS-2]



Conclusion

Proposed a hybrid methodology for forecasting software reliability
• It exploits the unique strengths of both the ARIMA model and the SVM model.

Test results
• Showed improved prediction performance.


Discussion

Pros
• Provides a possible solution to the difficulty of selecting an SRM.
• Improves SW reliability prediction performance.

Cons
• Does not present detailed test methods (e.g., stop criteria for the SVM, parameter-estimation criteria for ARIMA).

Thank you!
