
Hybrid Intelligence Systems for Data Imputation
Chandan Gautam
(12MCMB03)
Under the guidance of
Prof. V. Ravi

Outline

Problem Statement
Missing Data and their causes
Data Imputation
Literature Survey
Proposed Method
Results
Conclusions
References

Problem Statement
Developing Hybrid Intelligence Systems for Data Imputation
Based on Statistical and Machine Learning Techniques.

What is missing data?

In real-world scenarios, missing data is an inevitable and common problem in various disciplines.
It limits the ability of researchers to draw conclusions: even if a result is obtained by deleting the records with missing data, that result may be biased and inappropriate.
So, the missing values have to be imputed.

Age   Salary   Incentive
25    4000     ??
??    500      [illegible]
27    ??       50
82    2000     150
42    6500     1000

Literature Survey

Literature Survey*
N. Ankaiah, V. Ravi, A novel soft computing hybrid for data
imputation, In Proceedings of the 7th International Conference
On Data Mining (DMIN), Las Vegas, USA, 2011.
Mistry, J., Nelwamondo, F., V., & Marwala, T. (2009). Data estimation
using principal component analysis and Auto associative neural
networks, Journal of Systemics, Cybernetics and Informatics, Volume 7,
pp. 72-79 .
I. B. Aydilek, A. Arslan, A hybrid method for imputation of missing
values using optimized fuzzy c-means with support vector regression
and a genetic algorithm, Information Sciences, vol. 233, pp. 25-35,
2013.
Shichao Zhang, Nearest neighbor selection for iteratively kNN
imputation, The Journal of Systems and Software (2012), vol. 85(11),
pp. 2541-2552.

Mean Imputation

Creating Missing Values and Mean Imputation

Initially, no missing data          After creating missing values
Age   Salary   Incentive            Age   Salary   Incentive
25    4000     200                  25    4000     ??
34    500      [illegible]          ??    500      [illegible]
27    1000     50                   27    ??       50
82    2000     150                  82    2000     150
42    6500     1000                 42    6500     1000

Mean imputation:                    44    3250     300
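The mean imputation step on this slide can be sketched in a few lines. This is a minimal illustration, assuming NumPy and using NaN to mark the "??" cells; the function name `mean_impute` is hypothetical. Only the Age and Salary columns are used, since every cell of those two columns is readable in the slide.

```python
import numpy as np

def mean_impute(data):
    """Replace each NaN with the mean of the observed values in its column."""
    data = np.asarray(data, dtype=float)
    col_means = np.nanmean(data, axis=0)      # column means, ignoring NaN
    filled = data.copy()
    rows, cols = np.where(np.isnan(filled))   # positions of the missing cells
    filled[rows, cols] = col_means[cols]
    return filled, col_means

# Age and Salary columns of the slide's table; NaN marks "??".
table = np.array([
    [25.0, 4000.0],
    [np.nan, 500.0],
    [27.0, np.nan],
    [82.0, 2000.0],
    [42.0, 6500.0],
])
filled, col_means = mean_impute(table)
print(col_means)   # Age mean 44, Salary mean 3250, as on the slide
```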


7

Mean Imputation
MAPE

Compute the mean absolute percentage error (MAPE) value (Flores, 1986):

  MAPE = \frac{100}{n} \sum_{i=1}^{n} \left| \frac{x_i - \hat{x}_i}{x_i} \right|

Where,
n - Number of missing values in a given dataset.
\hat{x}_i - Value predicted by the Mean Imputation for the missing value.
x_i - Actual value.
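As a small worked example of the MAPE definition, the sketch below evaluates it for the two cells of the mean imputation slide whose original values are readable (Age 34 imputed as 44, Salary 1000 imputed as 3250). The function name `mape` is an illustrative choice, and NumPy is assumed.

```python
import numpy as np

def mape(actual, predicted):
    """Mean absolute percentage error over the imputed cells."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return 100.0 / actual.size * np.sum(np.abs((actual - predicted) / actual))

# Actual values 34 and 1000 were replaced by the column means 44 and 3250.
error = mape([34.0, 1000.0], [44.0, 3250.0])
print(round(error, 2))
```

The large value comes almost entirely from the Salary cell, which shows how strongly a single poorly imputed value can dominate MAPE.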
8

Mean Imputation
Result of Mean Imputation

Average MAPE value over 10 folds: Mean Imputation

Dataset           Mean Imputation
Auto mpg          59.7
Body fat          11.61
Boston Housing    37.77
Forest fires      24.728
Iris              23.57
Prima Indian      24.022
Spanish           55.53
Spectf            14.85
Turkish           66.007
UK bankruptcy     37.07
UK Credit         28.43
Wine              29.99

The error is too high for most of the datasets, so other methods are needed.

Proposed Methods
Module I
PCA-AAELM Imputation
ECM-Imputation
ECM-AAELM Imputation

Module II
PSO-ECM- Imputation
PSO-ECM + ECM-AAELM

Module III
CPAANN Imputation
Gray+PCA-AAELM
Gray+CPAANN

10

Overview of ELM

Overview of Extreme Learning Machine (ELM)

11

Overview of ELM
Architecture of ELM

Architecture of ELM *
Output of hidden nodes: g(a_i \cdot x + b_i)
  a_i : the weight vector of the connection between the i-th hidden node and the input nodes.
  b_i : the threshold of the i-th hidden node.

Output of SLFNs:
  f_m(x) = \sum_{i=1}^{m} \beta_i \, g(a_i \cdot x + b_i)
  \beta_i : the weight vector of the connection between the i-th hidden node and the output nodes.

Training:
  H = g(a \cdot x)
  \beta = ?
  H \beta = O
  \beta = H^{\dagger} O

Testing:
  H_T = g(y \cdot a)
  Output = H_T \beta

Output weight: \beta = H^{\dagger} T, where T is the target output and H^{\dagger} is the Moore-Penrose inverse of H.
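The ELM training and testing steps on this slide can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the uniform weight range, the sigmoid activation, the toy data and the names `elm_train` and `elm_predict` are all assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def elm_train(X, T, n_hidden, rng):
    """Random hidden layer, then output weights by the Moore-Penrose inverse."""
    a = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden))  # input weights (never trained)
    b = rng.uniform(-1.0, 1.0, size=n_hidden)                # hidden thresholds (never trained)
    H = sigmoid(X @ a + b)                                   # hidden-layer output matrix
    beta = np.linalg.pinv(H) @ T                             # beta = pinv(H) T
    return a, b, beta

def elm_predict(X, a, b, beta):
    """Testing: recompute the hidden output and multiply by beta."""
    return sigmoid(X @ a + b) @ beta

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(50, 3))     # toy inputs (hypothetical)
T = X.sum(axis=1, keepdims=True)            # toy target: sum of the inputs
a, b, beta = elm_train(X, T, n_hidden=20, rng=rng)
train_error = np.mean((elm_predict(X, a, b, beta) - T) ** 2)
```

Only `beta` is learned; the hidden layer stays random, which is why repeated runs can give different results.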

12

Table of Activation Functions

Table of Activation Functions *

13

Proposed Method
Architecture of AAELM

Architecture of AAELM (Autoassociative ELM)

Autoencoders are feed-forward neural networks trained to recall the input space.

Ensembled-AAELM
Ensembling of AAELM

Run AAELM 10 times independently on the same dataset to generate AAELF.
Use three different probability distributions (Uniform, Normal and Logistic) to generate the weights and two different activation functions (Sigmoid and Gaussian) at the hidden layer.
Build an AAELM ensemble for each of the six combinations of probability distribution and activation function.

15

Ensembled-AAELM
Result of Ensembled AAELM

Average MAPE value over 10 folds Ensembled AAELM *

16

Problems and Solutions of Ensembled AAELM

Drawbacks of AAELM

AAELM depends heavily on randomness: each run of ELM yields different results.
The results can sometimes fluctuate wildly.

Remedy for the Above Problem

We propose two new hybrid methods to stabilize the randomness of AAELM:
PCA-AAELM
ECM-AAELM
17

PCA-AAELM

Proposed Method 1:
PCA-AAELM

18

PCA-AAELM
Architecture of PCA-AAELM

Architecture of PCA-AAELM *
Traditional ELM

19

PCA-AAELM
Results

Average MAPE value over 10 folds - PCA-AAELM *

20

ECM-Imputation

Proposed Method 2:
Evolving Clustering method (ECM)
based Imputation

21

ECM-Imputation

Block Diagram of the Proposed Method

1. Dataset with missing values: split into complete records and incomplete records.
2. Complete records: apply ECM clustering.
3. Obtain the cluster centers.
4. For each incomplete record, find the nearest cluster center.
5. Impute the incomplete features with the corresponding features of the nearest cluster center.
6. Result: dataset without missing values.

ECM-Imputation

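How are missing values calculated with the help of cluster centers?

Squared Euclidean distances between the observed features (2, 3) of an incomplete record and the corresponding features of five cluster centers:

(2 - 0)^2 + (3 - 2)^2 = 5
(2 - 3)^2 + (3 - 1)^2 = 5
(2 - 1)^2 + (3 - 2)^2 = 2
(2 - 5)^2 + (3 - 3)^2 = 9
(2 - 1)^2 + (3 - 9)^2 = 37

The third cluster center is the nearest (distance 2), so its values are used for the missing features.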
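The nearest-center imputation illustrated by these distances can be sketched as below. The observed values (2, 3) and the first two coordinates of each center come from the slide; the third coordinate of each center (10 to 50) is a hypothetical stand-in for the feature being imputed, and the function name is illustrative.

```python
import numpy as np

def impute_from_nearest_center(record, centers):
    """Fill NaN features of `record` from the nearest cluster center.

    The distance uses only the features that are observed in the record.
    """
    record = np.asarray(record, dtype=float)
    centers = np.asarray(centers, dtype=float)
    observed = ~np.isnan(record)
    sq_dist = np.sum((centers[:, observed] - record[observed]) ** 2, axis=1)
    nearest = int(np.argmin(sq_dist))
    completed = record.copy()
    completed[~observed] = centers[nearest, ~observed]
    return completed, sq_dist, nearest

centers = np.array([
    [0.0, 2.0, 10.0],
    [3.0, 1.0, 20.0],
    [1.0, 2.0, 30.0],
    [5.0, 3.0, 40.0],
    [1.0, 9.0, 50.0],   # third column is hypothetical
])
record = np.array([2.0, 3.0, np.nan])
completed, sq_dist, nearest = impute_from_nearest_center(record, centers)
print(sq_dist)   # squared distances 5, 5, 2, 9, 37, as on the slide
```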

23

ECM-Imputation
Results

Average MAPE value over 10 folds - ECM-Imputation *

Dataset           Mean      K-Means+MLP [Ankaiah & Ravi]   ECM Imputation
Auto mpg          59.7      23.75                          18.03
Body fat          11.61     7.83                           6.31
Boston Housing    37.77     21.01                          17.84
Forest fires      24.728    26.61                          22.29
Iris              23.57     9.41                           5.27
Prima Indian      24.022    29.7                           27.16
Spanish           55.53     39.91                          31.98
Spectf            14.85     12.14                          10.21
Turkish           66.007    33.01                          27.90
UK bankruptcy     37.07     30.96                          46.14
UK Credit         28.43     32.17                          27.40
Wine              29.99     21.58                          15.61
24

ECM-AAELM

Proposed Method 3:
ECM-AAELM

25

ECM-AAELM
Architecture of ECM-AAELM

Architecture of ECM-AAELM *
Traditional ELM

26

ECM-AAELM
Results

Average MAPE value over 10 folds - ECM-AAELM

27

PCA/ECM-AAELM
Behavior of PCA/ECM-AAELM on different activation functions
[Figure: two bar charts, one for PCA-AAELM (vertical axis 0 to 80) and one for ECM-AAELM (vertical axis 0 to 70), showing the error, presumably MAPE, for each of the 12 datasets on the horizontal axis, with one bar per activation function: Sigmoid, Sinh, Cloglogm, Bsigmoid, Sine, Hardlim, Tribas, Radbas, Softplus, Gaussian, Rectifier.]

28

ECM-AAELM
Influence of Dthr value on MAPE results : ECM-AAELM

[Figure: line chart of MAPE against Dthr for ECM-AAELM, one line per dataset (Auto_MPG, Body_Fat, Boston_housing, Forest_Fire, Iris, Prima_indian, Spanish, Turkish, Spectf, UK_Credit, UK_Bankruptcy, Wine). Dthr runs from 0.035 to 0.98 in steps of 0.035; the vertical axis runs up to 250.]

29

ECM-AAELM

Module II:
Proposed Method 4:
PSO-ECM

30

Block Diagram of the Proposed Method

1. The dataset contains incomplete records: split it into complete records and incomplete records.
2. Initialize the PSO parameters and apply ECM with the initialized Dthr value.
3. Impute the incomplete records by ECM imputation based on the nearest cluster center.
4. Compute the covariance matrix of the complete records (Ccov) and of the total records after imputation (Tcov), and the determinants of Ccov and Tcov.
5. Compute the MSE between Ccov and Tcov and the absolute difference between Det(Ccov) and Det(Tcov).
6. Is the error minimum?
   No: invoke PSO to select a Dthr value, apply ECM with the Dthr value yielded by PSO, and repeat the imputation.
   Yes: the parameter is optimized; perform ECM imputation with the optimized Dthr value.
7. The dataset no longer contains incomplete records.
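The error that PSO minimizes in this block diagram can be sketched as a fitness function. The slide names two terms (the MSE between Ccov and Tcov, and the absolute difference of their determinants) but does not say how they are combined; adding them, as done here, is an assumption, and so are the function name and the toy data.

```python
import numpy as np

def covariance_fitness(complete_records, all_records_after_imputation):
    """Error between the covariance structure before and after imputation."""
    ccov = np.cov(complete_records, rowvar=False)               # Ccov
    tcov = np.cov(all_records_after_imputation, rowvar=False)   # Tcov
    mse = np.mean((ccov - tcov) ** 2)
    det_gap = abs(np.linalg.det(ccov) - np.linalg.det(tcov))
    return mse + det_gap   # assumption: the two terms are simply summed

rng = np.random.default_rng(0)
complete = rng.normal(size=(30, 3))                  # toy complete records
badly_imputed = np.vstack([complete, np.zeros((5, 3))])
same_score = covariance_fitness(complete, complete)
worse_score = covariance_fitness(complete, badly_imputed)
```

A PSO loop would call such a function once per candidate Dthr value, after running ECM imputation with that value, and keep the Dthr with the smallest score.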

31

ECM-Imputation
Results

Average MAPE value over 10 folds: PSO-ECM based Imputation *

Proposed techniques: ECM-Imputation and PSO-ECM

Dataset           Mean      K-Means+MLP [Ankaiah & Ravi]   ECM-Imputation   PSO-ECM
Auto mpg          59.7      23.75                          18.03            15.34844
Body fat          11.61     7.83                           6.31             4.96008
Boston Housing    37.77     21.01                          17.84            14.49978
Forest fires      24.728    26.61                          22.29            18.33909
Iris              23.57     9.41                           5.27             4.82263
Prima Indian      24.022    29.7                           27.16            24.57587
Spanish           55.53     39.91                          31.98            20.73123
Spectf            14.85     12.14                          10.21            9.85382
Turkish           66.007    33.01                          27.9             19.28137
UK bankruptcy     37.07     30.96                          46.14            30.97627
UK Credit         28.43     32.17                          27.4             24.61695
Wine              29.99     21.58                          15.61            12.75819
32

Proposed Method 5:
PSO-ECM + ECM-AAELM

33

PSO-ECM + ECM-AAELM

Proposed Model

34

PSO-COV-ECM + ECM_AAELM
Results

Average MAPE value over 10 fold - PSO-COV-ECM + ECM_AAELM

35

PSO-ECM + ECM-AAELM
Comparison

Comparison of the results before and after selection of the optimal Dthr value

Dataset           Mean      K-Means+MLP [Ankaiah & Ravi]   ECM-AAELM (Before)   PSO-ECM + ECM-AAELM (After)
Auto mpg          59.7      23.75                          17.38                14.69
Body fat          11.61     7.83                           5.33                 4.64
Boston Housing    37.77     21.01                          16.48                14.44
Forest fires      24.728    26.61                          21.54                18.17
Iris              23.57     9.41                           5.10                 4.83
Prima Indian      24.022    29.7                           23.95                23.96
Spanish           55.53     39.91                          22.09                18.53
Spectf            14.85     12.14                          8.05                 8.18
Turkish           66.007    33.01                          21.49                18.97
UK bankruptcy     37.07     30.96                          40.06                28.66
UK Credit         28.43     32.17                          26.85                24.79
Wine              29.99     21.58                          14.88                12.60
36

CPAANN

Module III:
Proposed Method 6:
CPAANN

37

CPAANN
Introduction of CPNN

Introduction *

Semi-supervised learning: CPNN combines
  Unsupervised: Kohonen SOM (competitive learning)
  Supervised: Grossberg Outstar

We added the concept of auto-associativity to CPNN and created the Counter Propagation Auto-associative Neural Network (CPAANN).

Comparison

Average MAPE value over 10 folds - CPAANN

Dataset           Mean      K-Means+MLP [Ankaiah & Ravi]   CPAANN based Imputation
Auto mpg          59.7      23.75                          18.32
Body fat          11.61     7.83                           5.25
Boston Housing    37.77     21.01                          14.86
Forest fires      24.728    26.61                          16.97
Iris              23.57     9.41                           6.51
Prima Indian      24.022    29.7                           18.21
Spanish           55.53     39.91                          17.13
Spectf            14.85     12.14                          8.61
Turkish           66.007    33.01                          16.07
UK bankruptcy     37.07     30.96                          21.96
UK Credit         28.43     32.17                          22.88
Wine              29.99     21.58                          11.56

39

PCA-AAELM

Proposed Method 7:
Gray + PCA-AAELM

40

Proposed Method *:
Stage I: Gray Distance Based Nearest Neighbor Imputation
Stage II: PCA-AAELM Based Imputation

41

Gray+PCA-AAELM
Comparison

Results of PCA-AAELM with Mean Imputation and Gray Distance based Imputation

Dataset           Mean      K-Means+MLP [Ankaiah & Ravi]   PCA-AAELM with Mean Imputation   PCA-AAELM with Gray Distance based Imputation
Auto mpg          59.7      23.75                          28.63                            16.92
Body fat          11.61     7.83                           6.01                             5.41
Boston Housing    37.77     21.01                          20.9                             17.46
Forest fires      24.728    26.61                          19.41                            20.89
Iris              23.57     9.41                           10.23                            5.79
Prima Indian      24.022    29.7                           22.06                            22.03
Spanish           55.53     39.91                          30.09                            28.06
Spectf            14.85     12.14                          9.11                             8.38
Turkish           66.007    33.01                          30.18                            27.38
UK bankruptcy     37.07     30.96                          37.7                             37.95
UK Credit         28.43     32.17                          25.27                            27.79
Wine              29.99     21.58                          16.6                             14.78

42

Gray+CPAANN

Proposed Method 8:
Gray + CPAANN

43

Gray+CPAANN

Proposed Method *:
Stage I: Gray Distance Based Nearest Neighbour Imputation
Stage II: CPAANN Based Imputation

44

Gray+CPAANN
Comparison

Results of CPAANN with Mean Imputation and Gray Distance based Imputation

Dataset           Mean      K-Means+MLP [Ankaiah & Ravi]   CPAANN with Mean Imputation   CPAANN with Gray Distance based Imputation
Auto mpg          59.7      23.75                          18.32                         15.31
Body fat          11.61     7.83                           5.25                          4.71
Boston Housing    37.77     21.01                          14.86                         15.01
Forest fires      24.728    26.61                          16.97                         17.91
Iris              23.57     9.41                           6.51                          4.03
Prima Indian      24.022    29.7                           18.21                         19.34
Spanish           55.53     39.91                          17.13                         14.21
Spectf            14.85     12.14                          8.61                          8.53
Turkish           66.007    33.01                          16.07                         17.37
UK bankruptcy     37.07     30.96                          21.96                         20.58
UK Credit         28.43     32.17                          22.88                         13.70
Wine              29.99     21.58                          11.56                         11.72

45

Comparison Between All Methods


Comparison

Comparison Between All Proposed Methods based on Average MAPE value over 10 folds

Dataset           PCA-AAELM   ECM-Imputation   ECM-AAELM   PSO-ECM   PSO-ECM+ECM-AAELM   Gray+PCA-AAELM   CPAANN   Gray+CPAANN
Auto mpg          28.63       18.03            17.38       15.35     14.39               16.92            18.32    15.31
Body fat          6.01        6.31             5.33        4.96      4.61                5.41             5.25     4.71
Boston Housing    20.90       17.84            16.48       14.50     14.18               17.46            14.86    15.01
Forest fires      19.41       22.29            21.54       18.34     17.66               20.89            16.97    17.91
Iris              10.23       5.27             5.10        4.82      4.75                5.79             6.51     4.03
Prima Indian      22.06       27.16            23.95       24.58     23.38               22.03            18.21    19.34
Spanish           30.09       31.98            22.09       20.73     16.99               28.06            17.13    14.21
Spectf            9.11        10.21            8.05        9.85      8.18                8.38             8.61     8.53
Turkish           30.18       27.90            21.49       19.28     16.49               27.38            16.07    17.37
UK bankruptcy     37.70       46.14            40.06       30.98     26.89               37.95            21.96    20.58
UK Credit         25.27       27.40            26.85       24.62     23.66               27.79            22.88    13.70
Wine              16.60       15.61            14.88       12.76     12.21               14.78            11.56    11.72

46

Conclusions

The results indicate that the proposed methods provided significantly improved results compared to K-Means+MLP on most datasets.
ECM-Imputation alone outperformed K-Means+MLP on most datasets, which shows the powerful local learning capability of ECM.
ECM-AAELM is more accurate than PCA-AAELM on most datasets.
The output of ECM-AAELM depends primarily on the threshold value (Dthr) of ECM; it does not fluctuate wildly across activation functions.
In our experiments, selecting the optimal Dthr value improved imputation in almost every case.
For PCA-AAELM, the Softplus activation function is recommended because it performed better than the other activation functions.
Gray Distance based imputation performed better than Mean imputation as a preprocessing step for most of the datasets.

48

Papers

List of Published and Communicated Research Papers


C. Gautam, V. Ravi, Evolving Clustering Based Data Imputation, 3rd
IEEE Conference, ICCPCT, Kanyakumari, Mar 21-22, 2014.
C. Gautam, V. Ravi, Data Imputation via Evolutionary Computation,
Clustering and a Neural Network, to be communicated in IEEE
Computational Intelligence Magazine (CIM).
A Hybrid Data Imputation method based on Gray System Theory and
Counterpropagation Auto-associative Neural Network, to be
communicated in Neurocomputing.
Imputation of Missing Data Using PCA, Extreme Learning Machine
and Gray System Theory, to be communicated in The 5th Joint
International Conference on Swarm, Evolutionary and Memetic
Computing (SEMCCO 2014).

49

References
Data Imputation

References

Abdella, M., & Marwala, T. (2005). The use of genetic algorithms and neural
networks to approximate missing data in database, IEEE 3rd International
Conference on Computational Cybernetics, Mauritius, pp. 207-212.
Mistry, J., Nelwamondo, F., V., & Marwala, T. (2009). Data estimation using
principal component analysis and Auto associative neural networks, Journal of
Systemics, Cybernetics and Informatics, Volume 7, pp. 72-79 .
Ankaiah, N., & Ravi, V. (2011). A novel soft computing hybrid for data
imputation, International Conference on Data Mining, Las Vegas, USA.
Vriens, M., & Melton, E. (2002). Managing missing data. Marketing Research,
Volume 14, Issue 3, pp. 12-17.
Naveen, N., Ravi, V., & Rao, C. R. (2010). Differential evolution trained radial
basis function network: application to bankruptcy prediction in banks, International
Journal of Bio-Inspired Computation (IJBIC), Volume 2, Issue 3, pp. 222-232.

References
Data Imputation (Cont.)

Nelwamondo, F., V., Golding, D., & Marwala, T. (2013). A dynamic programming
approach to missing data estimation using neural networks, Elsevier, Information
Sciences, Volume 237, pp. 49-58.
Nishanth, K.J., Ankaiah, N., Ravi, V., & Bose, I. (2012). Soft computing based
imputation and hybrid data and text mining: The case of predicting the severity of
phishing alerts, Expert Systems with Applications, Volume 39, Issue 12, pp. 10583-10589.
K. J. Nishanth, V. Ravi, A Computational Intelligence Based Online Data
Imputation Method: An Application For Banking, Journal of Information
Processing Systems, vol. 9 (4), pp. 633-650, 2013.
M. Krishna, V. Ravi, Particle swarm optimization and covariance matrix based data
imputation, IEEE International Conference on Computational Intelligence and
Computing Research (ICCIC), Enathi, 2013.
V. Ravi, M. Krishna, A new online data imputation method based on general
regression auto associative neural network, Neurocomputing, vol. 138, pp. 207-212, 2014.

51

References
Extreme Learning Machine (ELM)

Huang, G.B., Zhu, Q., & Siew, C. (2006). Extreme Learning Machine: Theory and
Applications, Neurocomputing, Elsevier, 7th Brazilian Symposium on Neural
Networks, Volume 70, pp. 489-501.
Rajesh, R., & Siva, J. (2011). Extreme Learning Machine A Review and State of
Art, International Journal Of Wisdom Based Computing, Volume 1, pp. 35-49.
Huang, G., Wang, D., & Lan, Y. (2011). Extreme Learning Machine: A Survey,
International Journal of Machine Learning and Cybernetics June 2011, Volume
2, Issue 2, pp 107-122.
Bartlett, P. (1997). For Valid Generalization, The Size of the Weights is more
important than the Size of the Network, Advances in Neural Information Processing
Systems, Volume 9, pp. 134-140.
Huang, G., Chen, L., & Siew, C. (2006). Universal Approximation Using
Incremental Constructive Feedforward Networks with Random Hidden Nodes,
IEEE Transactions on Neural Networks, Volume 17, Issue 4, pp. 879-892.

52

References
Extreme Learning Machine (ELM) (Cont.)

Zhu, Q., Qin, A. K., Suganthan, P.N., & Huang, G. (2005). Evolutionary Extreme
Learning Machine, Pattern Recognition, Elsevier, Volume 38, Issue 10, pp. 1759-1763.
Castaño, A., Fernández-Navarro, F., & Hervás-Martínez, C. (2013). PCA-ELM: A
Robust and Pruned ELM Approach Based on PCA, Neural Processing Letters,
Springer, Volume 37, Issue 3, pp. 377-392.
Huang, G.B., Zhou, H., Ding, X., & Zhang, R. (2012). Extreme Learning Machine
for Regression and Multiclass Classification, IEEE Transaction on Systems, Man
And Cybernetics, Volume 42, Issue 2, pp. 513-529.

53

References
Extreme Learning Machine (In Finance)

Duan, G., Huang, Z., & Wang, J. (2009). Extreme Learning Machine for Bank
Clients Classification, International Conference on Information Management,
Innovation Management and Industrial Engineering, Xian, China, Volume 2, pp.
496-499.
Duan, G., Huang, Z., & Wang, J. (2010). Extreme Learning Machine for Financial
Distress Prediction for Listed Company, International Conference on Logistics
Systems and Intelligent Management, Harbin, China, Volume 3, pp. 1961-1965.
Zhou, H., Lan, Y., Soh, Y. C., Huang G.B., & Zhang, R. (2012). Credit Risk
Evaluation with Extreme Learning Machine, IEEE International Conference on
Systems, Man and Cybernetics, Seoul, Korea, pp. 1064-1069.
Teresa, M., Carmen, M., David B., & José, F. (2012). Extreme Learning Machine
to Analyze the Level of Default in Spanish Deposit Institutions, Journal of
Methods for the quantitative Economy and Enterprise, Volume 13, Issue 1, pp. 3-23.

54

References
Activation Function
Sibi, P., Jones, S., & Siddarth, P. (2013). Analysis of Different Activation Functions
Using Back Propagation Neural Networks, Journal of Theoretical and Applied
Information Technology, Volume 47, Issue 3, pp. 1264-1268.
Peng, J., Li, L., & Tang (2013). Combination of Activation Functions in Extreme
Learning Machines for Multivariate Calibration, Chemometrics and Intelligent
Laboratory Systems, Elsevier, Volume 120, pp. 53-58.
Gomes, G. S. S., Ludermir, T. B., & Lima, L. M. M. R. (2011). Comparison of new
activation functions in neural network for forecasting financial time series, Neural
Computing and Applications, Springer, Volume 20, Issue 3, pp. 417-439.
Asaduzzaman, Md., Shahjahan, M., & Murase, K. (2009). Faster Training Using
Fusion of Activation Functions for Feed Forward Neural Networks, International
Journal of Neural Systems , Volume 19, Issue 06, pp. 437-448 .
Karlik, B., & Olgac, A. V. (2010) Performance Analysis of Various Activation
Functions in Generalized MLP Architectures of Neural Networks, International Journal
of Artificial Intelligence and Expert Systems, Volume 1, Issue 4, pp. 111-122.

References
Activation Function(Cont.), ECM, Cross Validation & PCA

Glorot, X., Bordes, A., & Bengio, Y. (2011). Deep Sparse Rectifier Neural
Networks, International Conference on Artificial Intelligence and Statistics, Fort
Lauderdale, USA, Volume 15, pp. 315-323.
Song, Q., & Kasabov, N. (2001). ECM: A Novel On-line, Evolving Clustering
Method and Its Applications, Proceedings of the Fifth Biannual Conference on
Artificial Neural Networks and Expert Systems, Berlin, pp. 87-92.
Refaeilzadeh, P., Tang, L., & Liu. H. (2009). "Cross Validation", in Encyclopedia
of Database Systems (EDBS), Springer, Volume 1, pp. 532-538.
Smith, L. (2002). A tutorial on Principal Components Analysis.

56

References
Counter Propagation Neural Network (CPAANN)

Kuzmanovski, I., & Novič, M. (2008). Counter-propagation neural networks in
Matlab, Chemometrics and Intelligent Laboratory Systems, pp. 84-91.
Taner, M. (1997). Kohonen's self organizing networks with CONSCIENCE.
Ballabio, D., & Vasighi, M. (2012). MATLAB toolbox for Self Organizing Maps and
supervised neural network learning strategies, Chemometrics and Intelligent
Laboratory Systems, pp. 24-32.
Ballabio, D., Consonni, V., & Todeschini, R. (2009). The Kohonen and CP-ANN
toolbox: A collection of MATLAB modules for Self Organizing Maps and
Counterpropagation Artificial Neural Networks, Chemometrics and Intelligent
Laboratory Systems, pp. 115-122.
Introduction to neural networks Using MATLAB 6.0 by S N Sivanandam, S Sumathi
and S N Deepa.
Elements of Artificial Neural Networks by Kishan Mehrotra, Chilukuri K. Mohan and
Sanjay Ranka .

57

Thank You

Thank You

58

Activation Function

Activation Function *
Sibi, P., Jones, S., & Siddarth, P. (2013). Analysis of Different
Activation Functions Using Back Propagation Neural Networks, Journal
of Theoretical and Applied Information Technology, Volume 47, Issue
3, pp. 1264-1268.
Gomes, G. S. S., Ludermir, T. B., & Lima, L. M. M. R. (2011).
Comparison of new activation functions in neural network for
forecasting financial time series, Neural Computing and Applications,
Springer, Volume 20, Issue 3, pp. 417-439.

59

Activation Function

Activation Function (Cont.)


Karlik, B., & Olgac, A. V. (2010) Performance Analysis of Various
Activation Functions in Generalized MLP Architectures of Neural
Networks, International Journal of Artificial Intelligence and Expert
Systems, Volume 1, Issue 4, pp. 111-122.
Glorot, X., Bordes, A., & Bengio, Y. (2011). Deep Sparse Rectifier
Neural Networks, International Conference on Artificial Intelligence
and Statistics, Fort Lauderdale, USA, Volume 15, pp. 315-323.

60

Experimental Design

Experimental Design for PCA-AAELM and ECM-AAELM


10-fold cross-validation was used in our experiments.
Both PCA-AAELM and ECM-AAELM have one user-defined parameter: the variance (i.e., eigenvalues) for PCA and the threshold value for ECM.
For each activation function, we fixed the activation function and varied the variance from 1 to 99 in PCA-AAELM and the threshold from 0.001 to 0.999 in ECM-AAELM on the whole dataset.
We used 11 activation functions and compared their performance.

PCA-AAELM
Steps of PCA-AAELM

The following steps are required for the training process: *

1. Take the training dataset.
2. Perform PCA.
3. Select the optimal number of hidden nodes and use the principal components (PC) as the input weights of the hidden nodes.
4. Compute PC * Training Data.
5. Perform the non-linear transformation.
6. Compute the output weight by performing the Moore-Penrose generalized inverse.
7. Obtain the neural network model.
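The PCA-AAELM training steps can be sketched as follows. This is only an illustration under several assumptions that the slides do not state: the data are mean-centred before PCA, the number of hidden nodes is the number of components needed to reach a chosen share of variance, and Softplus is the activation. The names `pca_aaelm_train` and `pca_aaelm_reconstruct` are hypothetical.

```python
import numpy as np

def softplus(z):
    return np.log1p(np.exp(z))

def pca_aaelm_train(X, variance_to_keep=0.95):
    """Principal components act as input weights; output weights via pinv."""
    mean = X.mean(axis=0)
    Xc = X - mean                                       # centring is an assumption
    _, s, vt = np.linalg.svd(Xc, full_matrices=False)   # PCA through the SVD
    explained = np.cumsum(s ** 2) / np.sum(s ** 2)
    n_hidden = min(int(np.searchsorted(explained, variance_to_keep)) + 1, len(s))
    W = vt[:n_hidden].T                                 # (features, hidden nodes)
    H = softplus(Xc @ W)                                # non-linear transformation
    beta = np.linalg.pinv(H) @ X                        # auto-associative: target is X
    return mean, W, beta

def pca_aaelm_reconstruct(X, mean, W, beta):
    return softplus((X - mean) @ W) @ beta

rng = np.random.default_rng(0)
X = rng.uniform(size=(40, 4))                           # toy data (hypothetical)
mean, W, beta = pca_aaelm_train(X)
recon = pca_aaelm_reconstruct(X, mean, W, beta)
```

Because the input weights come from PCA rather than from a random generator, repeated runs on the same data give the same model.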

62

ECM-AAELM
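Evolving Clustering Method *

[Figure: ECM example in which samples x1 to x9 arrive one at a time. New clusters start with a centre and radius zero (C1^0 with R1^0 = 0, C2^0 with R2^0 = 0, C3^0 with R3^0 = 0) and are updated as further samples arrive (C1^1 with R1^1, C1^2, C1^3, C2^1).]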
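The slide shows ECM only as a figure, so the sketch below follows the one-pass update rule as it is commonly described for ECM; every detail, including the function names and the toy samples, should be read as an assumption rather than as the authors' code. A sample inside an existing cluster changes nothing; otherwise the cluster with the smallest distance-plus-radius is enlarged, unless that would exceed 2 * Dthr, in which case a new cluster is created.

```python
import numpy as np

def norm_dist(x, y):
    """Normalized Euclidean distance between two vectors."""
    return np.sqrt(np.sum((x - y) ** 2)) / np.sqrt(x.size)

def ecm(samples, dthr):
    samples = np.asarray(samples, dtype=float)
    centers = [samples[0].copy()]          # first sample starts the first cluster
    radii = [0.0]
    for x in samples[1:]:
        d = np.array([norm_dist(x, c) for c in centers])
        r = np.array(radii)
        if np.any(d <= r):
            continue                       # x already lies inside a cluster
        s = d + r
        a = int(np.argmin(s))
        if s[a] > 2.0 * dthr:
            centers.append(x.copy())       # too far from every cluster: create one
            radii.append(0.0)
        else:
            new_radius = s[a] / 2.0
            # Move the centre towards x so that x lies on the new boundary.
            centers[a] = x + (centers[a] - x) * (new_radius / d[a])
            radii[a] = new_radius
    return np.array(centers), np.array(radii)

demo_samples = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
demo_centers, demo_radii = ecm(demo_samples, dthr=0.5)
```

A smaller Dthr yields more, tighter clusters, which is why the threshold matters so much in the ECM-based methods.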

63

ECM-AAELM
Steps of ECM-AAELM

The following steps are required for the training process: *

1) First, perform ECM on the given dataset and find how many clusters are generated.
2) Then extract the centre of each cluster and treat each cluster as a hidden node. The number of hidden nodes therefore equals the number of generated clusters.
3) Calculate the normalized Euclidean distance from the centre of each cluster, i.e., from each hidden node.
4) Perform a non-linear transformation on this distance with the activation function to get the hidden node output.
5) The normalized Euclidean distance formula is:

   \|x - y\| = \frac{\left( \sum_{i=1}^{q} |x_i - y_i|^2 \right)^{1/2}}{q^{1/2}}

   where q = number of features.
6) After this, compute the Moore-Penrose generalized inverse of the output of the previous step and multiply it by the dataset to calculate the output weight.
7) Finally, multiply the hidden node output by the output weight to get the final output.
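Steps 3 to 7 can be sketched as below, assuming the cluster centres are already available. The centres here are simply every fifth record of a toy dataset, a hypothetical stand-in for the output of ECM, and the sigmoid activation and function names are likewise assumptions.

```python
import numpy as np

def hidden_output(X, centers):
    """Sigmoid of the normalized Euclidean distance to each cluster centre."""
    diff = X[:, None, :] - centers[None, :, :]
    dist = np.sqrt(np.sum(diff ** 2, axis=2)) / np.sqrt(X.shape[1])
    return 1.0 / (1.0 + np.exp(-dist))

def ecm_aaelm_fit(X, centers):
    """Output weights: pseudo-inverse of the hidden output times the dataset."""
    H = hidden_output(X, centers)
    return np.linalg.pinv(H) @ X

rng = np.random.default_rng(0)
X = rng.uniform(size=(30, 4))        # toy dataset (hypothetical)
centers = X[::5]                     # stand-in for ECM cluster centres
beta = ecm_aaelm_fit(X, centers)
recon = hidden_output(X, centers) @ beta   # final output of the network
```

No random weights are involved: once the centres are fixed, the hidden layer and the output weights are fully determined.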
65

ELM
Why Moore-Penrose Inverse

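Why are we using the Moore-Penrose Generalized Inverse? *

The Moore-Penrose inverse provides the solution of a linear system
  Ax = y
in such a way that the error \|Ax - y\| and the norm \|x\| are both minimized simultaneously, which gives a unique solution:
  x = A^{\dagger} y
Formula for the ELM hidden-layer matrix H (when H^T H is invertible): H^{\dagger} = (H^T H)^{-1} H^T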
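A quick numerical check of this formula, assuming NumPy: for a tall random matrix with full column rank, the closed form agrees with `numpy.linalg.pinv`, and the resulting solution matches an ordinary least-squares solve. The matrix and right-hand side are arbitrary toy values.

```python
import numpy as np

rng = np.random.default_rng(0)
H = rng.normal(size=(8, 3))    # tall matrix, full column rank for this draw
y = rng.normal(size=8)

H_pinv_formula = np.linalg.inv(H.T @ H) @ H.T   # (H^T H)^{-1} H^T
H_pinv = np.linalg.pinv(H)                      # SVD-based pseudo-inverse

x = H_pinv @ y                                  # least-squares solution of Hx = y
x_lstsq = np.linalg.lstsq(H, y, rcond=None)[0]
```

In practice `pinv` is the safer choice, since it still works when H^T H is singular or badly conditioned.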
66

Flow of CPNN Algorithm

1. Initialize the network.
2. Repeat for N epochs and, within each epoch, for all inputs:
   a. Get the input.
   b. Find the winner.
   c. Update the winner and its neighbourhood.
   d. Update the nodes at the Grossberg Outstar layer.
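The flow above can be sketched as a simplified forward-only CPNN. To keep it short, this version updates only the winning node and leaves out the neighbourhood update; the learning rates, the initialisation from random training samples and the function names are all assumptions. Passing the input as its own target gives the auto-associative use.

```python
import numpy as np

def train_cpnn(X, Y, n_hidden, epochs=50, alpha=0.3, beta=0.3, seed=0):
    """Winner-take-all Kohonen layer followed by a Grossberg outstar layer."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=n_hidden, replace=False)
    kohonen = X[idx].copy()     # input-to-hidden weights
    outstar = Y[idx].copy()     # hidden-to-output weights
    for _ in range(epochs):
        for x, y in zip(X, Y):
            winner = np.argmin(np.sum((kohonen - x) ** 2, axis=1))
            kohonen[winner] += alpha * (x - kohonen[winner])   # competitive learning
            outstar[winner] += beta * (y - outstar[winner])    # outstar rule
    return kohonen, outstar

def predict_cpnn(X, kohonen, outstar):
    """Output the outstar weights of each input's winning hidden node."""
    sq = ((X[:, None, :] - kohonen[None, :, :]) ** 2).sum(axis=2)
    return outstar[np.argmin(sq, axis=1)]

rng = np.random.default_rng(1)
X = rng.uniform(size=(40, 3))                          # toy data (hypothetical)
kohonen, outstar = train_cpnn(X, X, n_hidden=5)        # target equals input
pred = predict_cpnn(X, kohonen, outstar)
```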

67

Architecture of Forward-only CPNN

Input layer (x1, x2, ..., xm) -> Hidden layer (h1, h2, ..., hn) -> Output layer (y1, y2, ..., yp)
Input-to-hidden weights are trained by simple competitive learning.
Hidden-to-output weights are trained by the Outstar rule.

68

How are the weights updated?

Fig.: 9 hidden nodes (red) and 10 input samples (blue).

69

How to Calculate Gray Distance *:

Gray Relational Coefficient:

  GRC(x_{kp}^{mis}, x_i) = \frac{\min_i \min_p |x_{kp}^{mis} - x_{ip}| + \rho \max_i \max_p |x_{kp}^{mis} - x_{ip}|}{|x_{kp}^{mis} - x_{ip}| + \rho \max_i \max_p |x_{kp}^{mis} - x_{ip}|}

  p = 1, 2, 3, ..., m;  k = 1, 2, 3, ..., n;  i = 1, 2, 3, ..., o.
  \rho \in [0, 1] controls the level of differences with respect to the relational coefficient.

Gray Relational Grade:

  GRG(x_k^{mis}, x_i) = \frac{1}{m} \sum_{p=1}^{m} GRC(x_{kp}^{mis}, x_i)

  i = 1, 2, ..., o;  k = 1, 2, ..., n.

70

Example *

Record R1 has a missing value in attr2 (actual value = 0.3).

      attr1   attr2   attr3   attr4   attr5
R1    0.2     ??      0.9     0.6     0.5
R2    0.1     0.3     0.9     0.4     0.6
R3    0.1     0.4     0.8     0.5     0.6
R4    0.8     0.2     0.5     0.3     0.2
R5    0.5     0.8     0.3     0.9     0.7

      Abs. Diff1   Abs. Diff2   Abs. Diff3   Abs. Diff4   Min   Max
R2    0.1          0            0.2          0.1          0     0.2
R3    0.1          0.1          0.1          0.1          0.1   0.1
R4    0.6          0.4          0.3          0.3          0.3   0.6
R5    0.3          0.6          0.3          0.2          0.2   0.6

Overall: Min = 0, Max = 0.6

      GRC1       GRC2       GRC3   GRC4   GRG
R2    0.75       1          0.6    0.75   0.775
R3    0.75       0.75       0.75   0.75   0.75
R4    0.333333   0.428571   0.5    0.5    0.440476
R5    0.5        0.333333   0.5    0.6    0.483333

R2 has the highest GRG, so the value imputed by Gray Distance = 0.3 (the attr2 value of R2).
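The worked example can be reproduced with a short sketch of the GRC and GRG formulas. The coefficient rho = 0.5 is not stated on the slide; it is inferred here because it reproduces the grades in the example, and the function name is illustrative.

```python
import numpy as np

def gray_relational_grades(target, donors, rho=0.5):
    """GRG between a record with missing values and each complete record."""
    target = np.asarray(target, dtype=float)
    donors = np.asarray(donors, dtype=float)
    observed = ~np.isnan(target)
    diff = np.abs(donors[:, observed] - target[observed])   # Abs. Diff columns
    d_min, d_max = diff.min(), diff.max()                   # overall Min and Max
    grc = (d_min + rho * d_max) / (diff + rho * d_max)      # GRC per attribute
    return grc.mean(axis=1)                                 # GRG per donor record

r1 = [0.2, np.nan, 0.9, 0.6, 0.5]           # attr2 is missing
donors = np.array([
    [0.1, 0.3, 0.9, 0.4, 0.6],              # R2
    [0.1, 0.4, 0.8, 0.5, 0.6],              # R3
    [0.8, 0.2, 0.5, 0.3, 0.2],              # R4
    [0.5, 0.8, 0.3, 0.9, 0.7],              # R5
])
grades = gray_relational_grades(r1, donors)
best = int(np.argmax(grades))               # index of the nearest record
imputed = donors[best, 1]                   # its attr2 value fills the gap
```

Here the nearest record is R2, so the imputed value equals the actual value of 0.3.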

71

Gray Distance Based Imputation


Results

Average MAPE value over 10 folds - Gray Distance Based Imputation *

Dataset           Mean      K-Means+MLP   Gray Distance Based Imputation
Auto mpg          59.7      23.75         16.73
Body fat          11.61     7.83          7.65
Boston Housing    37.77     21.01         19.28
Forest fires      24.728    26.61         22.89
Iris              23.57     9.41          5.34
Prima Indian      24.022    29.7          28.06
Spanish           55.53     39.91         36.29
Spectf            14.85     12.14         11.60
Turkish           66.007    33.01         36.63
UK bankruptcy     37.07     30.96         39.75
UK Credit         28.43     32.17         28.90
Wine              29.99     21.58         17.58

72

Literature Survey

73

Data Imputation

Data Imputation

Vriens, M., & Melton, E. (2002). Managing missing data. Marketing Research,
Volume 14, Issue 3, pp. 12-17.
Mistry, J., Nelwamondo, F., V., & Marwala, T. (2009). Data estimation using
principal component analysis and Auto associative neural networks, Journal of
Systemics, Cybernetics and Informatics, Volume 7, pp. 72-79 .
Ankaiah, N., & Ravi, V. (2011). A novel soft computing hybrid for data
imputation, International Conference on Data Mining, Las Vegas, USA.
Nishanth, K.J., Ankaiah, N., Ravi, V., & Bose, I. (2012). Soft computing based
imputation and hybrid data and text mining: The case of predicting the severity
of phishing alerts, Expert Systems with Applications, Volume 39, Issue 12, pp.
10583-10589.
M. Krishna, V. Ravi, Particle swarm optimization and covariance matrix
based data imputation, IEEE International Conference on Computational
Intelligence and Computing Research (ICCIC), Enathi, 2013.

74

Extreme Learning Machine (ELM)

Extreme Learning Machine (ELM)

Huang, G.B., Zhu, Q., & Siew, C. (2006). Extreme Learning Machine:
Theory and Applications, Neurocomputing, Elsevier, 7th Brazilian
Symposium on Neural Networks, Volume 70, pp. 489-501.
Huang, G., Wang, D., & Lan, Y. (2011). Extreme Learning Machine: A
Survey, International Journal of Machine Learning and Cybernetics June
2011, Volume 2, Issue 2, pp 107-122.
Rajesh, R., & Siva, J. (2011). Extreme Learning Machine A Review and
State of Art, International Journal Of Wisdom Based Computing, Volume
1, pp. 35-49.
Huang, G.B., Zhou, H., Ding, X., & Zhang, R. (2012). Extreme Learning
Machine for Regression and Multiclass Classification, IEEE Transaction
on Systems, Man And Cybernetics, Volume 42, Issue 2, pp. 513-529.

75

ECM & CPNN

Evolving Clustering Method*

Song, Q., & Kasabov, N. (2001). ECM: A Novel On-line, Evolving


Clustering Method and Its Applications, Proceedings of the Fifth
Biannual Conference on Artificial Neural Networks and Expert Systems,
Berlin, pp. 87-92.

Counter Propagation Neural Network


Kuzmanovski, I., & Novič, M. (2008). Counter-propagation neural
networks in Matlab, Chemometrics and Intelligent Laboratory Systems,
pp. 84-91.
Ballabio, D., Consonni, V., & Todeschini, R. (2009). The Kohonen and CP-ANN toolbox: A collection of MATLAB modules for Self Organizing Maps
and Counterpropagation Artificial Neural Networks, Chemometrics and
Intelligent Laboratory Systems, pp. 115-122.
Sivanandam, S. N., & Deepa, S. N. Introduction to neural networks Using
MATLAB 6.0.

76

Dataset Description
Dataset           Number of records   Number of attributes
Auto mpg          392                 7
Body fat          252                 14
Boston Housing    506                 13
Forest fires      516                 10
Iris              150                 4
Prima Indian      768                 8
Spanish           66                  9
Spectf            267                 44
Turkish           40                  12
UK bankruptcy     60                  10
UK Credit         1225                12
Wine              178                 13

77