
Department of Computer Science & Engineering

Indian Institute of Technology-Madras

KMPA Assignment-I
Group No. 11
Vindhani Mohsin (ED11B041), Rohit R. Salunke (ME11B119), Pritesh Jain (ME11B116)
March 3, 2016

Classification

For all the classification tasks, the selected error function is the cross-entropy error, with the desired output in 1-of-K representation. The input is normalized before being fed to the training algorithm.
The activation function of the last layer is the sigmoid, which bounds the output of the final layer between 0 and 1.
Model selection was done by running models with different hyperparameters, all subject to the same
learning algorithm with fixed parameters and a fixed maximum number of epochs.
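As an illustration of this setup, the following is a minimal NumPy sketch of the forward pass and error computation described above; the layer sizes, variable names and random data are illustrative, not the actual assignment code.

import numpy as np

def forward(X, weights, biases):
    # Forward pass through an MLFFNN with sigmoid units in every layer,
    # so the final outputs are bounded in (0, 1).
    a = X
    for W, b in zip(weights, biases):
        a = 1.0 / (1.0 + np.exp(-(a @ W + b)))
    return a

def cross_entropy(Y_true, Y_pred, eps=1e-12):
    # Cross-entropy error for 1-of-K targets against sigmoid outputs,
    # averaged over the batch; eps guards against log(0).
    Y_pred = np.clip(Y_pred, eps, 1.0 - eps)
    return -np.mean(np.sum(Y_true * np.log(Y_pred)
                           + (1.0 - Y_true) * np.log(1.0 - Y_pred), axis=1))

# Illustrative (2,2,3) network on z-score normalized 2-D inputs.
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 2))
X = (X - X.mean(axis=0)) / X.std(axis=0)          # input normalization
sizes = [2, 2, 2, 3]                              # input, hidden 1, hidden 2, output
weights = [0.1 * rng.standard_normal((m, n)) for m, n in zip(sizes, sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]
Y = np.eye(3)[rng.integers(0, 3, size=5)]         # 1-of-K targets
print(cross_entropy(Y, forward(X, weights, biases)))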

Linearly Separable classes


The selection of the model was done by varying the number of neurons in the hidden layers. The criterion
for selection was the accuracy of the model on the validation set. The experiments done were:
(1,1,3)
(2,2,3)
(3,3,3)
where the first two numbers are the numbers of neurons in hidden layer 1 and hidden layer
2 respectively, and the last number is the number of outputs.
The models (2,2,3) and (3,3,3) both gave full accuracy, but (2,2,3), being less complex, was selected.
Refer to Table 1 and Figure 1.

Confusion Matrix:

            Class 1   Class 2   Class 3
Class 1       100         0         0
Class 2         0       100         0
Class 3         0         0       100

Accuracy: 100%

Table 1: Linear 2,2,3

Figure 1: Linear 2,2,3


Selection of Training algorithm
Once the gradients of the weights are obtained, the update algorithm is selected by
testing each candidate on this task and comparing the number of epochs required. The algorithms tried were:
Gradient Descent
Conjugate Gradient Descent
Resilient Backpropagation (rprop)
Gradient Descent with adaptive learning rate
The best algorithm for training is rprop with the parameters:
initial update = 0.7
maximum weight change allowed = 50
learning rate = 0.01
Refer to Figures 2, 3, 4 and 5.
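For reference, a minimal sketch of the rprop update rule with the settings above; the increase/decrease factors 1.2 and 0.5 are the usual rprop defaults and an assumption here, as they are not stated in this report.

import numpy as np

def rprop_step(w, grad, prev_grad, step,
               eta_plus=1.2, eta_minus=0.5, step_max=50.0, step_min=1e-6):
    # One resilient-backpropagation update on an array of weights.
    # Only the sign of the gradient is used: the per-weight step size
    # grows by eta_plus while the sign is stable and shrinks by eta_minus
    # when it flips; step_max=50 is the maximum weight change above.
    sign_change = grad * prev_grad
    step = np.where(sign_change > 0, np.minimum(step * eta_plus, step_max), step)
    step = np.where(sign_change < 0, np.maximum(step * eta_minus, step_min), step)
    grad = np.where(sign_change < 0, 0.0, grad)   # no update right after a sign flip
    return w - np.sign(grad) * step, grad, step

# Usage: initialize step = np.full_like(w, 0.7) (the initial update above)
# and prev_grad = np.zeros_like(w), then call once per epoch with the
# freshly computed gradient.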

Figure 2: CEE with CG

Figure 3: CEE with GD

Figure 4: CEE with GDX

Figure 5: CEE with RPROP (10,10,3)

Non-Linear classes


The selection of the model was done by varying the number of neurons in the hidden layers. The criterion
for selection was the accuracy of the model on the validation set. The experiments done were:
(3,3,3)
(5,5,3)
(8,8,3)
(7,7,3)
where the first two numbers are the numbers of neurons in hidden layer 1 and hidden layer
2 respectively, and the last number is the number of outputs.
The model (8,8,3) gave full accuracy on the validation and test sets, so it was selected.
A less complex model, (7,7,3), was then tried to check whether the same result could be achieved with
lower complexity, but it did not give full accuracy over all the training iterations.
Hence (8,8,3) was retained.

Confusion Matrix:

            Class 1   Class 2   Class 3
Class 1        60         0         0
Class 2         0       120         0
Class 3         0         0       160

Accuracy: 100%

Table 2: Non-Linear 8,8,3

Figure 6: Non-Linear 8,8,3

Figures 7-14: Nodes 1-8 of hidden layer 1 at epoch 1.
Figures 15-22: Nodes 1-8 of hidden layer 2 at epoch 1.
Figures 23-25: Nodes 1-3 of the output layer at epoch 1.

Figures 26-33: Nodes 1-8 of hidden layer 1 at epoch 2.
Figures 34-41: Nodes 1-8 of hidden layer 2 at epoch 2.
Figures 42-44: Nodes 1-3 of the output layer at epoch 2.

Figures 45-52: Nodes 1-8 of hidden layer 1 at epoch 10.
Figures 53-60: Nodes 1-8 of hidden layer 2 at epoch 10.
Figures 61-63: Nodes 1-3 of the output layer at epoch 10.

Figures 64-71: Nodes 1-8 of hidden layer 1 at epoch 50.
Figures 72-79: Nodes 1-8 of hidden layer 2 at epoch 50.
Figures 80-82: Nodes 1-3 of the output layer at epoch 50.

Figures 83-90: Nodes 1-8 of hidden layer 1 at epoch 100.
Figures 91-98: Nodes 1-8 of hidden layer 2 at epoch 100.
Figures 99-101: Nodes 1-3 of the output layer at epoch 100.

Overlapping classes
It is not possible to obtain full accuracy here, so experiments were done to study how the
accuracy and the decision boundary change with model complexity.
The experiments done were:
(5,5,3)
(8,8,3)
(12,12,3)
where the first two numbers are the numbers of neurons in hidden layer 1 and hidden layer
2 respectively, and the last number is the number of outputs.
On increasing the number of neurons, spurious patches appeared in the decision regions, as shown for (12,12,3), and the
accuracy reduced. The (5,5,3) model could not achieve high accuracy because its complexity
was insufficient, so (8,8,3) is selected as the optimal model.
Refer to Tables 3 and 4 and Figures 102 and 103.

Confusion Matrix:

            Class 1   Class 2   Class 3
Class 1       131         7         4
Class 2        14       140         8
Class 3         5         3       138

Accuracy: 90.8%

Table 3: Overlapping 8,8,3

Confusion Matrix:

            Class 1   Class 2   Class 3
Class 1       135        21         6
Class 2        11       125         6
Class 3         4         4       138

Accuracy: 88.4%

Table 4: Overlapping 12,12,3

Figure 102: Overlapping 8,8,3

Figure 103: Overlapping 12,12,3
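The accuracies in Tables 3 and 4 follow directly from their confusion matrices; a small sketch of how such a matrix and the overall accuracy can be computed from 1-of-K targets and network outputs (illustrative, not the assignment code):

import numpy as np

def confusion_and_accuracy(Y_true, Y_pred):
    # Predicted class = output node with the largest activation.
    t = np.argmax(Y_true, axis=1)
    p = np.argmax(Y_pred, axis=1)
    K = Y_true.shape[1]
    cm = np.zeros((K, K), dtype=int)
    np.add.at(cm, (t, p), 1)                 # count (true, predicted) pairs
    return cm, np.trace(cm) / cm.sum()       # accuracy = diagonal / total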

Image Data
Each data point has 48 features. The data was split into training (0.70), validation (0.15) and
test (0.15) sets.
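A minimal sketch of such a split (the shuffling and the seed are illustrative assumptions; only the 0.70/0.15/0.15 fractions come from the text):

import numpy as np

def split_data(X, Y, f_train=0.70, f_val=0.15, seed=0):
    # Shuffle, then cut into training / validation / test subsets.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_tr = int(f_train * len(X))
    n_va = int((f_train + f_val) * len(X))
    tr, va, te = idx[:n_tr], idx[n_tr:n_va], idx[n_va:]
    return (X[tr], Y[tr]), (X[va], Y[va]), (X[te], Y[te])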
The experiments done were:
(10,10,5)
(15,15,5)
(20,20,5)
(25,25,5)
(30,30,5)
where the first two numbers correspond to the number of neurons in the hidden layer 1 and hidden layer
2 respectively. The last number is the number of outputs.
The accuracy of the (25,25,5) model was better than that of (20,20,5) and (30,30,5); hence it was selected.

Figure 104: Neural Network for Image Data

Figure 105: Classification for image data

Figure 106: Validation graphs


Figure 107: Confusion Matrix

Regression

Polynomial curve fitting


Polynomial curve fitting for dataset 1 gave optimal results at model complexity x^4, after which some
overfitting was noticed (refer to Figures 108, 109 and 111). The overfitting is visible at either edge of the plot in Figure 111.
Using a regularization parameter did not have a significant effect on the results, since the data had 1000
points (refer to Figures 109 and 110); regularization vs. MSE has nevertheless been plotted for small data in
Figure 112. All plots are for the test dataset.
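A compact sketch of the fitting procedure: least squares on polynomial features with an optional quadratic regularizer (the closed-form ridge solution is an assumption about the implementation):

import numpy as np

def fit_polynomial(x, t, degree, lam=0.0):
    # Minimize ||Phi w - t||^2 + lam ||w||^2 over the weights w,
    # where Phi has columns [1, x, x^2, ..., x^degree].
    Phi = np.vander(x, degree + 1, increasing=True)
    A = Phi.T @ Phi + lam * np.eye(degree + 1)
    return np.linalg.solve(A, Phi.T @ t)

def predict(x, w):
    return np.vander(x, len(w), increasing=True) @ w

# e.g. the selected model: w = fit_polynomial(x_train, t_train, degree=4)
# test MSE: np.mean((predict(x_test, w) - t_test) ** 2)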


Figure 108: Model complexity x^2. Regularization=0. MSE=0.125

Figure 109: Model complexity x^5. Regularization=0. MSE=0.245

Figure 110: Model complexity x^5. Regularization=0.5. MSE=0.1105

Figure 111: Model complexity x^6. Regularization=0
Target vs. model output plots for the training and validation data of the univariate dataset are
shown in Figures 113 and 114. Refer to Figures 115, 116 and 117 for scatter plots with target output on the
x-axis and model output on the y-axis.


Figure 112: Regularization

Figure 113: Train Data

Figure 114: Validation Data

Figure 115: Train Data

Figure 116: Validation Data

Figure 117: Test Data

Target vs. model output plots for the training and validation data of the bivariate dataset are
shown in Figures 118 and 119. Refer to Figures 121, 122 and 123 for scatter plots with target output on the
x-axis and model output on the y-axis.

Figure 118: Train Data

Figure 119: Validation Data

Figure 120: Test Data

Figure 121: Train Data

Figure 122: Validation Data

Figure 123: Test Data


Linear regression using Gaussian Basis functions


Linear regression using GBFs gave better results as the number of clusters was increased (see Figures 124, 125,
126 and 127). Overfitting was reduced for small data sizes when regularization was used (see Figures
128, 129, 130 and 131). Conversely, when the model complexity was not high enough, including a regularization
parameter degraded the results. All plots are for the training dataset.
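A sketch of this model, under the assumption that the cluster centers come from k-means and all basis functions share one width sigma (neither detail is stated above):

import numpy as np

def gbf_design_matrix(X, centers, sigma):
    # phi_k(x) = exp(-||x - mu_k||^2 / (2 sigma^2)), plus a bias column.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.hstack([np.ones((len(X), 1)), np.exp(-d2 / (2.0 * sigma ** 2))])

def fit_gbf(X, t, centers, sigma, lam=0.0):
    # Regularized least squares: w = (Phi^T Phi + lam I)^-1 Phi^T t.
    Phi = gbf_design_matrix(X, centers, sigma)
    A = Phi.T @ Phi + lam * np.eye(Phi.shape[1])
    return np.linalg.solve(A, Phi.T @ t)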

Figure 124: Model complexity K = 2. Regularization=0

Figure 125: Model complexity K = 4. Regularization=0

Figure 126: Model complexity K = 8. Regularization=0.5

Figure 127: Model complexity K = 20. Regularization=0.5


Figure 128: Model complexity K = 12. Regularization=0.2. MSE=101.05

Figure 129: Model complexity K = 12. Regularization=0. MSE=206.92

Figure 130: Model complexity K = 12. Regularization=0.8. MSE=74.63

Figure 131: Model complexity K = 12. Regularization=0.5. MSE=106.80
Target vs. model output plots for the test and validation data of the bivariate dataset are shown
in Figures 132 and 133. Refer to Figures 134, 135 and 136 for scatter plots with target output on the x-axis and
model output on the y-axis.


Figure 132: Test Data. MSE=39.83

Figure 134: Train Data

Figure 133: Validation Data

Figure 135: Validation Data

Figure 136: Test Data

Target vs. model output plots for the training and validation data of the univariate dataset are
shown in Figures 137 and 138. Refer to Figures 140, 141 and 142 for scatter plots with target output on the
x-axis and model output on the y-axis.

Figure 137: Train Data

Figure 138: Validation Data

Figure 139: Test Data

Figure 140: Train Data

Figure 141: Validation Data

Figure 142: Test Data


MLFFNN with 1 hidden layer


The activation function of the output layer is the identity. The error function used is the MSE.
For data generation, 200 points were generated from the function sin^2(2x) with additive Gaussian
noise of zero mean and variance 0.1.
The data was split into training (0.75) and validation (0.25) sets.
The number of nodes in the hidden layer is 5; with 3 or 4 hidden neurons, the algorithm
did not converge to the training error goal of 0.1 within 500 epochs.
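A sketch of the data generation (the input range [0, 1] is an assumption; the function, noise statistics and the 0.75/0.25 split are as stated above):

import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=200)                 # assumed input range
t = np.sin(2.0 * x) ** 2 + rng.normal(0.0, np.sqrt(0.1), size=200)

idx = rng.permutation(200)                          # 0.75 / 0.25 split
x_train, t_train = x[idx[:150]], t[idx[:150]]
x_val, t_val = x[idx[150:]], t[idx[150:]]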

Figure 143: Train Data, univariate function

Figure 144: Train Data, univariate scatter output

MLFFNN with 2 hidden layers


The activation function of the output layer is the identity, and the error function used is the MSE.
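The corresponding forward pass differs from the classification networks only in its output unit; a brief sketch (sigmoid hidden layers are assumed, matching the classification setup):

import numpy as np

def forward_regression(X, weights, biases):
    # Sigmoid hidden layers, identity (linear) activation at the output.
    a = X
    for W, b in zip(weights[:-1], biases[:-1]):
        a = 1.0 / (1.0 + np.exp(-(a @ W + b)))
    return a @ weights[-1] + biases[-1]

def mse(t, y):
    return np.mean((t - y) ** 2)                    # training criterion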
The experiments done were:
(2,2,1)
(3,3,1)
(4,4,1)
(5,5,1)
(8,8,1)
where the first two numbers correspond to the number of neurons in the hidden layer 1 and hidden layer
2 respectively. The last number is the number of outputs.
The accuracy did not improve much beyond (4,4,1), hence that model was selected.
Dataset 2 had four training sets. The MSEs obtained with the above model for training sets of
different sizes are as follows:

MSE            Train20     Train100    Train1000    Train2000
Validation     205.57       79.134       98.29       120.28
Test           217.0553     85.058       99.696      118.380
Train            3.820      41.035       89.696      129.4516

Table 5: Comparison of MSE for different sizes of Train data


Note the clear overfitting for Train20: very low training MSE but much higher validation and test MSE. The plots shown below are for the training set with 1000 points.


Figure 145: Train Data

Figure 146: Validation Data

Figure 147: Test Data

Figure 148: Train Data Bivariate

Figure 149: Validation Data Bivariate

Figure 150: Test Data Bivariate

Figures 151-154: Nodes 1-4 of hidden layer 1 at epoch 1.
Figures 155-158: Nodes 1-4 of hidden layer 2 at epoch 1.
Figure 159: Output-layer node at epoch 1.

Figures 160-163: Nodes 1-4 of hidden layer 1 at epoch 2.
Figures 164-167: Nodes 1-4 of hidden layer 2 at epoch 2.
Figure 168: Output-layer node at epoch 2.

Figures 169-172: Nodes 1-4 of hidden layer 1 at epoch 10.
Figures 173-176: Nodes 1-4 of hidden layer 2 at epoch 10.
Figure 177: Output-layer node at epoch 10.

Figures 178-181: Nodes 1-4 of hidden layer 1 at epoch 50.
Figures 182-185: Nodes 1-4 of hidden layer 2 at epoch 50.
Figure 186: Output-layer node at epoch 50.

Figures 187-190: Nodes 1-4 of hidden layer 1 at epoch 500.
Figures 191-194: Nodes 1-4 of hidden layer 2 at epoch 500.
Figure 195: Output-layer node at epoch 500.

Radial Basis Function Neural Network


It took more clusters than the GBF model to get good results on the same dataset (see Figures 196, 197, 198 and
199). Overfitting reduced with regularization, as with the GBF model (see Figures 200, 201, 202 and 203).
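A sketch of the RBF network, assuming per-center widths and a linear output layer trained by regularized least squares (the width rule is an assumption; the report does not state how the implementation differs from the GBF model):

import numpy as np

def rbf_hidden(X, centers, widths):
    # One Gaussian unit per center, each with its own width.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * widths[None, :] ** 2))

def fit_rbf_output(X, t, centers, widths, lam=0.0):
    # Linear output weights by regularized least squares, with a bias.
    H = np.hstack([np.ones((len(X), 1)), rbf_hidden(X, centers, widths)])
    A = H.T @ H + lam * np.eye(H.shape[1])
    return np.linalg.solve(A, H.T @ t)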


Figure 196: Model complexity K = 4. Regularization=0.5

Figure 197: Model complexity K = 8. Regularization=0

Figure 198: Model complexity K = 12. Regularization=0.5

Figure 199: Model complexity K = 20. Regularization=0.8


Figure 200: Model complexity K = 15. Regularization=0.2. MSE=68.87

Figure 201: Model complexity K = 15. Regularization=0. MSE=694.55

Figure 202: Model complexity K = 15. Regularization=0.8. MSE=84

Figure 203: Model complexity K = 15. Regularization=0.5. MSE=73.4


Target vs. model output plots for the test and validation data of the bivariate dataset are shown
in Figures 204 and 205. Refer to Figures 206, 207 and 208 for scatter plots with target output on the x-axis and
model output on the y-axis.

Figure 204: Test Data. MSE=41.23

Figure 206: Train Data

Figure 205: Validation Data

Figure 207: Validation Data

Figure 208: Test Data

Target vs. model output plots for the training and validation data of the univariate dataset are
shown in Figures 209 and 210. Refer to Figures 212, 213 and 214 for scatter plots with target output on the
x-axis and model output on the y-axis.


Figure 209: Train Data

Figure 210: Validation Data

Figure 211: Test Data

Figure 212: Train Data

Figure 213: Validation Data

Figure 214: Test Data

