
Department of Computer Science & Engineering

Indian Institute of Technology-Madras

KMPA Assignment-I
Group No. 11
Vindhani Mohsin (ED11B041), Rohit R. Salunke (ME11B119), Pritesh Jain (ME11B116)
March 3, 2016

Classification

For all the classification tasks, the selected error function is the cross-entropy error, with the desired output in 1-of-K representation. The input is normalized before being fed to the training algorithm.
The activation function of the last layer is the sigmoid, which bounds the output of the final layer between 0 and 1.
Model selection was done by running models with different hyperparameters, all subject to the same
learning algorithm with fixed parameters and a fixed maximum number of epochs.
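As an illustration of this setup, the following is a minimal NumPy sketch of the forward pass and error computation described above; the layer sizes, variable names and random data are illustrative, not the actual assignment code.

import numpy as np

def forward(X, weights, biases):
    # Forward pass through an MLFFNN with sigmoid units in every layer,
    # so the final outputs are bounded in (0, 1).
    a = X
    for W, b in zip(weights, biases):
        a = 1.0 / (1.0 + np.exp(-(a @ W + b)))
    return a

def cross_entropy(Y_true, Y_pred, eps=1e-12):
    # Cross-entropy error for 1-of-K targets against sigmoid outputs,
    # averaged over the batch; eps guards against log(0).
    Y_pred = np.clip(Y_pred, eps, 1.0 - eps)
    return -np.mean(np.sum(Y_true * np.log(Y_pred)
                           + (1.0 - Y_true) * np.log(1.0 - Y_pred), axis=1))

# Illustrative (2,2,3) network on z-score normalized 2-D inputs.
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 2))
X = (X - X.mean(axis=0)) / X.std(axis=0)          # input normalization
sizes = [2, 2, 2, 3]                              # input, hidden 1, hidden 2, output
weights = [0.1 * rng.standard_normal((m, n)) for m, n in zip(sizes, sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]
Y = np.eye(3)[rng.integers(0, 3, size=5)]         # 1-of-K targets
print(cross_entropy(Y, forward(X, weights, biases)))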

Linearly Separable classes


The selection of the model was done by varying the number of neurons in the hidden layers. The criterion
for selection was the accuracy of the model on the validation set. The experiments done were:
(1,1,3)
(2,2,3)
(3,3,3)
where the first two numbers are the numbers of neurons in hidden layer 1 and hidden layer
2 respectively, and the last number is the number of outputs.
The models (2,2,3) and (3,3,3) both gave full accuracy, but (2,2,3), being less complex, was selected.
Refer to Table 1 and Figure 1.

Confusion Matrix:

            Class 1   Class 2   Class 3
Class 1       100         0         0
Class 2         0       100         0
Class 3         0         0       100

Accuracy: 100%

Table 1: Linear 2,2,3

Figure 1: Linear 2,2,3


Selection of Training algorithm
Once the gradients of the weights are obtained, the update algorithm is selected by
testing each candidate on this task and comparing the number of epochs required. The algorithms tried were:
Gradient Descent
Conjugate Gradient Descent
Resilient Backpropagation (rprop)
Gradient Descent with adaptive learning rate
The best algorithm for training is rprop with the parameters:
initial update = 0.7
maximum weight change allowed = 50
learning rate = 0.01
Refer to Figures 2, 3, 4 and 5.
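For reference, a minimal sketch of the rprop update rule with the settings above; the increase/decrease factors 1.2 and 0.5 are the usual rprop defaults and an assumption here, as they are not stated in this report.

import numpy as np

def rprop_step(w, grad, prev_grad, step,
               eta_plus=1.2, eta_minus=0.5, step_max=50.0, step_min=1e-6):
    # One resilient-backpropagation update on an array of weights.
    # Only the sign of the gradient is used: the per-weight step size
    # grows by eta_plus while the sign is stable and shrinks by eta_minus
    # when it flips; step_max=50 is the maximum weight change above.
    sign_change = grad * prev_grad
    step = np.where(sign_change > 0, np.minimum(step * eta_plus, step_max), step)
    step = np.where(sign_change < 0, np.maximum(step * eta_minus, step_min), step)
    grad = np.where(sign_change < 0, 0.0, grad)   # no update right after a sign flip
    return w - np.sign(grad) * step, grad, step

# Usage: initialize step = np.full_like(w, 0.7) (the initial update above)
# and prev_grad = np.zeros_like(w), then call once per epoch with the
# freshly computed gradient.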

Figure 2: CEE with CG

Figure 3: CEE with GD

Figure 4: CEE with GDX

Figure 5: CEE with RPROP (10,10,3)

Non-Linear classes


The selection of the model was done by varying the number of neurons in the hidden layers. The criterion
for selection was the accuracy of the model on the validation set. The experiments done were:
(3,3,3)
(5,5,3)
(8,8,3)
(7,7,3)
where the first two numbers are the numbers of neurons in hidden layer 1 and hidden layer
2 respectively, and the last number is the number of outputs.
The model (8,8,3) gave full accuracy on the validation and test sets, so it was selected.
A less complex model, (7,7,3), was then tried to check whether the same result could be achieved with
lower complexity, but it did not give full accuracy over all the training iterations.
Hence (8,8,3) was retained.

Confusion Matrix:

            Class 1   Class 2   Class 3
Class 1        60         0         0
Class 2         0       120         0
Class 3         0         0       160

Accuracy: 100%

Table 2: Non-Linear 8,8,3

Figure 6: Non-Linear 8,8,3

Figures 7-14: Nodes 1-8 of hidden layer 1 at epoch 1.
Figures 15-22: Nodes 1-8 of hidden layer 2 at epoch 1.
Figures 23-25: Nodes 1-3 of the output layer at epoch 1.

Figures 26-33: Nodes 1-8 of hidden layer 1 at epoch 2.
Figures 34-41: Nodes 1-8 of hidden layer 2 at epoch 2.
Figures 42-44: Nodes 1-3 of the output layer at epoch 2.

Figures 45-52: Nodes 1-8 of hidden layer 1 at epoch 10.
Figures 53-60: Nodes 1-8 of hidden layer 2 at epoch 10.
Figures 61-63: Nodes 1-3 of the output layer at epoch 10.

Figures 64-71: Nodes 1-8 of hidden layer 1 at epoch 50.
Figures 72-79: Nodes 1-8 of hidden layer 2 at epoch 50.
Figures 80-82: Nodes 1-3 of the output layer at epoch 50.

Figures 83-90: Nodes 1-8 of hidden layer 1 at epoch 100.
Figures 91-98: Nodes 1-8 of hidden layer 2 at epoch 100.
Figures 99-101: Nodes 1-3 of the output layer at epoch 100.

Overlapping classes
It is not possible to obtain full accuracy here, so experiments were done to study how the
accuracy and the decision boundary change with model complexity.
The experiments done were:
(5,5,3)
(8,8,3)
(12,12,3)
where the first two numbers are the numbers of neurons in hidden layer 1 and hidden layer
2 respectively, and the last number is the number of outputs.
On increasing the number of neurons, spurious patches appeared in the decision regions, as shown for (12,12,3), and the
accuracy reduced. The (5,5,3) model could not achieve high accuracy because its complexity
was insufficient, so (8,8,3) is selected as the optimal model.
Refer to Tables 3 and 4 and Figures 102 and 103.

Confusion Matrix:

            Class 1   Class 2   Class 3
Class 1       131         7         4
Class 2        14       140         8
Class 3         5         3       138

Accuracy: 90.8%

Table 3: Overlapping 8,8,3

Confusion Matrix:

            Class 1   Class 2   Class 3
Class 1       135        21         6
Class 2        11       125         6
Class 3         4         4       138

Accuracy: 88.4%

Table 4: Overlapping 12,12,3

Figure 102: Overlapping 8,8,3

Figure 103: Overlapping 12,12,3
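The accuracies in Tables 3 and 4 follow directly from their confusion matrices; a small sketch of how such a matrix and the overall accuracy can be computed from 1-of-K targets and network outputs (illustrative, not the assignment code):

import numpy as np

def confusion_and_accuracy(Y_true, Y_pred):
    # Predicted class = output node with the largest activation.
    t = np.argmax(Y_true, axis=1)
    p = np.argmax(Y_pred, axis=1)
    K = Y_true.shape[1]
    cm = np.zeros((K, K), dtype=int)
    np.add.at(cm, (t, p), 1)                 # count (true, predicted) pairs
    return cm, np.trace(cm) / cm.sum()       # accuracy = diagonal / total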

Image Data
Each data point has 48 features. The data was split into training (0.70), validation (0.15) and
test (0.15) sets.
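A minimal sketch of such a split (the shuffling and the seed are illustrative assumptions; only the 0.70/0.15/0.15 fractions come from the text):

import numpy as np

def split_data(X, Y, f_train=0.70, f_val=0.15, seed=0):
    # Shuffle, then cut into training / validation / test subsets.
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_tr = int(f_train * len(X))
    n_va = int((f_train + f_val) * len(X))
    tr, va, te = idx[:n_tr], idx[n_tr:n_va], idx[n_va:]
    return (X[tr], Y[tr]), (X[va], Y[va]), (X[te], Y[te])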
The experiments done were:
(10,10,5)
(15,15,5)
(20,20,5)
(25,25,5)
(30,30,5)
where the first two numbers correspond to the number of neurons in the hidden layer 1 and hidden layer
2 respectively. The last number is the number of outputs.
The accuracy of the (25,25,5) model was better than that of (20,20,5) and (30,30,5); hence it was selected.

Figure 104: Neural Network for Image Data

Figure 105: Classification for image data

Figure 106: Validation graphs


Figure 107: Confusion Matrix

Regression

Polynomial curve fitting


Polynomial curve fitting for dataset 1 gave optimal results at model complexity x^4, after which some
overfitting was noticed (refer to Figures 108, 109 and 111). The overfitting is visible at either edge of the plot in Figure 111.
Using a regularization parameter did not have a significant effect on the results, since the data had 1000
points (refer to Figures 109 and 110); regularization vs. MSE has nevertheless been plotted for small data in
Figure 112. All plots are for the test dataset.
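A compact sketch of the fitting procedure: least squares on polynomial features with an optional quadratic regularizer (the closed-form ridge solution is an assumption about the implementation):

import numpy as np

def fit_polynomial(x, t, degree, lam=0.0):
    # Minimize ||Phi w - t||^2 + lam ||w||^2 over the weights w,
    # where Phi has columns [1, x, x^2, ..., x^degree].
    Phi = np.vander(x, degree + 1, increasing=True)
    A = Phi.T @ Phi + lam * np.eye(degree + 1)
    return np.linalg.solve(A, Phi.T @ t)

def predict(x, w):
    return np.vander(x, len(w), increasing=True) @ w

# e.g. the selected model: w = fit_polynomial(x_train, t_train, degree=4)
# test MSE: np.mean((predict(x_test, w) - t_test) ** 2)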


Figure 108: Model complexity x^2. Regularization=0. MSE=0.125

Figure 109: Model complexity x^5. Regularization=0. MSE=0.245

Figure 110: Model complexity x^5. Regularization=0.5. MSE=0.1105

Figure 111: Model complexity x^6. Regularization=0
Target vs. model output plots for the training and validation data of the univariate dataset are
shown in Figures 113 and 114. Refer to Figures 115, 116 and 117 for scatter plots with target output on the
x-axis and model output on the y-axis.


Figure 112: Regularization

Figure 113: Train Data

Figure 114: Validation Data

Figure 115: Train Data

Figure 116: Validation Data

Figure 117: Test Data

Target vs. model output plots for the training and validation data of the bivariate dataset are
shown in Figures 118 and 119. Refer to Figures 121, 122 and 123 for scatter plots with target output on the
x-axis and model output on the y-axis.

Figure 118: Train Data

Figure 119: Validation Data

Figure 120: Test Data

Figure 121: Train Data

Figure 122: Validation Data

Figure 123: Test Data


Linear regression using Gaussian Basis functions


Linear regression using GBFs gave better results as the number of clusters was increased (see Figures 124, 125,
126 and 127). Overfitting was reduced for small data sizes when regularization was used (see Figures
128, 129, 130 and 131). Conversely, when the model complexity was not high enough, including a regularization
parameter degraded the results. All plots are for the training dataset.
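A sketch of this model, under the assumption that the cluster centers come from k-means and all basis functions share one width sigma (neither detail is stated above):

import numpy as np

def gbf_design_matrix(X, centers, sigma):
    # phi_k(x) = exp(-||x - mu_k||^2 / (2 sigma^2)), plus a bias column.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.hstack([np.ones((len(X), 1)), np.exp(-d2 / (2.0 * sigma ** 2))])

def fit_gbf(X, t, centers, sigma, lam=0.0):
    # Regularized least squares: w = (Phi^T Phi + lam I)^-1 Phi^T t.
    Phi = gbf_design_matrix(X, centers, sigma)
    A = Phi.T @ Phi + lam * np.eye(Phi.shape[1])
    return np.linalg.solve(A, Phi.T @ t)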

Figure 124: Model complexity K = 2. Regularization=0

Figure 125: Model complexity K = 4. Regularization=0

Figure 126: Model complexity K = 8. Regularization=0.5

Figure 127: Model complexity K = 20. Regularization=0.5


Figure 128: Model complexity K = 12. Regularization=0.2. MSE=101.05

Figure 129: Model complexity K = 12. Regularization=0. MSE=206.92

Figure 130: Model complexity K = 12. Regularization=0.8. MSE=74.63

Figure 131: Model complexity K = 12. Regularization=0.5. MSE=106.80
Target vs. model output plots for the test and validation data of the bivariate dataset are shown
in Figures 132 and 133. Refer to Figures 134, 135 and 136 for scatter plots with target output on the x-axis and
model output on the y-axis.


Figure 132: Test Data. MSE=39.83

Figure 134: Train Data

Figure 133: Validation Data

Figure 135: Validation Data

Figure 136: Test Data

Target vs. model output plots for the training and validation data of the univariate dataset are
shown in Figures 137 and 138. Refer to Figures 140, 141 and 142 for scatter plots with target output on the
x-axis and model output on the y-axis.

Figure 137: Train Data

Figure 138: Validation Data

Figure 139: Test Data

Figure 140: Train Data

Figure 141: Validation Data

Figure 142: Test Data


MLFFNN with 1 hidden layer


The activation function of the output layer is the identity. The error function used is the MSE.
For data generation, 200 points were generated from the function sin^2(2x) with additive Gaussian
noise of zero mean and variance 0.1.
The data was split into training (0.75) and validation (0.25) sets.
The number of nodes in the hidden layer is 5; with 3 or 4 hidden neurons, the algorithm
did not converge to the training error goal of 0.1 within 500 epochs.
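A sketch of the data generation (the input range [0, 1] is an assumption; the function, noise statistics and the 0.75/0.25 split are as stated above):

import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=200)                 # assumed input range
t = np.sin(2.0 * x) ** 2 + rng.normal(0.0, np.sqrt(0.1), size=200)

idx = rng.permutation(200)                          # 0.75 / 0.25 split
x_train, t_train = x[idx[:150]], t[idx[:150]]
x_val, t_val = x[idx[150:]], t[idx[150:]]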

Figure 143: Train Data, univariate function

Figure 144: Train Data, univariate scatter output

MLFFNN with 2 hidden layers


The activation function of the output layer is the identity, and the error function used is the MSE.
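The corresponding forward pass differs from the classification networks only in its output unit; a brief sketch (sigmoid hidden layers are assumed, matching the classification setup):

import numpy as np

def forward_regression(X, weights, biases):
    # Sigmoid hidden layers, identity (linear) activation at the output.
    a = X
    for W, b in zip(weights[:-1], biases[:-1]):
        a = 1.0 / (1.0 + np.exp(-(a @ W + b)))
    return a @ weights[-1] + biases[-1]

def mse(t, y):
    return np.mean((t - y) ** 2)                    # training criterion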
The experiments done were:
(2,2,1)
(3,3,1)
(4,4,1)
(5,5,1)
(8,8,1)
where the first two numbers correspond to the number of neurons in the hidden layer 1 and hidden layer
2 respectively. The last number is the number of outputs.
The accuracy did not improve much beyond (4,4,1), hence that model was selected.
Dataset 2 had four training sets. The MSEs obtained with the above model for training sets of
different sizes are as follows:

MSE            Train20     Train100    Train1000    Train2000
Validation     205.57       79.134       98.29       120.28
Test           217.0553     85.058       99.696      118.380
Train            3.820      41.035       89.696      129.4516

Table 5: Comparison of MSE for different sizes of Train data


Note the clear overfitting for Train20: very low training MSE but much higher validation and test MSE. The plots shown below are for the training set with 1000 points.


Figure 145: Train Data

Figure 146: Validation Data

Figure 147: Test Data

Figure 148: Train Data Bivariate

Figure 149: Validation Data Bivariate

Figure 150: Test Data Bivariate

Figures 151-154: Nodes 1-4 of hidden layer 1 at epoch 1.
Figures 155-158: Nodes 1-4 of hidden layer 2 at epoch 1.
Figure 159: Output-layer node at epoch 1.

Figures 160-163: Nodes 1-4 of hidden layer 1 at epoch 2.
Figures 164-167: Nodes 1-4 of hidden layer 2 at epoch 2.
Figure 168: Output-layer node at epoch 2.

Figures 169-172: Nodes 1-4 of hidden layer 1 at epoch 10.
Figures 173-176: Nodes 1-4 of hidden layer 2 at epoch 10.
Figure 177: Output-layer node at epoch 10.

Figures 178-181: Nodes 1-4 of hidden layer 1 at epoch 50.
Figures 182-185: Nodes 1-4 of hidden layer 2 at epoch 50.
Figure 186: Output-layer node at epoch 50.

Figures 187-190: Nodes 1-4 of hidden layer 1 at epoch 500.
Figures 191-194: Nodes 1-4 of hidden layer 2 at epoch 500.
Figure 195: Output-layer node at epoch 500.

Radial Basis Function Neural Network


It took more clusters than the GBF model to get good results on the same dataset (see Figures 196, 197, 198 and
199). Overfitting reduced with regularization, as with the GBF model (see Figures 200, 201, 202 and 203).
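A sketch of the RBF network, assuming per-center widths and a linear output layer trained by regularized least squares (the width rule is an assumption; the report does not state how the implementation differs from the GBF model):

import numpy as np

def rbf_hidden(X, centers, widths):
    # One Gaussian unit per center, each with its own width.
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * widths[None, :] ** 2))

def fit_rbf_output(X, t, centers, widths, lam=0.0):
    # Linear output weights by regularized least squares, with a bias.
    H = np.hstack([np.ones((len(X), 1)), rbf_hidden(X, centers, widths)])
    A = H.T @ H + lam * np.eye(H.shape[1])
    return np.linalg.solve(A, H.T @ t)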


Figure 196: Model complexity K = 4. Regularization=0.5

Figure 197: Model complexity K = 8. Regularization=0

Figure 198: Model complexity K = 12. Regularization=0.5

Figure 199: Model complexity K = 20. Regularization=0.8


Figure 200: Model complexity K = 15. Regularization=0.2. MSE=68.87

Figure 201: Model complexity K = 15. Regularization=0. MSE=694.55

Figure 202: Model complexity K = 15. Regularization=0.8. MSE=84

Figure 203: Model complexity K = 15. Regularization=0.5. MSE=73.4


Target vs. model output plots for the test and validation data of the bivariate dataset are shown
in Figures 204 and 205. Refer to Figures 206, 207 and 208 for scatter plots with target output on the x-axis and
model output on the y-axis.

Figure 204: Test Data. MSE=41.23

Figure 206: Train Data

Figure 205: Validation Data

Figure 207: Validation Data

Figure 208: Test Data

Target vs. model output plots for the training and validation data of the univariate dataset are
shown in Figures 209 and 210. Refer to Figures 212, 213 and 214 for scatter plots with target output on the
x-axis and model output on the y-axis.


Figure 209: Train Data

Figure 210: Validation Data

Figure 211: Test Data

Figure 212: Train Data

Figure 213: Validation Data

Figure 214: Test Data

