Learning Outcomes: (Student to write briefly about learnings obtained from the academic tasks)
Declaration:
I declare that this Assignment is my individual work. I have not copied it from any other student’s work
or from any other source except where due acknowledgement is made explicitly in the text, nor has any
part been written for me by any other person.
Student’s Signature:
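Naïve Bayes – Rscript
library(e1071)    # naiveBayes()
library(caTools)  # helpers for splitting data into training and testing sets
library(caret)    # confusionMatrix()
set.seed(9999)
# train_cl is the training split of the glass data (the split step is not shown here)
classifier_cl <- naiveBayes(glasstype ~ ., data = train_cl)
classifier_cl
# cm is expected to be the table of actual vs predicted glass types on the test set
# (its construction is not shown here)
confusionMatrix(cm)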
Naïve Bayes – Output
Interpretation:
In the Naive Bayes model, the data is split into two sets: a training set and a testing set. The
model estimates conditional probabilities for the predictor variables, such as refractive index
and the amounts of sodium, magnesium and the other elements, given each glass type. In this
case they are stored in classifier_cl. These conditional probabilities are used to predict the
glass type of each test observation, and the predictions are summarised in the confusion matrix.
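The way class-conditional probabilities turn into predictions and a confusion matrix can be sketched in base R. This is only a minimal illustration under stated assumptions: it uses the built-in iris data as a hypothetical stand-in for the glass data, a 70/30 split chosen for illustration, and Gaussian conditionals for the numeric predictors. The names train, test, mu, sdv and log_post are illustrative and do not come from the original script.

```r
# Hypothetical stand-in data: built-in iris, NOT the glass data
set.seed(9999)
idx <- sample(nrow(iris), size = round(0.7 * nrow(iris)))  # assumed 70/30 split
train <- iris[idx, ]
test <- iris[-idx, ]

features <- names(iris)[1:4]
classes <- levels(train$Species)

# Prior probability of each class in the training set
prior <- sapply(classes, function(k) mean(train$Species == k))

# Per-class mean and standard deviation of each feature (Gaussian assumption)
mu <- sapply(classes, function(k) colMeans(train[train$Species == k, features]))
sdv <- sapply(classes, function(k) apply(train[train$Species == k, features], 2, sd))

# Log posterior (up to a constant) of every class for every test row
log_post <- sapply(classes, function(k) {
  ll <- sapply(features, function(f) {
    dnorm(test[[f]], mean = mu[f, k], sd = sdv[f, k], log = TRUE)
  })
  rowSums(ll) + log(prior[[k]])
})

# Predict the class with the highest posterior and tabulate against the truth
pred <- factor(classes[max.col(log_post)], levels = classes)
cm <- table(Predicted = pred, Actual = test$Species)
accuracy <- sum(diag(cm)) / sum(cm)
cm
accuracy
```

The table cm built here has the same role as the cm object passed to confusionMatrix() in the script.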
According to the output table, 5 observations of glass type 1 are predicted correctly, whereas
4 are wrongly predicted as glass type 2 and 1 as glass type 3. For glass type 2, 2 observations
are predicted correctly and 1 is wrongly predicted as glass type 1. For glass types 3, 5, 6 and
7 there are 1, 0, 1 and 0 correct predictions and 1, 0, 6 and 0 incorrect predictions
respectively. In total, 9 of the 22 test observations are predicted correctly, which gives an
accuracy of 40.9%.
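The accuracy figure can be checked by hand from the counts quoted in the interpretation. The short sketch below only re-adds those counts; the vector names correct and incorrect are illustrative.

```r
# Correct and incorrect predictions per glass type, as quoted in the interpretation
correct   <- c(`1` = 5, `2` = 2, `3` = 1, `5` = 0, `6` = 1, `7` = 0)
incorrect <- c(`1` = 5, `2` = 1, `3` = 1, `5` = 0, `6` = 6, `7` = 0)

total <- sum(correct) + sum(incorrect)  # number of test observations
accuracy <- sum(correct) / total
round(100 * accuracy, 1)  # 40.9
```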
Decision Tree – Rscript
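library(ISLR)
library(rpart)       # rpart(): fits the decision tree
library(rpart.plot)  # rpart.plot(): draws the fitted tree
# sr is expected to hold the glass data; the loading and model-fitting steps are not shown here
str(sr)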
Interpretation:
The decision tree classifies the glass types based on properties such as refractive index and
the elements in the glass. The first split is made on the amount of Barium. Observations with
more than 0.335% Barium are classified as glass type 7; however, this leaf also contains 3
misclassified observations, one each of glass types 1, 2 and 5. For the remaining observations,
further classification occurs on the basis of Aluminium. The Aluminium split in turn leads to
further splits on Calcium and Magnesium.
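How a tree arrives at a threshold such as the Barium cut-off can be illustrated with a small base R sketch. It assumes the Gini index as the split criterion (the default for classification trees in rpart) and searches a single variable only. The built-in iris data is a hypothetical stand-in for the glass data, and the helper names gini and split_impurity are illustrative.

```r
# Gini impurity of a vector of class labels
gini <- function(y) {
  p <- table(y) / length(y)
  1 - sum(p^2)
}

# Weighted impurity of the two children produced by the rule x < threshold
split_impurity <- function(x, y, threshold) {
  left <- y[x < threshold]
  right <- y[x >= threshold]
  (length(left) * gini(left) + length(right) * gini(right)) / length(y)
}

# Hypothetical stand-in data: built-in iris, NOT the glass data
x <- iris$Petal.Length
y <- iris$Species

# Candidate thresholds: midpoints between consecutive distinct values of x
v <- sort(unique(x))
candidates <- (head(v, -1) + tail(v, -1)) / 2

# Pick the threshold whose children are purest
impurity <- sapply(candidates, function(t) split_impurity(x, y, t))
best_threshold <- candidates[which.min(impurity)]
best_threshold
min(impurity)
```

A full tree repeats this search over every variable at every node, which is why different branches end up splitting on different elements.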
In the split on Magnesium, if the magnesium content is greater than or equal to 2.26%, the
observation is classified as glass type 2. However, this leaf also holds 6, 4 and 1 observations
of glass types 1, 2 and 6 respectively, which are incorrectly classified. If the magnesium
content is less than 2.26%, there is a further split on sodium content. If the sodium content is
more than 13.4%, the observation is classified as glass type 6, a leaf with 3 wrong entries.
The glass type 5 leaf has 11 entries with sodium content less than 13.4%, of which 1 is
incorrect and belongs to glass type 2.
In the split on Calcium, observations with calcium content of more than 10.4% are classified
as glass type 2. There are two leaves for glass type 2 because they differ in aluminium
content. If the calcium content is less than 10.4%, further classification is done on the basis
of refractive index. If the refractive index is less than 1.5, the observation is classified as
glass type 3, where there are 3 incorrectly classified observations of glass type 1, 3 of glass
type 2, and 1 each of glass types 6 and 7. Finally, observations with a refractive index greater
than or equal to 1.5 are split on Magnesium once again: those with magnesium content below
3.8% are classified as glass type 1, and those with magnesium content above 3.8% as glass
type 3.
Kmeans – Rscript
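library(dplyr)       # loaded first: %>%, select(), group_by() and summarise() are used below
library(factoextra)  # fviz_nbclust(), fviz_cluster()
library(ggplot2)
# Glass_Types is the glass data frame (its loading is not shown here)
Glasses = Glass_Types %>% select(RI,Na,Mg,Al,Si,K,Ca,Ba,Fe)
Glasses
str(Glasses)
# WSS (elbow) plot to choose the number of clusters
fviz_nbclust(x = Glasses, kmeans, method = "wss")
set.seed(123)
km.res = kmeans(Glasses,6,nstart = 25)
km.res
fviz_cluster(km.res,data = Glasses)
km.res$cluster
clusterdata = cbind(Glasses,cluster = km.res$cluster)
# Mean of every variable within each cluster (grouped by cluster, not by RI)
a = clusterdata %>% group_by(cluster) %>%
  summarise(mean(RI),mean(Na),mean(Mg),mean(Al),mean(Si),mean(K),mean(Ca),mean(Ba),mean(Fe))
a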
Kmeans – Output
Interpretation:
According to the WSS plot, the optimum number of clusters is where the curve starts to
flatten (the elbow). In this case, it is 6; hence k has been set to 6 in the R script. The data
set has 214 observations, which have been grouped into 6 clusters as shown in the image just
above. However, there is some inaccuracy, as these clusters overlap somewhat. The major
overlap is between clusters 1 and 3, followed by the overlaps between clusters 2 and 5 and
between clusters 3 and 4. According to the table given by km.res$centers, the mean values of
refractive index, sodium and magnesium for cluster 1 are 1.52, 13.89 and 3.34, and so on for
the remaining variables. The same holds for the other clusters. Because the centres lie so
close to one another, the clusters overlap. Each observation is assigned to the cluster whose
centre is nearest to it.
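The elbow reasoning and the link between close centres and overlapping clusters can be sketched with base R alone. This is an illustration under assumptions: the numeric columns of the built-in iris data stand in for the glass data, k = 6 is reused only to mirror the script, and the names X, wss and km are illustrative.

```r
# Hypothetical stand-in data: numeric columns of built-in iris, NOT the glass data
X <- iris[, 1:4]

# Total within-cluster sum of squares (WSS) for k = 1..10
set.seed(123)
wss <- sapply(1:10, function(k) kmeans(X, centers = k, nstart = 25)$tot.withinss)
round(wss, 1)  # the elbow is where these values stop dropping sharply

# Fit the chosen k and inspect the cluster centres
km <- kmeans(X, centers = 6, nstart = 25)
km$centers

# Pairwise distances between centres; small values point to clusters that may overlap
round(dist(km$centers), 2)
```

Plotting wss against k reproduces the kind of curve that fviz_nbclust() draws with method = "wss".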
Hierarchical Unsupervised – Rscript
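# Packages used below: cluster (agnes, diana), dendextend (tanglegram),
# factoextra (fviz_cluster), dplyr (%>%, mutate)
library(cluster)
library(dendextend)
library(factoextra)
library(dplyr)
# data1, d, Glasses1 and the agnes object hc2 come from earlier steps that are not shown here
# Agglomerative coefficient
hc2$ac
# Agglomerative clustering with Ward's method
hc3 = agnes(data1, method = "ward")
pltree(hc3, cex = 0.6, hang = -1, main = "Dendrogram of agnes")
# Divisive clustering; dc is the divisive coefficient
hc4 = diana(data1)
hc4$dc
pltree(hc4, cex = 0.6, hang = -1, main = "Dendrogram of diana")
# Ward's method on the distance matrix d, cut into 6 groups
hc5 = hclust(d, method = "ward.D2" )
sub_grp = cutree(hc5, k = 6)
table(sub_grp)
Glasses1 %>%
  mutate(cluster = sub_grp) %>%
  head
# Dendrogram with the 6 clusters outlined, followed by a cluster plot
plot(hc5, cex = 0.6)
rect.hclust(hc5, k = 6, border = 2:5)
fviz_cluster(list(data = data1, cluster = sub_grp))
# Cut the agnes and diana trees into 6 groups
hc_a = agnes(data1, method = "ward")
cutree(as.hclust(hc_a), k = 6)
hc_d <- diana(data1)
cutree(as.hclust(hc_d), k = 6)
# Compare complete linkage and Ward's method with a tanglegram
res.dist = dist(data1, method = "euclidean")
hc1 = hclust(res.dist, method = "complete")
hc2 = hclust(res.dist, method = "ward.D2")
dend1 = as.dendrogram (hc1)
dend2 = as.dendrogram (hc2)
tanglegram(dend1, dend2)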
Hierarchical Unsupervised – Output
Interpretation:
Hierarchical clustering groups the 214 glass observations into clusters using dendrograms.
The first dendrogram is built from the Euclidean distance between each pair of observations.
A second dendrogram, titled “Dendrogram of agnes”, is built by agglomerative clustering, and
a third, titled “Dendrogram of diana”, by divisive clustering; each groups the observations
into several clusters. A further cluster dendrogram, built with Ward’s method and cut at
k = 6, shows the clusters marked by coloured rectangles. These clusters are then represented
in a cluster plot.
At the very end of the program, a tanglegram places the complete-linkage and Ward
dendrograms side by side and connects the same observations in both, showing how the two
groupings differ.
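The comparison that the tanglegram makes visually can also be approximated numerically with base R. The sketch below is a minimal illustration under assumptions: the numeric columns of the built-in iris data stand in for the glass data, and the names X, grp_complete, grp_ward and agreement are illustrative.

```r
# Hypothetical stand-in data: numeric columns of built-in iris, NOT the glass data
X <- iris[, 1:4]

# Euclidean distance matrix and two linkage methods, as in the script
d <- dist(X, method = "euclidean")
hc_complete <- hclust(d, method = "complete")
hc_ward <- hclust(d, method = "ward.D2")

# Cut each tree into 6 groups
grp_complete <- cutree(hc_complete, k = 6)
grp_ward <- cutree(hc_ward, k = 6)
table(grp_ward)

# Cross-tabulate the two groupings: rows are complete-linkage groups,
# columns are Ward groups, cells count the observations shared by both
agreement <- table(complete = grp_complete, ward = grp_ward)
agreement
```

If the two methods agreed perfectly, each row of the cross-table would have a single non-zero cell; counts spread across a row show where the dendrograms disagree.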