Group Project
Student Details
Abhinav Premsekhar – H20063
Nikhil Kumar Upadhyaya – H20097
Tushar Rawat – H20118
Contents
1 Objective
2 Business Objective
3 Data Understanding
4 Description of Parameters
5 Data Preparation
6 Logistic Regression
7 Decision Tree
8 Random Forest
9 Neural Network
10 Conclusion
a. Appendix – Code
MLBABJ20-4 Group Project A Study of Red & White Wines: An Objective Critique of Quality
1 Objective
2 Business Objective
Wine is an alcoholic beverage made from grapes, generally fermented without the
addition of sugars, acids, enzymes, water, or other nutrients. Yeast consumes the sugar
in the grapes and converts it to ethanol and carbon dioxide. Different varieties of
grapes and strains of yeasts produce different styles of wine. These variations result
from the complex interactions between the biochemical development of the grape, the
reactions involved in fermentation, the terroir, and the production process. The
wines studied here are red and white wines. The red-wine production
process involves extraction of color and flavor components from the grape skin. Red
wine is made from dark-colored grape varieties. When making white wine, the grape
skins are removed before fermentation, resulting in a clear juice that ultimately yields
a transparent white wine.
The ultimate aim of the analysis is to identify the factors, and the dependence
between them, that yield a wine of satisfactory quality. These factors are
physicochemical: alcohol %, chlorides, sulphates, pH, density and so on.
Which of these factors determine how good a wine is will be the final outcome of
our model. Once trained, the model predicts the quality of a wine when provided
with all the relevant metrics.
3 Data Understanding
The data set contains almost 6,500 entries for various red and white wines. It
consists of:
4 Description of Parameters
The data has been distributed across various parameters, as discussed below:
Fixed Acidity:
Volatile Acidity:
Citric Acid:
Residual Sugar:
Chlorides:
Sulfur Dioxide:
Density:
pH:
Sulphates:
Alcohol %:
Quality:
5 Data Preparation
• The variables Wine Type and wine selected were converted into factor
variables.
• The column “id” was removed as it is irrelevant to our model.
• The data set had no empty/null values, so imputation was not needed.
• One data point with a residual sugar of 65.8, an outlier, was removed.
• All outliers were removed using the bounds Q1 - 1.5 × IQR and Q3 + 1.5 × IQR.
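The IQR rule in the last bullet can be sketched in base R as follows. This is only an illustration under stated assumptions: the helper name within_iqr_fences and the toy values are hypothetical and not part of the project code, and the single value 65.8 merely echoes the residual sugar outlier mentioned above.

```r
# Hypothetical helper: TRUE where a value lies inside the k * IQR fences.
within_iqr_fences <- function(x, k = 1.5) {
  q1 <- quantile(x, 0.25, names = FALSE)
  q3 <- quantile(x, 0.75, names = FALSE)
  iqr <- q3 - q1
  x >= q1 - k * iqr & x <= q3 + k * iqr
}

# Toy values, NOT taken from the wine data set.
toy_sugar <- c(1.2, 1.8, 2.0, 2.1, 2.4, 2.6, 3.0, 65.8)

# Logical mask of the values to keep; the extreme value falls outside the fences.
keep <- within_iqr_fences(toy_sugar)
toy_sugar[keep]
```

Applying such a mask column by column would drop every row that violates the fences in at least one parameter; whether the project filtered per column or jointly is not stated in the report.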
Next, we check whether the data has a class imbalance problem. The cleaned data
was then split into training and test data in a 70:30 ratio, with the seed set to
11111.
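A minimal base-R sketch of these two steps is given below. It is an assumption-laden illustration: the toy labels are made up, and the split is done by sampling 70% of the rows within each class, which mimics the class-preserving behaviour of sample.split used in the appendix but will not reproduce its exact draw even with the same seed.

```r
set.seed(11111)  # same seed value as the report; the random draw itself differs

# Made-up labels standing in for wine.selected (hypothetical 80/20 imbalance).
toy <- data.frame(id = 1:100,
                  wine.selected = rep(c(0, 1), times = c(80, 20)))

# Class balance: share of each label in the full data.
class_share <- prop.table(table(toy$wine.selected))
class_share

# Stratified split: sample 70% of the row indices within each class.
rows_by_class <- split(seq_len(nrow(toy)), toy$wine.selected)
train_idx <- unlist(lapply(rows_by_class, function(idx) {
  idx[sample.int(length(idx), round(0.7 * length(idx)))]
}), use.names = FALSE)

toy_train <- toy[train_idx, ]
toy_test  <- toy[-train_idx, ]
```

Sampling within each class keeps the label proportions of the training and test sets close to those of the full data, which matters when one class is much rarer than the other.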
6 Logistic Regression
7 Decision Tree
8 Random Forest
9 Neural Network
10 Conclusion
Looking into the details, we can see that good-quality wines have, on average,
higher levels of alcohol, lower volatile acidity, higher levels of sulphates,
and higher levels of residual sugar.
Free sulfur dioxide, pH and residual sugar are the least important criteria for
determining quality, while alcohol type, volatile acidity and sulphates are the
most influential factors.
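The "on average" comparisons can be obtained with a per-class mean table. The sketch below uses made-up numbers purely to show the mechanics; it assumes, hypothetically, that the quality label is a 0/1 column named wine.selected, and none of the values are results from the project.

```r
# Made-up rows, NOT project results; 1 stands for a good-quality wine.
toy <- data.frame(
  wine.selected    = c(0, 0, 0, 1, 1, 1),
  alcohol          = c(9.4, 9.8, 10.0, 11.2, 12.0, 12.8),
  volatile.acidity = c(0.70, 0.66, 0.58, 0.36, 0.30, 0.28)
)

# Mean of each parameter within each quality class.
class_means <- aggregate(cbind(alcohol, volatile.acidity) ~ wine.selected,
                         data = toy, FUN = mean)
class_means
```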
a. Appendix – Code
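library(data.table)    # Wine_Data is used with data.table syntax below
library(caTools)       # sample.split
library(rpart)         # decision tree
library(randomForest)  # random forest
library(neuralnet)     # neural network
summary(Wine_Data)
#Outlier check, shown here for volatile acidity
dp = Wine_Data$volatile.acidity
q1 = quantile(dp,0.25)
q3 = quantile(dp,0.75)
iqr = q3 - q1
Wine_Data[volatile.acidity >= q3 + 1.5*iqr | volatile.acidity <= q1 - 1.5*iqr]
str(Wine_Data)
#Class balance of the target before splitting
table(Wine_Data$wine.selected)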
###Train-Test Split
set.seed(11111)
Split_Type = sample.split(Wine_Data$wine.selected,SplitRatio = 0.7)
Train_Data = subset(Wine_Data[,-c("ï..id")], Split_Type == TRUE)
Test_Data = subset(Wine_Data[,-c("ï..id")], Split_Type == FALSE)
#Decision Tree
Model_DT = rpart(wine.selected~., data = Train_Data,method = "class")
DT_Prediction = predict(Model_DT, Test_Data, type = "class")
DT_Confusion_Matrix = table(Test_Data$wine.selected, DT_Prediction)
DT_Accuracy = sum(diag(DT_Confusion_Matrix))/sum(DT_Confusion_Matrix)
DT_Accuracy
library(ROCR)
probrf=predict(Model_DT,Test_Data,type="prob")
head(probrf)
probrf1<-probrf[,2]
predrf<-prediction(probrf1,Test_Data$wine.selected)
rocrf=performance(predrf,"tpr","fpr")
plot(rocrf)
#Random forest
Model_RF = randomForest(wine.selected~., data = Train_Data)
RF_Prediction = predict(Model_RF, Test_Data, type = "class")
RF_Prediction = ifelse(RF_Prediction<0.5,0,1)
RF_Confusion_Matrix = table(Test_Data$wine.selected, RF_Prediction)
RF_Accuracy = sum(diag(RF_Confusion_Matrix))/sum(RF_Confusion_Matrix)
RF_Accuracy
summary(Model_RF)
#Neural Network
NN_Data = model.matrix(~.,Wine_Data[,-c("ï..id")])
NN_Data = as.data.table(NN_Data)
NN_Data$`(Intercept)` = NULL
names(NN_Data) <- make.names(names(NN_Data))
set.seed(11111)
Split_Type = sample.split(NN_Data$wine.selected,SplitRatio = 0.7)
NN_Train_Data = subset(NN_Data, Split_Type == TRUE)
NN_Test_Data = subset(NN_Data, Split_Type == FALSE)
Model_NN = neuralnet(wine.selected~., data = NN_Train_Data, hidden = 0,
                     threshold = 0.2, linear.output = TRUE)
NN_Prediction = predict(Model_NN, NN_Test_Data[,!c("wine.selected"),with = F])
NN_Prediction = ifelse(NN_Prediction<0.5,0,1)
NN_Confusion_Matrix = table(NN_Test_Data$wine.selected, NN_Prediction)
NN_Accuracy = sum(diag(NN_Confusion_Matrix))/sum(NN_Confusion_Matrix)
NN_Accuracy
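The accuracy lines above take the diagonal of a table() result, which is only a valid accuracy when the table is square. If a model happens to predict a single class, table() returns one column and the diagonal no longer lines up with the correct predictions. The sketch below shows one possible guard, assuming 0/1 labels; the helper accuracy_from_labels and the toy vectors are hypothetical and not part of the project code.

```r
# Hypothetical helper: accuracy from actual and predicted 0/1 labels.
# Fixing the factor levels forces a square 2 x 2 confusion matrix.
accuracy_from_labels <- function(actual, predicted, levels = c(0, 1)) {
  cm <- table(factor(actual, levels = levels),
              factor(predicted, levels = levels))
  sum(diag(cm)) / sum(cm)
}

# Toy labels, NOT project output: the model predicts class 1 for every row.
toy_actual    <- c(0, 0, 1, 1, 1)
toy_predicted <- c(1, 1, 1, 1, 1)

# Guarded accuracy: three of the five predictions are correct.
acc <- accuracy_from_labels(toy_actual, toy_predicted)

# Unguarded version for comparison: the table is 2 x 1, so diag() picks up
# only the top-left cell and the result understates the accuracy.
naive_cm  <- table(toy_actual, toy_predicted)
naive_acc <- sum(diag(naive_cm)) / sum(naive_cm)
```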