
e-ISSN: 2582-5208

International Research Journal of Modernization in Engineering Technology and Science


( Peer-Reviewed, Open Access, Fully Refereed International Journal )
Volume:03/Issue:09/September-2021 Impact Factor- 6.752 www.irjmets.com

COMPARATIVE ANALYSIS OF CLASSIFICATION ALGORITHM FOR
PREDICTING WINE QUALITY USING MACHINE LEARNING
Parvathy V A*1, Jismy Joseph*2
*1Department Of MCA, SCMS School Of Technology And Management, Ernakulam, Kerala, India.
*2Associate Professor, Department Of MCA, SCMS School Of Technology And Management,
Ernakulam, Kerala, India.
ABSTRACT
Wine is one of the most widely consumed beverages, so its quality matters to consumers and, above all, to
producers seeking to increase revenue in today's competitive market. Most consumers judge the quality of a
wine by the certification it has obtained. Because everyone has their own opinion about taste, identifying top
quality on the basis of a person's taste is challenging. Wines range from red, white, and rosé to sparkling and
dessert wines. In this paper, samples of both red and white wines, with the attributes required for quality
assurance, are collected and different machine learning classification algorithms are applied to them. The
algorithms are compared in terms of accuracy, precision, recall, f1-score, and confusion matrix.
Keywords: Machine Learning, Accuracy, Precision, Confusion Matrix.
I. INTRODUCTION
In recent years there has been a modest increase in wine consumption, as wine consumption has been found to
correlate with heart rate variability. With consumption rising, the wine industry is looking for ways to
produce good-quality wine at lower cost. Different wines serve different purposes. Although chemical tests show
that most of the chemicals are the same across wines, the concentration of each chemical differs from one type
of wine to another. Classifying wines is therefore important for quality assurance. In the past, without
adequate technological resources, many producers found it difficult to classify wines by qualitative analysis
because doing so was time-consuming and costly. With the arrival of machine learning techniques, it is now
possible to classify wines and to determine the importance of each analysis parameter, including which
parameters can be ignored to reduce cost.
The two datasets used in this project contain physicochemical and sensory characteristics of two wines (red
and white) of the product called "Vinho Verde". Vinho Verde is a Portuguese wine from Minho, in the northwest
region of Portugal. Medium in alcohol, it is especially appreciated for its freshness, particularly in the
summer. The datasets were collected from the UCI Machine Learning Repository and contain 1599 samples for red
wine and 4898 samples for white wine. The data can be used to test (ordinal) regression or classification
methods; in effect, this is a multi-class task in which the classes are ordered.
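As an illustration of how such a dataset might be read, the sketch below parses semicolon-delimited text using only the Python standard library. It assumes the layout of the UCI files (commonly distributed as winequality-red.csv and winequality-white.csv, with a quoted header row and a final "quality" column); the function name load_wine_rows and the two sample rows are hypothetical, and only three of the columns are shown.

```python
import csv
import io


def load_wine_rows(text):
    """Parse semicolon-delimited wine-quality text into a list of dicts of floats."""
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    return [{name: float(value) for name, value in row.items()} for row in reader]


# Hypothetical two-row sample in the assumed UCI layout (subset of columns).
sample = (
    '"fixed acidity";"alcohol";"quality"\n'
    "7.4;9.4;5\n"
    "7.8;9.8;6\n"
)

rows = load_wine_rows(sample)
print(len(rows), rows[0]["quality"])
```

With a real file, the same function could be given the contents of the file read as text.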
Machine learning is the ability of a system to learn and improve automatically from experience without being
explicitly programmed. It focuses on developing programs that can access data and learn from it by themselves.
This project uses classification algorithms, a supervised learning technique in which a model is fitted to
training data and then used to assign new observations to classes. In a classification algorithm, a program
learns from the given datasets or observations and then classifies new observations into various classes or
groups.
II. METHODOLOGY
The project predicts wine quality using machine learning. We apply different classification algorithms to
predict the quality of red and white wine, and then compare them on several evaluation measures to find the
best of the five algorithms. The work involves the following stages:

1) Classification algorithms
a. Naïve Bayes
b. Stochastic Gradient Descent (SGD)
c. K-Nearest Neighbors (KNN)
d. Random Forest
e. Support Vector Machine
2) Calculation
a. Confusion Matrix
b. Accuracy
c. F1-Score
d. Precision
e. Recall
III. MODELING AND ANALYSIS
Classification is a supervised technique. It refers to a predictive modeling problem in which a class label is
predicted for a given set of input data. The performance of a model depends primarily on the nature of the
dataset used. Supervised classification requires labeled training data and human intervention, whereas
unsupervised classification does not require human intervention because it is fully computer operated.
First, we collect and read the necessary datasets. We used two datasets, one with red wine samples and the
other with white wine samples. Data pre-processing is then performed on each dataset to balance it. Afterward,
the dataset is divided into two parts, a training set and a test set. Feature selection is then applied, and
the proposed models are trained: Naïve Bayes, Stochastic Gradient Descent (SGD), K-Nearest Neighbors (KNN),
Random Forest, and Support Vector Machine (SVM). Accuracy is then calculated to gauge the effectiveness of the
different algorithms, and each model is also evaluated on precision, recall, and f1-score. Finally, we identify
the most effective algorithm for the given datasets.
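The workflow described above can be sketched with scikit-learn as follows. This is a minimal illustration, not the authors' code. It rests on several assumptions that the paper does not state: synthetic data from make_classification stands in for the wine datasets (11 features, roughly 15% positive class), the split is 75/25 and stratified, all hyperparameters are library defaults, and features are standardized for SGD, KNN, and SVM. Feature selection and class balancing are omitted.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_recall_fscore_support)
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: synthetic stand-in data, label 1 playing the role of "good quality".
X, y = make_classification(n_samples=600, n_features=11, n_informative=6,
                           weights=[0.85, 0.15], random_state=0)

# Assumption: 75/25 stratified split (the paper does not give the ratio).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# The five classifier families named in the paper, with default settings.
models = {
    "Naive Bayes": GaussianNB(),
    "SGD": make_pipeline(StandardScaler(), SGDClassifier(random_state=0)),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "Random Forest": RandomForestClassifier(random_state=0),
    "SVM": make_pipeline(StandardScaler(), SVC()),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    # Precision, recall and F1 for the positive class only.
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_test, y_pred, average="binary", zero_division=0)
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "confusion_matrix": confusion_matrix(y_test, y_pred),
    }
    print(f"{name}: accuracy={results[name]['accuracy']:.2f}, f1={f1:.2f}")
```

Scores obtained on the synthetic data say nothing about the wine datasets; the sketch only shows the shape of the comparison.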
IV. RESULTS AND DISCUSSION
The performance of the proposed models is reported in terms of confusion matrix, accuracy, precision, recall,
and f1-score.
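For reference, these measures follow from the four cells of a binary confusion matrix. The sketch below uses the standard definitions; the function names and the example counts are hypothetical and are not taken from the paper's confusion matrices.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def classification_metrics(tp, fp, fn, tn):
    """Return (accuracy, precision, recall, f1) for the positive class."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall, f1_from(precision, recall)


# Hypothetical counts: 30 true positives, 20 false positives,
# 10 false negatives, 140 true negatives.
acc, prec, rec, f1 = classification_metrics(tp=30, fp=20, fn=10, tn=140)
print(acc, prec, rec, round(f1, 4))

# Consistency check against Table 1 (Naïve Bayes, red wine, good quality):
# precision 0.50 and recall 0.74 give an F1-score that rounds to 0.60.
nb_red_f1 = round(f1_from(0.50, 0.74), 2)
print(nb_red_f1)
```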

Figure 1: Confusion Matrix of Naïve Bayes a) Red Wine b) White Wine


Table 1: Accuracy of Naïve Bayes
Red Wine          Precision   Recall   F1-Score
Bad quality       0.95        0.87     0.91
Good quality      0.50        0.74     0.60
Accuracy                               0.85
White Wine        Precision   Recall   F1-Score
Bad quality       0.89        0.72     0.80
Good quality      0.43        0.70     0.54
Accuracy                               0.72
Figure 1 shows the confusion matrix of Naïve Bayes for red wine and white wine. The model has been evaluated
on precision, recall, and f1-score. Table 1 reports the accuracy, precision, recall, and f1-score of Naïve
Bayes.

Figure 2: Confusion Matrix of Stochastic Gradient Descent a) Red Wine b) White Wine
Table 2: Accuracy of Stochastic Gradient Descent
Red Wine          Precision   Recall   F1-Score
Bad quality       0.87        0.96     0.91
Good quality      0.37        0.15     0.21
Accuracy                               0.84
White Wine        Precision   Recall   F1-Score
Bad quality       0.80        0.97     0.87
Good quality      0.63        0.18     0.28
Accuracy                               0.78
Figure 2 shows the confusion matrix of Stochastic Gradient Descent for red wine and white wine. The model has
been evaluated on precision, recall, and f1-score. Table 2 reports the accuracy, precision, recall, and
f1-score of Stochastic Gradient Descent.

Figure 3: Confusion Matrix of K-Nearest Neighbors a) Red Wine b) White Wine

Table 3: Accuracy of K-Nearest Neighbors
Red Wine          Precision   Recall   F1-Score
Bad quality       0.87        0.96     0.91
Good quality      0.37        0.15     0.21
Accuracy                               0.84
White Wine        Precision   Recall   F1-Score
Bad quality       0.80        0.97     0.87
Good quality      0.63        0.18     0.28
Accuracy                               0.78
Figure 3 shows the confusion matrix of K-Nearest Neighbors for red wine and white wine. The model has been
evaluated on precision, recall, and f1-score. Table 3 reports the accuracy, precision, recall, and f1-score of
K-Nearest Neighbors.

Figure 4: Confusion Matrix of Random Forest a) Red Wine b) White Wine


Table 4: Accuracy of Random Forest
Red Wine          Precision   Recall   F1-Score
Bad quality       0.91        0.97     0.94
Good quality      0.71        0.43     0.53
Accuracy                               0.89
White Wine        Precision   Recall   F1-Score
Bad quality       0.89        0.96     0.92
Good quality      0.82        0.60     0.70
Accuracy                               0.88
Figure 4 shows the confusion matrix of Random Forest for red wine and white wine. The model has been evaluated
on precision, recall, and f1-score. Table 4 reports the accuracy, precision, recall, and f1-score of Random
Forest.

Figure 5: Confusion Matrix of Support Vector Machine a) Red Wine b) White Wine
Table 5: Accuracy of Support Vector Machine
Red Wine          Precision   Recall   F1-Score
Bad quality       0.88        0.98     0.93
Good quality      0.71        0.26     0.37
Accuracy                               0.88
White Wine        Precision   Recall   F1-Score
Bad quality       0.83        0.97     0.89
Good quality      0.75        0.34     0.47
Accuracy                               0.82
Figure 5 shows the confusion matrix of Support Vector Machine for red wine and white wine. The model has been
evaluated on precision, recall, and f1-score. Table 5 reports the accuracy, precision, recall, and f1-score of
Support Vector Machine.
Table 6: Summarized result of the five Machine Learning models of Red Wine
Models          Accuracy   Precision   Recall   F1-Score
Naïve Bayes     0.85       0.50        0.74     0.60
SGD             0.84       0.37        0.15     0.21
KNN             0.88       0.62        0.38     0.47
Random Forest   0.89       0.71        0.43     0.53
SVM             0.88       0.71        0.26     0.37
Table 7: Summarized result of the five Machine Learning models of White Wine
Models          Accuracy   Precision   Recall   F1-Score
Naïve Bayes     0.72       0.43        0.70     0.54
SGD             0.78       0.63        0.18     0.28
KNN             0.82       0.65        0.52     0.58
Random Forest   0.88       0.82        0.60     0.70
SVM             0.82       0.75        0.34     0.47
The objective of this project was to analyze and compare the algorithms in terms of their accuracy, precision,
recall, and f1-score. The expected result is a clear ranking of the classification algorithms that shows which
is the most accurate.

Red Wine Comparison (bar chart of model accuracy: Naïve Bayes 0.85, SGD 0.84, KNN 0.88, Random Forest 0.89, SVM 0.88)

Figure 6: Comparison of all the five Machine Learning models for Red Wine

From these results, we conclude that the most effective model for predicting red wine quality is Random
Forest, with an accuracy of 89%. KNN and SVM are tied for second best with an accuracy of 88%, and SGD is the
least accurate.

Figure 7: Comparison of all the five Machine Learning models for White Wine
From these results, we conclude that the most effective model for predicting white wine quality is Random
Forest, with an accuracy of 88%. KNN and SVM are tied for second best with an accuracy of 82%, and Naïve Bayes
is the least accurate.
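The rankings stated for both wines can be reproduced from the accuracy columns of Tables 6 and 7. The short sketch below transcribes those values; the helper name rank_by_accuracy is hypothetical.

```python
# Accuracy values transcribed from Table 6 (red wine) and Table 7 (white wine).
red = {"Naive Bayes": 0.85, "SGD": 0.84, "KNN": 0.88,
       "Random Forest": 0.89, "SVM": 0.88}
white = {"Naive Bayes": 0.72, "SGD": 0.78, "KNN": 0.82,
         "Random Forest": 0.88, "SVM": 0.82}


def rank_by_accuracy(scores):
    """Return (model, accuracy) pairs sorted from most to least accurate."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


for label, scores in (("Red wine", red), ("White wine", white)):
    ranking = rank_by_accuracy(scores)
    print(label, "best:", ranking[0][0], "worst:", ranking[-1][0])
```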
V. CONCLUSION
In this study, we predicted wine quality using five classification algorithms: Naïve Bayes, Stochastic
Gradient Descent (SGD), K-Nearest Neighbors (KNN), Random Forest, and Support Vector Machine (SVM). We
pre-processed the data as required and then applied the five models to measure their accuracy. The results show
that Random Forest is the best model for predicting wine quality, with an accuracy of 0.89 for red wine and
0.88 for white wine.
