Wine Quality Analysis Project Report

A machine learning project report that applies classification algorithms such as KNN, Naive Bayes, and Random Forest.

Uploaded by

Karishma Kurickal


School of Engineering and Technology

Department of Computer Science and Engineering

Jain Global Campus, Kanakapura Taluk - 562112
Ramanagara District, Karnataka, India

A Project Report on
“Wine Quality Analysis”

For the partial fulfilment of

BACHELOR OF TECHNOLOGY
IN
COMPUTER SCIENCE AND ENGINEERING

Submitted by
Brahadeesh Kishore
16BT6CS006

Karishma Kurickal
16BT6CS011
School of Engineering & Technology
Department of Computer Science and Engineering
Jain Global campus
Kanakapura Taluk - 562112
Ramanagara District
Karnataka, India

CERTIFICATE

This is to certify that the project work titled “Wine Quality Analysis” for the course
Machine Learning (16CIC73) during the 7th semester was carried out by Brahadeesh Kishore
(16BT6CS006) and Karishma Kurickal (16BT6CS011), bonafide students at the School of
Engineering & Technology, JAIN (Deemed-to-be-University), Bangalore, in partial fulfilment
for the award of the degree of Bachelor of Technology in Computer Science and Engineering
during the year 2019 - 2020.

Prof. Shilpa Das
Assistant Professor,
Dept. of Computer Science and Engineering,
School of Engineering & Technology,
JAIN (Deemed-to-be-University)
Date:

Dr. Narayana Swamy Ramaiah
Head of the Department,
Dept. of Computer Science and Engineering,
School of Engineering & Technology,
JAIN (Deemed-to-be-University)
Date:

TABLE OF CONTENTS
Chapter 1
1. INTRODUCTION
1.1 Problem Definition
1.2 Objectives
1.3 Methodology
1.4 Software Requirements
1.5 Tool Description

Chapter 2
2. IMPLEMENTATION
2.1 Design and Implementation
2.1.1 Implementation Mechanism
2.1.2 Major Considerations for Implementation
2.1.3 Source Code
2.2 Machine Learning Algorithms Used

Chapter 3
3. RESULTS AND DISCUSSION

Chapter 4
4. CONCLUSION

REFERENCES
Chapter 1

Introduction
(Introduction to the problem chosen, the domain of the problem, which other problems
were there, and why did you choose this particular problem, should be given here within 1 –
2 paragraphs.)

Other Issues in the Domain (along with the chosen issue) –

• i.
• ii.
• iii.
• iv.

Brief Explanation about relevance of chosen issue, and why is it important.

1.1. Problem Definition

The problems this project aims to solve are:

• Predicting the quality rating of a new wine, given the properties and quality
ratings of many other wines
• Testing multiple algorithms to see which performs best on the given data

1.2. Objectives

The objectives of this project are the following:

• To compare and contrast various machine learning algorithms and their training
set/test set performance on the given dataset
• To correctly classify the quality of the wines in the test set, based on their
features

1.3. Methodology
The architecture/workflow to solve the addressed problems should be explained, along with
how you plan to implement the proposed model.

1.4. Software Requirements

This project uses the following tools for its implementation:

1. Python: The programming language used to implement this project.
2. Jupyter Notebook: Used to provide an interactive environment for Python, for the
implementation of this project.
3. Google Colab: Used to run Jupyter Notebooks remotely and save them to Google
Drive, hence increasing portability. Google Colab also provides additional hardware
resources, reducing execution times.
4. Scikit-Learn: Used to implement the machine learning models used in this
project, as well as to measure their accuracies.
5. Matplotlib: Used to plot graphs that show the trends in
accuracy/performance of the various machine learning algorithms implemented.
6. Pandas: Used to analyse and clean our dataset.

1.5. Tool Description

The various tools used in this project are described below.

1. Python
Python is an interpreted, high-level programming language created by Guido van
Rossum and first released in 1991. It emphasizes code readability, using line breaks
to end statements and indentation to delimit blocks. Its ease of use and clear syntax
make it a preferred language for data analysis.

2. Jupyter Notebook
It is a web application that provides an interactive computing environment for
Python. Using this, one can create and share documents that contain equations,
visualizations such as graphs, text, and live code. For these reasons, it is widely
used for data analysis.

3. Google Colab
Google Colab is a free cloud service that allows you to write and run Jupyter
Notebooks on the cloud, rather than having to install Jupyter Notebook and the
necessary packages on your machine. It provides the advantage of portability, since
Jupyter Notebooks are saved to your Google Drive, and can hence be run anywhere.
The main advantage of Google Colab is the system resources that are provided,
which help speed up training time for machine learning models. This includes GPUs
and additional RAM. This is helpful since many local systems may not have adequate
system resources to train these models as quickly.

4. Scikit-Learn
It is a Python package used for data modelling. It provides a number of supervised
and unsupervised machine learning models. Scikit-learn makes it simple to train
models: in most cases a model is created and then fitted to the input data with a
single method call.

5. Matplotlib
This is a plotting library for the Python programming language, and is used to make
plots and graphs based on provided data. Plots help us understand trends and patterns
and spot correlations. Understanding trends in data by simply looking at numbers is
difficult, so matplotlib provides visuals for this purpose.

6. Pandas
Although Scikit-learn provides models for training on data, it is not concerned
with the preparation and cleaning of this data. That is where pandas comes in.
Pandas is an open source library used to import datasets in a variety of
formats and to analyse and clean this data. Its performance-critical parts are
written in C and Cython, and it supports vectorized operations. That is, an
operation can be applied to an entire row or column at once, eliminating the need
for an explicit for loop over the individual elements. This makes it much faster
than element-by-element Python code.
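
To illustrate what a vectorized operation means in practice, here is a minimal sketch comparing an explicit loop with the equivalent whole-column expression. The tiny table and its values are hypothetical and only mimic the kind of columns found in a wine dataset.

```python
import pandas as pd

# Hypothetical miniature table; values are made up for illustration.
df_demo = pd.DataFrame({
    "free sulfur dioxide": [45.0, 14.0, 30.0],
    "total sulfur dioxide": [170.0, 132.0, 97.0],
})

# Explicit loop: one Python-level division per row.
ratio_loop = []
for _, row in df_demo.iterrows():
    ratio_loop.append(row["free sulfur dioxide"] / row["total sulfur dioxide"])

# Vectorized: a single expression applied to the whole columns at once.
ratio_vec = df_demo["free sulfur dioxide"] / df_demo["total sulfur dioxide"]

print(ratio_vec.round(3).tolist())
```

Both approaches compute the same ratios; the vectorized form is shorter and, on large tables, usually much faster.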

Chapter 2

Implementation
2.1. Design & Implementation
The implementation of this project involved the following steps:

• Import the necessary packages
• Import the required dataset and clean the dataset
• Split the dataset into a training and test set
• Apply three algorithms: K Nearest Neighbors, Naïve Bayes, and Random Forests.
Compare the accuracies on the training and test set.
• Generate graphs comparing train and test set accuracies for different algorithms

The implementation is described in detail below.
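
As a rough sketch of the last step in the list above, the following shows one way to draw a grouped bar chart of train and test accuracies with matplotlib. The helper name `plot_accuracies` is hypothetical, and the numbers passed to it are placeholders, not results from this project.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt


def plot_accuracies(scores):
    """Draw grouped bars of train/test accuracy per model.

    `scores` maps a model name to a (train_accuracy, test_accuracy) pair.
    """
    names = list(scores)
    train = [scores[n][0] for n in names]
    test = [scores[n][1] for n in names]
    positions = list(range(len(names)))
    width = 0.35

    fig, ax = plt.subplots()
    ax.bar([p - width / 2 for p in positions], train, width, label="Train")
    ax.bar([p + width / 2 for p in positions], test, width, label="Test")
    ax.set_xticks(positions)
    ax.set_xticklabels(names)
    ax.set_ylim(0, 1)
    ax.set_ylabel("Accuracy")
    ax.legend()
    return fig


# Placeholder numbers for illustration only; they are NOT results from this project.
placeholder_scores = {
    "KNN": (0.5, 0.5),
    "Naive Bayes": (0.5, 0.5),
    "Random Forest": (0.5, 0.5),
}
fig = plot_accuracies(placeholder_scores)
```

In a notebook, the backend line can be omitted so that the figure is displayed inline.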

2.1.1. Implementation Mechanism

The three machine learning algorithms that we will implement are K Nearest Neighbors,
Random Forests, and Naïve Bayes. First and foremost, the dataset had to be imported and
cleaned. Since we are using Google Colab, we must upload the dataset to the cloud. The
code snippet is given below.

import pandas as pd
from google.colab import files

files.upload()  # opens a file picker and uploads the chosen file to the Colab session
df = pd.read_csv("[Link]")
df

A brief summary of the dataset is returned.

As we can see, every column except for type and quality contains continuous numerical
values. The quality is what we are trying to predict. Since the type column consists of
strings, we first convert them to numeric values. We execute this line of code next:
df['type'].unique()
output: array(['white', 'red'], dtype=object)

Since there are only two types of wine, namely white and red, we can encode these as 0s
and 1s respectively. Additionally, we also drop all rows containing null values.

import numpy as np

df['type'] = np.where(df['type'] == 'white', 0, 1)  # white -> 0, red -> 1

df = df.dropna(axis=0)  # drop rows with any missing value
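
To make the effect of these two cleaning steps concrete, here is a small self-contained sketch on a hypothetical four-row table (the real dataset has more columns and many more rows).

```python
import numpy as np
import pandas as pd

# Hypothetical rows, made up for illustration.
toy = pd.DataFrame({
    "type": ["white", "red", "white", "red"],
    "pH": [3.0, np.nan, 3.3, 3.5],
    "quality": [6, 5, 7, 5],
})

# white -> 0, anything else (here: red) -> 1
toy["type"] = np.where(toy["type"] == "white", 0, 1)

# Drop every row that has at least one missing value.
toy = toy.dropna(axis=0)

print(toy["type"].tolist())  # [0, 0, 1]
print(len(toy))              # 3
```

One caveat of this encoding: a missing value in the type column would compare unequal to 'white' and so be encoded as 1, which hides it from the later dropna call. If the type column could contain nulls, dropping rows before encoding would avoid this.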

Dataset after cleaning:

Now that we are done cleaning the dataset we can create two vectors, one for inputs and
one for labels, then divide these into training and test sets. We will use 70% of the data as
the training set and the remaining 30% as the test set. Our labels will be the wine’s quality
rating. We split the data using scikit-learn’s train_test_split function.

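from sklearn.model_selection import train_test_split

# Features: every column except the label. Slicing with df.iloc[:, 1:] would also
# keep 'quality' in X and leak the label into the inputs.
X = df.drop(columns=['quality'])
y = df['quality']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)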
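
The split can be sketched on synthetic data as follows. The `random_state` and `stratify` arguments are assumptions added here for repeatability and balanced classes; the report's own call passes only `test_size`.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the cleaned dataset: 100 rows, 4 features.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 4))
y_demo = np.repeat([5, 6], 50)  # two balanced, made-up quality labels

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo,
    test_size=0.3,      # same 70/30 split as in the report
    random_state=42,    # assumption: fixed seed for a repeatable split
    stratify=y_demo,    # assumption: keep class proportions in both parts
)

print(X_tr.shape, X_te.shape)  # (70, 4) (30, 4)
```

Without a fixed seed, each run produces a different split, so reported accuracies will vary slightly between runs.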

The data is now ready for the ML algorithms to be used on. The necessary packages are
given below.

from sklearn.neighbors import KNeighborsClassifier
from sklearn import metrics
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB

The code to run these is given below.

knn = KNeighborsClassifier(n_neighbors=10, p=1)

knn.fit(X_train, y_train)
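
The report shows only the K Nearest Neighbors fit. A minimal sketch of how all three classifiers might be fitted and scored on the training and test sets is given below. It runs on synthetic data from `make_classification` because the wine dataset is not bundled here, and the Naïve Bayes and Random Forest settings are assumed defaults rather than the project's actual configuration.

```python
from sklearn import metrics
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data stands in for the wine dataset.
X_syn, y_syn = make_classification(
    n_samples=300, n_features=8, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X_syn, y_syn, test_size=0.3, random_state=0
)

models = {
    "KNN": KNeighborsClassifier(n_neighbors=10, p=1),  # settings from the report
    "Naive Bayes": GaussianNB(),                        # defaults (assumed)
    "Random Forest": RandomForestClassifier(random_state=0),  # defaults (assumed)
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    train_acc = metrics.accuracy_score(y_train, model.predict(X_train))
    test_acc = metrics.accuracy_score(y_test, model.predict(X_test))
    results[name] = (train_acc, test_acc)
    print(f"{name}: train={train_acc:.3f}, test={test_acc:.3f}")
```

A large gap between training and test accuracy for a model is a common sign of overfitting, which is why both numbers are recorded.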

2.1.2. Major Considerations for Implementation

While cleaning the data, it is important to convert all string values to numeric values before
proceeding, since scikit-learn models cannot take strings as input directly. An
example of encoding strings as numbers would be [‘a’, ‘b’, ‘c’] -> [0, 1, 2]. In our case, the only
string column had 2 distinct values, so we could simply encode them as 0s and 1s. Rows
containing null or undefined values were dropped. Since we wanted to achieve the highest
possible accuracy, we did not risk replacing null values with a predefined value and
preferred dropping those rows altogether. Since we had enough training data, this step was
feasible.
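
The [‘a’, ‘b’, ‘c’] -> [0, 1, 2] encoding mentioned above can be sketched with pandas' `factorize` function. This is one of several possible ways to do it and is not necessarily what the project used.

```python
import pandas as pd

labels = pd.Series(["a", "b", "c", "a"])

# factorize assigns integer codes in order of first appearance.
codes, categories = pd.factorize(labels)

print(codes.tolist())    # [0, 1, 2, 0]
print(list(categories))  # ['a', 'b', 'c']
```

The second return value keeps the original strings, so the integer codes can be mapped back to their labels later.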
Additionally, since K-NN is a distance-based algorithm, we need to scale the data
appropriately. If the data is not scaled, features with larger ranges contribute more to
the distance than others. For example, the total sulfur dioxide feature would influence the
prediction of the wine’s quality rating far more than, say, sulphates, since the
former has a much wider range of values. We tested K-Nearest Neighbors both with and without
scaling features to the range 0 to 1 and noted the accuracy.
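
The effect of scaling on distances can be seen in a small sketch. The three rows below are hypothetical values chosen for illustration, and scikit-learn's `MinMaxScaler` is assumed as the tool for scaling to the 0 to 1 range (the report does not say which scaler was used).

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical values: columns are [total sulfur dioxide, sulphates].
wines = np.array([
    [100.0, 0.4],   # wine A
    [200.0, 0.4],   # wine B: differs from A only in total sulfur dioxide
    [100.0, 0.8],   # wine C: differs from A only in sulphates
])


def manhattan(u, v):
    """L1 distance, the metric KNeighborsClassifier uses when p=1."""
    return float(np.abs(u - v).sum())


raw_ab = manhattan(wines[0], wines[1])
raw_ac = manhattan(wines[0], wines[2])

scaled = MinMaxScaler().fit_transform(wines)  # each column mapped to [0, 1]
scaled_ab = manhattan(scaled[0], scaled[1])
scaled_ac = manhattan(scaled[0], scaled[2])

print(raw_ab, raw_ac)        # B looks far from A, C looks almost identical to A
print(scaled_ab, scaled_ac)  # after scaling, both differences count equally
```

In a real experiment the scaler would normally be fitted on the training set only and then applied to the test set, so that no information from the test data influences the preprocessing.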

2.1.3. Source Code

Partial / Complete Source Code

2.2. Machine Learning Algorithms Used

1. K Nearest Neighbors

Chapter 3

Results and Discussion

• Detailed graphs for comparison between the 3-4 methods used, indicating performance with
respect to the dataset, explanation of the graphs, what do they indicate, why do they
perform as they do.
• Accuracy achieved in values should also be given here, for all the methods.

Chapter 4

Conclusion
Conclude here what did you propose to do, how much you did, how well did you obtain
results, this should be short story on the entire work while you explain, like a revisit to the
entire project.