
COFFEE QUALITY CLASSIFICATION

SPECIALTY COFFEE QUALITY CLASSIFICATION WITH FEATURE IMPORTANCE

ALCI SEBASTIAN BURGOS MORENO

thesis submitted in partial fulfillment
of the requirements for the degree of
master of science in data science & society
at the school of humanities and digital sciences
of tilburg university
student number
U217300

committee
dr. Grzegorz Chrupała
dr. Marijn van Wingerden

location
Tilburg University
School of Humanities and Digital Sciences
Department of Cognitive Science &
Artificial Intelligence
Tilburg, The Netherlands

date
June 25, 2021

acknowledgments
I want to thank my family for their support in my studies throughout
the years.
I want to thank dr. Robert McKeon for sharing his passion for coffee and
the Sweet Maria's dataset. Finally, I want to thank my supervisor dr.
Grzegorz Chrupała for the wise guidance provided in the writing of this
thesis.
COFFEE QUALITY CLASSIFICATION

SPECIALTY COFFEE QUALITY CLASSIFICATION WITH FEATURE IMPORTANCE

alci sebastian burgos moreno

contents

1 Introduction 4
1.1 Context . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2 Project Relevance . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3 Project Goal . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.4 Research Questions . . . . . . . . . . . . . . . . . . . . . . . . 7
2 Related Work 8
2.1 Similar Research . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.2 Feature importance . . . . . . . . . . . . . . . . . . . . . . . . 9
2.3 Coffee Quality Prediction and Classification . . . . . . . . . 10
3 Experimental Setup 12
3.1 Software . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
3.2 Data Description . . . . . . . . . . . . . . . . . . . . . . . . . 12
3.2.1 Coffee Quality Institute . . . . . . . . . . . . . . . . . 12
3.2.2 Sweet Maria’s Coffee . . . . . . . . . . . . . . . . . . . 12
3.3 Exploratory Data Analysis . . . . . . . . . . . . . . . . . . . . 13
3.3.1 Coffee Quality Institute . . . . . . . . . . . . . . . . . 13
3.3.2 Sweet Maria’s Coffee . . . . . . . . . . . . . . . . . . . 15
3.3.3 Compatibility and discrepancy . . . . . . . . . . . . . 16
3.4 Preprocessing . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
3.4.1 Coffee Quality Institute . . . . . . . . . . . . . . . . . 17
3.4.2 Sweet Maria’s Coffee . . . . . . . . . . . . . . . . . . . 18
3.5 Description of feature importance algorithms . . . . . . . . . 19
3.5.1 Filter Methods . . . . . . . . . . . . . . . . . . . . . . 19
3.5.2 Wrapper Methods . . . . . . . . . . . . . . . . . . . . 20
3.5.3 Embedded Methods . . . . . . . . . . . . . . . . . . . 21
3.5.4 Hierarchical Clustering . . . . . . . . . . . . . . . . . 23
3.6 Evaluation Algorithm: K Nearest Neighbors . . . . . . . . . 24
3.6.1 Subset Evaluation . . . . . . . . . . . . . . . . . . . . . 25


3.7 Cross Validation . . . . . . . . . . . . . . . . . . . . . . . . . . 25


3.8 Evaluation Metric . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.9 Repository . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4 Results 27
4.1 Specialty Coffee Cupping Most Important Features . . . . . 27
4.1.1 All-variables set . . . . . . . . . . . . . . . . . . . . . . 27
4.1.2 Non-cupping scores subset . . . . . . . . . . . . . . . 30
4.1.3 Non-proxy scores subset . . . . . . . . . . . . . . . . . 31
4.2 Outstanding specialty coffee classification . . . . . . . . . . . 31
4.3 Hierarchical Clustering for Coffee Classification . . . . . . . 33
5 Discussion 36
5.1 RQ1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
5.1.1 Specialty Coffee Feature Importance . . . . . . . . . . 36
5.1.2 Outstanding Specialty Coffee Feature Importance . . 38
5.2 RQ2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
5.3 RQ3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
6 Appendix 48

Abstract
Coffee is one of the most traded commodities in the world, and the
biggest producers of the black bean are developing countries; farmers
in such countries can greatly benefit from the high demand for coffee,
even more so now that specialty coffee enjoys an en vogue status. The
price of coffee is established in online auctions where bidders rely on
the reputation of a crop, a prestige that is built upon coffee cupping
scores. The main focus of this thesis is to study the features in the
coffee evaluation to understand their importance and to explore the
possibility of including environmental features in the assessment of
coffee quality. The most important features of coffee are the taste
variables; flavor is the most important for the classification of
specialty coffee, while the texture-based features come last. Although
some features are more important than others, all of the features
contribute greatly to the classification tasks.

1 introduction

Coffee cupping is the sensory examination and evaluation of coffee. It is
still a matter of subjective perception, and a better understanding of the
coffee features should lead to the creation of a more robust evaluation
system. The relevance of comprehending the feature importance goes
beyond the economic scope, as the social realm is greatly impacted by the
prices of coffee.
Scientifically speaking, coffee belongs to the Rubiaceae family of plants,
which comprises more than 13,000 different species; however, just two
coffee species are widely consumed: Coffea arabica and Coffea canephora
var. Robusta (Wikström, Kainulainen, Razafimandimbison, Smedmark, &
Bremer, 2015). Of these two varieties, Arabica is the most produced:
between February 2020 and February 2021, 81.05 million bags of Arabica
were exported in comparison to 47.52 million bags of Robusta (International
Coffee Organization, 2021).

1.1 Context

Coffee has been rated as the second most traded commodity in the world
market, and even though that affirmation remains inaccurate, coffee is still
one of the most traded seeds in the world (Mussatto, Machado, Martins,
& Teixeira, 2011); the coffee economy is worth more than $12.2 billion,
and about 142 million people rely on the agriculture of coffee, as
stated by Reis Guimarães, Carlos dos Santos, Montagnana Vicente Leme,
and da Silva Azevedo (2020). The coffee economy has impacted the
communities of the countries where coffee is harvested in a positive way,
not only allowing big traders to profit through a free-trade market, but
also allowing inhabitants of the countryside to raise their standard of
living, according to Lashermes, Andrade, and Etienne (2008). Farmers
still need to adapt to current retailers' requests, since the coffee industry
is in constant mutation, boosted not only by environmental factors such
as global warming but also by the growing demand for high quality
coffee (Wollni & Zeller, 2007). According to Lannigan (2020), this demand
occurs due to a market trend where connoisseur consumption is now the
rule: consumers now research the quality of their products, looking not
only at the ecological production process of the goods but also at the
final quality; therefore it is pertinent for coffee evaluation to study the
importance of variables and their interactions to deliver a conscious
rating of taste.

1.2 Project Relevance

In order to sell a bag of coffee in the international market a farmer needs
to belong to a farmers' cooperative; the cooperative collects the coffee
from local crops and sells it on a virtual auction market. However, the
bids are based on the reputation of the cooperative, a reputation that is
acquired by the production of high quality coffee. Quality per se is a
broad term and is usually discussed over different factors, such as visible
bean defects, taste or even time of the year; for the international coffee
trade market, quality is granted by a certification performed by a
professional "Q Grader" on a coffee sample of each crop that belongs to a
cooperative.
There are different methods for professional coffee cupping; however, most
of the models created are based on the Specialty Coffee Association (SCA)
method. The success of this method is based on the grounds that "sensory
descriptive methods used for testing [...] are highly reliable and consistent
and [...] identify the human perception of flavor" (Chambers & Koppel,
2013, p. 2) and that "coffee green and roasted beans contain an appreciable
amount of aliphatic hydrocarbons, probably derived from the oxidation of
green bean lipids during storage or transport" (Rodríguez, Durán, & Reyes,
2010, p. 37), which means that the odor characteristics of green coffee
seeds are noticeable to the human nose.
The Specialty Coffee Association procedure was created for the standardiza-
tion of coffee evaluation in 1984 and has had a few revisions over the last
decades (Lingle & Menon, 2017). The coffee cupping evaluation starts with
a visual review of the beans; the roasting color must be checked for defects,
and then the other aspects are checked on different scales in the tasting
phase. By sniffing the sample the Q-Grader checks for fragrance and aroma,
and those features are rated in a dry and a warm wet condition; subsequently,
after letting the sample cool down, a mouth aspiration of the sample infusion
gives the insight for rating the Flavor, Aftertaste, Acidity and Body of the
coffee; after the sample has cooled down to room temperature, Sweetness,
Uniformity, and Clean Cup are evaluated (Donnet & Weatherspoon, 2006).
After the professional Q-Cup grader has finished the cupping process, the
given sample is classified by the sum of the total of scores as described in
the table:

Table 1: Description of specialty coffee classification.

In light of the existence of different classification models it is necessary
to highlight that in all models some variables encompass two or more
variables simultaneously; such feature spanning could lead to redundant
variables and a biased evaluation, therefore it is relevant for this
project to evaluate whether feature selection leads to a better model.
On the other hand, of economic relevance for the average farmer is the
price of testing each sample, which can cost between $150 and $400. Such
a price has two direct effects: the first is that the price of the coffee
cupping certification directly affects the prices paid by the final
consumer, and the second is that it decreases the profit of the farmers
(Conley & Wilson, 2020). The objection to the current coffee cupping
evaluation goes beyond the price implication, as coffee cupping, even
under laboratory conditions, has undergone much criticism, because it has
been shown to still be liable to subjective experience that can lead to a
misread of the taste (Bravo-Moncayo, Reinoso-Carvalho, & Velasco, 2020;
J. Li, Streletskaya, & Gómez, 2019; Spence & Carvalho, 2019).

1.3 Project Goal

This project aims to understand the interaction between features, to discover
the feasibility of creating a new evaluation system and to suggest pos-
sible new approaches in coffee evaluation; this is based on the fact that
Q-Cup grades determine the quality classification, yet some scores could
be redundant, and coffee taste quality could also be classified through
the understanding of extrinsic factors (Bertrand et al., 2006; Mazzafera,
1999).
Previous works have approached the quality of beverages through Machine
Learning methods. The work of Z. Li and Suslick (2017) used an
electronic nose that captures the UV light spectrometric fingerprint of a
compound and digitalizes the alcoholic beverage characteristics in an RGB
array; after feeding a library with samples from 14 different liquor brands,
they treated the data with hierarchical clustering analysis (HCA) and
Principal Component Analysis (PCA), and the resulting subset was used for
training a Support Vector Machine that classified with 99% accuracy.
A similar approach was taken by Rodríguez et al. (2010); they used an
electronic nose (with gas chromatography detection) with 8 different sen-
sors for detecting quality defects in coffee; a Multilayer Perceptron Neural
Network with PCA preprocessing was trained, and this approach was able
to classify with 100% accuracy. A later work by Chang et al. (2021) used
the exact same approach, but in this case to classify the flavor of quality
coffee among 9 different classes; they used a Support Vector Machine with
a Radial Basis Function (RBF) kernel and a Residual Neural Network; the
first reached 78.91% accuracy, the second 78.79%.
Nonetheless, none of the previous studies contemplated the magnitude of
influence of each feature in the classification task; therefore understanding
their importance is necessary to contribute to the evaluation methods
when no electronic device is available; the use of electronic noses for
classification is deemed to be expensive and, for the moment, not easily
available to the everyday coffee grader or farmer. In order to understand
the feature importance in the coffee cupping scores, different machine
learning algorithms can be used: filter methods, wrapper methods and
embedded methods; each method has different advantages and difficulties
that need to be assessed against the type of data to analyze.

1.4 Research Questions

Since the intention of this project is to understand the features that play
a major role in the classification of coffee quality, the following research
questions will be answered:

• RQ1 Which features influence specialty coffee quality classification?

• RQ2 Can feature selection yield a better result than using the whole data
set for specialty coffee classification?

• RQ3 Does including features other than taste itself in specialty coffee
classification increase performance of the model?

2 related work

Due to the nature of coffee evaluation, related works will be presented in
three parts: the first part will show the research done on wine quality
prediction and classification through feature engineering, the second will
review previous research on feature importance in different fields, and
lastly there will be a presentation of previous projects that used coffee
cupping scores for quality prediction and classification.

2.1 Similar Research

In order to understand the feature importance in coffee classification it is
essential to understand the similarity of the wine tasting process, as "the
wine industry uses a tasting system similar to the coffee cupping process to
assess material and symbolic attributes of wines, and at the end of the
tasting process, expert tasters assign a quality grade to each wine" (Traore,
Wilson, & Fields, 2018, p. 352).
Available data sets have been broadly used for the prediction of wine
quality. The UCI Wine data set was used by Gupta (2018) to build a linear
regression model; the algorithm was used for determining the correlation of
the variables with the wine quality score, and R2 was used to determine the
influence of all the variables on the final quality score. Initially, the
value of adjusted R2 showed that red wine quality was 35.6% dependent on all
the variables, while for white wine it was 28.02%. With the results of the
linear regression Gupta trained two neural network models: one considered
only the variables with a significant correlation with the quality score,
and the other considered all the variables. The comparison between those
two models for red and white wine conveyed that the test error of the neural
network was higher when using all variables (white: 24.15% / red: 19.56%)
than with the subset of 8 variables (white: 20.74% / red: 14.60%); this
research suggests a possible causality between feature selection and
higher accuracy.
In contrast, in the research of Aich, Al-Absi, Lee Hui, and Sain (2019)
two wrapper feature selection methods were evaluated: a Genetic Algorithm
(GA) and Simulated Annealing (SA); the classifiers used for evaluating the
subsets created by the mentioned feature selection methods were PART,
RPART, Bagging, C5.0, random forest, Support Vector Machine (SVM) and
Linear Discriminant Analysis (LDA). The evaluation metrics showed that
Simulated Annealing (SA) led to the best feature subset when using SVM as
the classifier; the accuracy was 95.23% for red wine and 98.81% for white
wine, while the sensitivity scored 97.17% and 99.34% respectively. Aich
et al. (2019) show in their research that a wrapper feature selection
method can create a subset that leads to higher accuracy; however, there
are flaws in the research: the first problem is that wrapper methods are
prone to overfitting, and no measure to quantify or remedy overfitting is
presented in the paper; the second nuisance is the lack of details on the
subset, so neither can the individual contribution of each feature to the
final wine quality scores be evaluated, nor is the information available
for study replication.
A different approach can be seen in the research of Laughter and
Omari (2020); in their investigation they extracted features by employing
PCA. The classifiers used were a Multilayer Neural Network with SGD and
ADAM as optimizers (the optimizers are highlighted as those algorithms
could reduce the risk of overfitting), a Decision Tree and a Support Vector
Machine (SVM); the model that ranked with the highest accuracy score for
classification was a Random Forest with 72.4% accuracy. None of the
previous works contemplated in their discussion the influence of imbalanced
data when selecting or extracting features; this is an issue when conducting
research with the UCI Wine dataset, as low quality wines are underrepre-
sented in the data. In contrast to the previous studies, the investigation of
Hu, Xi, Mohammed, and Miao (2016) employed the Synthetic Minority Over-
sampling Technique (SMOTE) for balancing the data. AdaBoost, Decision
Tree and Random Forest were used as classifiers; the latter performed best,
reaching 94.6% accuracy on the test set. The work of Hu et al. (2016)
suggested the SMOTE technique, which is also used in this thesis, as
according to Fernández, García, Herrera, and Chawla (2018) SMOTE is the
robust industry standard for oversampling the minority class.

2.2 Feature importance

On the other hand, research on feature importance in coffee taste quality
has been elusive in academia; this can be mainly attributed to the current
trends in the research field: current investigations are focused mainly on
the use of neural networks without a previous treatment of features. There
are still studies in the direction of feature engineering, but such
research projects are focused on dimensionality reduction by feature
extraction. In light of the existing literature it is necessary to analyze
feature importance research outside the beverage quality field.
For instance, the paper of Valko and Hauskrecht (2010) exposes which
features of clinical data influence physicians' decisions about ordering
different laboratory tests; in that research a multivariate analysis
provided an understanding of all the main features that influenced the
classification task. All of the variables were used for training an SVM
model, then the AUC was evaluated for each variable; with those variables
three different subsets were evaluated: (1) the top 1 feature; (2) the top
3 features; and (3) the top 30 features. In general, the model with just
the top 1 feature could predict with up to 87.46% precision; adding all of
the variables to the model improved the precision of the prediction in
certain cases, however it never added more than 10%. The greedy selection
method was useful for that research, however it did not allow studying the
feature dependencies, and it limited the research to an explanatory account
of which features contribute most to the SVM precision.
On the other hand, the research of Ginsburg, Lee, Ali, and Madabhushi
(2016) investigated Feature Importance in Nonlinear Embeddings (FINE)
and applied this method to quantitative histomorphometry (QH), a field
that uses pathology images for disease prediction or outcome. The approach
used in that investigation is based on feature extraction: kPCA was used
for feature extraction, then the eigenvectors were used for training a
model; an SVM and an RF were used as classifiers; the AUC metric allowed
reviewing which eigenvectors contributed the most to the classification
task. After determining the importance of each eigenvector, the same
process was executed with each feature in each of the most important
eigenvectors. Their trained FINE model achieved a higher accuracy than
the baseline models with Fisher Scores (AUC: 0.53-0.67) and Gini Impurity
(AUC: 0.58-0.75); depending on the pathology, the FINE model reported an
AUC between 0.74 and 0.93. This study used dimensionality reduction in
order to build an accurate predictive model that avoided the curse of
dimensionality; this is a great risk in certain types of datasets: in
this case they had 4 medical image datasets, one of which contained 140
vectors and 2343 different variables. Although the predictive capacity of
the model is high, the relevance of each of the variables is not stated
explicitly in the paper. This lack of specificity may be understandable,
as the most important features could number more than 100; however, the
biggest limitation is the use of eigenvectors, as those may make it
impossible to account for interactive effects among variables. As well,
the authors of that paper acknowledge that filter, wrapper or ensemble
methods could lead to feature subsets that produce similar classification
results.

2.3 Coffee Quality Prediction and Classification

In contrast, just a few research papers use existing databases with a clear
methodology to classify specialty coffee. The research of Suarez-Peña,
Lobaton-García, Rodríguez-Molano, and Rodriguez-Vazquez (2020) used
variables other than the tasting of coffee itself; with a database of 56
vectors and a 10-fold cross validation stratification of the data, a
4-layer Neural Network was trained for classification, and it was able to
classify quality coffee with 81% accuracy, while another model, an SVM
using an RBF kernel, reached 88% accuracy. However, a lack of rigor is
present, as the size of the database does not allow a test set fully
independent of the training data, and the 34 variables used for training
are more than half of the number of cases available for the model, which
could lead to an overfitted model. On the other hand, Yuriko and I Dewa
(2020) used the Coffee Quality Institute database and built a General
Regression Neural Network for prediction: "The model's performance is
measured with MSE and MAE with the best MSE value of 0.097 and MAE value
0.245" (Yuriko & I Dewa, 2020, p. 189).
The previous studies use complex classification methods and different types
of feature selection methods for prediction and classification. Due to the
complex nature of the available datasets and the methodology evidenced in
the previously mentioned scientific works, this thesis project will use 3
different types of feature importance methods: filter, wrapper and embedded
methods. The use of different methodologies should lead to similar results,
or results that allow for the comprehension of the coffee tasting phenomena.

3 experimental setup

3.1 Software

To conduct the analysis of the data and to fit the Machine Learning models,
this thesis project used Jupyter Notebook with Python (version 3); the
following libraries were used:

i Pandas (McKinney et al., 2010)

ii Numpy (Harris et al., 2020)

iii Seaborn (Waskom et al., 2017)

iv AutoImpute (Talwar, Mongia, Sengupta, & Majumdar, 2018)

v ScikitLearn (Buitinck et al., 2013)

vi ImbalancedLearn (Lemaître, Nogueira, & Aridas, 2017)

vii MatplotLib (Hunter, 2007)

viii XGBoost (T. Chen & Guestrin, 2016a)

ix Jupyter Notebook (Kluyver et al., 2016)

3.2 Data Description

For the study of feature importance for coffee quality classification, two
data sets have been analyzed; both were created using a web scraping
tool.

3.2.1 Coffee Quality Institute


The first data set comes from information of the Coffee Quality Institute
and was created by LeDoux (2018). The information was compiled from
previous e-auctions performed by the institute, and the cupping standard
evaluations follow the guidelines given by the Specialty Coffee
Association.

3.2.2 Sweet Maria’s Coffee


The second data set was created by McKeon (2020) and contains the ratings
of Sweet Maria's Coffee; the website of this company contains Q-Cup scores
of 314 different crops. A description of the features used from the
dataset can be found under "Table 4". It is important to note that the
company Sweet Maria's evaluates the coffee through a different procedure
than the standards of the Specialty Coffee Association. The difference
from the other dataset is notable, as this one only contains quality
coffee; however, the study of feature importance is still applicable to
it, as we can still distinguish in quality between specialty coffee and
outstanding specialty coffee.

3.3 Exploratory Data Analysis

3.3.1 Coffee Quality Institute


The file contains 1339 vectors and 43 variables; only 19 features remain
important to this research.
Features such as 'Variety' and 'Region' were dropped, as the encoding of
those categorical variables would create a high dimensional space for the
classifier, and possibly the Hughes phenomenon would be present in the
analysis. Other variables such as 'Owner', 'Certification.Body',
'Certification.Address', 'Certification.Contact', 'Farm.Name',
'Lot.Number', 'Mill', 'ICO.Number', 'Company', 'Producer', 'Harvest.Year',
'Owner.1', 'Number.of.Bags', 'Bag.Weight' and 'In.Country.Partner' were
discarded, as such information does not add valuable information inherent
to the coffee for studying the feature importance of the beans themselves.
Finally, the variables 'Grading.Date' and 'Expiration' only add value for
evaluating the freshness of the bean, which cannot be detected after
coffee is processed (which is a prerequisite for the coffee cupping
procedure). The selected variables comprise 3 different aspects: (1) farm
data, (2) quality data (measured by tasters) and (3) bean data.

Table 2: Coffee Quality Dataset description.

As well, it is necessary to mention that even though the scores range
from 1 to 10, there is a difference in variance among the features:

Features Variance
method_Natural / Dry 0.13
method_Other 0.01
method_Pulped natural / honey 0.01
method_Semi-washed / Semi-pulped 0.03
method_Washed / Wet 0.22
color_Blue-Green 0.04
color_Bluish-Green 0.06
color_Green 0.17
Species 0.03
Aroma 0.17
Flavor 0.21
Aftertaste 0.22
Acidity 0.18
Body 0.17
Balance 0.22
Uniformity 0.54
clean_cup 1.10
Sweetness 0.55
cupper_points 0.28
Moisture 0.00
cat1_def 12.30
quakers 0.63
cat2_def 38.50
altitude_low_meters 222020.67
altitude_high_meters 241116.72

Table 3: Variance of each of the selected features after categorical encoding.



3.3.2 Sweet Maria’s Coffee


The dataframe contains 314 cases and 19 variables; the evaluation features
of Sweet Maria's Coffee are different from those of the Specialty Coffee
Association: the scores go from 1 to 10 in 11 variables, so the summed
final score can go up to 110.

Table 4: Sweet Maria’s Coffee Dataset description.

Sweet Maria's coffee dataset also has scores that range between 1 and 10,
therefore it is useful to have an overview of the variance:

Features Variance
Fragrance 0.13
Aroma 0.11
Brightness 0.18
Flavor 0.11
Body 0.10
Finish 0.13
Sweetness 0.05
Clean_cup 0.19
Complexity 0.29
Uniformity 0.08
Cuppers_correction 1.14
Quality 0.12

Table 5: Variance of each of the selected features after categorical encoding.

3.3.3 Compatibility and discrepancy


The Sweet Maria's coffee evaluation method is a fork of the SCAA coffee
evaluation; therefore both data sets are prominently similar; the
variables shared by both in the coffee evaluation are:

• Fragrance / Aroma

• Flavor

• Aftertaste / Finish

• Body

• Sweetness

• Clean Cup

• Uniformity

• Brightness / Acidity

On the other hand, the evaluation of both varies in essential aspects:

• Wet Aroma: The SCAA protocol evaluates the Wet Aroma inside the
Fragrance and Aroma aspect as one main variable; Sweet Maria's
evaluates the wet aroma separately from the Fragrance. It is partially
related to the Aroma of the SCAA, however it is scored apart from it.

• Balance / Complexity: The SCAA recognizes that the balance score is
measured by the interaction of flavor, aftertaste, acidity and body; the
complexity in the Sweet Maria's evaluation differs in the sense that it
evaluates the interaction only between flavor and aftertaste/finish.

• Cupper's Correction / Cupper Points: Sweet Maria's acknowledges
that the cupper's correction is a booster in terms of the personal prefer-
ence of the professional taster, while the SCAA assumes that this is
an objective score given by the interaction of all of the previously eval-
uated features. It seems a semantic rather than a practical difference
in the evaluation.

An in-depth description of each of the coffee cupping variables can be
found in Appendix A (page 48).

3.4 Preprocessing

3.4.1 Coffee Quality Institute


In order to fit any Machine Learning model with a data set it is crucial to
have the full information or to treat missing values. Two different models
for missing data imputation were used, one for categorical data and the
other for numerical variables. In addition, the exploratory data analysis
included outlier detection by the use of boxplots; the outlying cases were
deleted and treated as explained in the following paragraphs. The first
algorithm, used for treating numerical missing data, is the Multivariate
Imputer from Scikit-learn; the imputer estimates each feature with missing
data from all the others. In this case the only numerical missing data
found was 17% of the variables "altitude high meters" and "altitude low
meters"; the library uses Ridge regression by default for value estimation,
however ARD regression was used instead, because the literature suggests
that "The regularization performed by ARD is very adaptive, as all the
weights are regularized differently" (Michel, Eger, Keribin, & Thirion,
2011, p. 2). The second algorithm, used for categorical missing data, is
the SingleImputer from the AutoImpute package; this library is based on
the MICE package, which is native to the R language. The function
SingleImputer uses a multinomial logistic regression for imputation, using
all the variables to estimate the value to impute.
The proportion of quality classes present in the dataset shows a clear
imbalance towards high-quality coffee, represented by 85.4% of the vectors,
while the low quality label is present in 14.6% of them. A few papers
suggested the use of oversampling techniques for treating the imbalanced
minority class (Demidova & Klyueva, 2017; Maldonado & López, 2018),
while other literature suggests undersampling the majority class (Lin,
Tsai, Hu, & Jhang, 2017); however, discarding useful information could
bias the model and may be useful only on large datasets (Peng & Yao,
2010). Although there are algorithms that classify with high accuracy on
imbalanced datasets, algorithms such as decision trees and random forests
cannot handle such imbalance, therefore the treatment of the imbalance is
relevant.

Figure 1: Check of balance of classes.

Due to the small size of the data set, the algorithm chosen for oversampling
the low quality coffee is the Synthetic Minority Oversampling Technique
(SMOTE); it has disadvantages, as it does not consider the neighboring
data of other classes and it may create noise, nevertheless the advantages
outweigh the disadvantages.
Finally, the categorical variables Color (3 different settings) and
Processing Method (5 different methods) were one-hot encoded. At the end
of preprocessing, 8 additional variables had been created, for a total of
2286 vectors and 26 variables.
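The preprocessing steps described above can be sketched as follows; the
file name, the column names and the specialty threshold (a total score of
at least 80 points, per Table 1) are assumptions for illustration rather
than the exact code of this thesis:

    import pandas as pd
    from sklearn.experimental import enable_iterative_imputer  # noqa: F401
    from sklearn.impute import IterativeImputer
    from sklearn.linear_model import ARDRegression
    from imblearn.over_sampling import SMOTE

    df = pd.read_csv("coffee_quality_institute.csv")  # hypothetical file name

    # multivariate imputation: estimate each numerical feature from the
    # others, using ARD regression instead of the default Ridge regression
    num_cols = df.select_dtypes("number").columns
    imputer = IterativeImputer(estimator=ARDRegression(), random_state=0)
    df[num_cols] = imputer.fit_transform(df[num_cols])

    # one-hot encode the two categorical variables (adds 8 dummy columns)
    df = pd.get_dummies(df, columns=["Color", "Processing.Method"])

    # binary quality label (assumed threshold) and SMOTE oversampling
    y = (df.pop("Total.Cup.Points") >= 80).astype(int)
    X_res, y_res = SMOTE(random_state=0).fit_resample(df, y)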

3.4.2 Sweet Maria’s Coffee


The Sweet Maria's Coffee dataset is contained in an XLS file with different
tabs containing previous correlation studies; the information was first
extracted from this file to a CSV file. As stated before, this dataset has
been used only for the study of the features of the cupping grades, as a
study per geography would need one-hot encoding of more than 36 countries,
which does not fit such a small amount of vectors. This dataset contains
only vectors with specialty coffee, because the company specializes in
selling only specialty coffee; but the intention of the study is to make
an in-depth approach to the information on coffee quality. Table 1 in the
introduction of this thesis exposes the different types of
sub-classification within Specialty Quality Coffee, and to study the
distinction of Outstanding Specialty coffee the vectors were labeled as 1
= 'Outstanding' when the score > 90 and 0 = 'Excellent and Very Good'
when the score <= 90. Finally, the standard scaler of SKLearn was used to
standardize the scores and SMOTE was used to balance the classes. At the
end of preprocessing the dataset had 542 cases and 11 different features.
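A minimal sketch of this labeling and balancing step, assuming hypothetical
column names (total_score for the summed rating, score_cols for the 11
cupping variables):

    import pandas as pd
    from sklearn.preprocessing import StandardScaler
    from imblearn.over_sampling import SMOTE

    df = pd.read_csv("sweet_marias.csv")  # hypothetical file name
    score_cols = df.columns.drop("total_score")

    # 1 = Outstanding (> 90), 0 = Excellent and Very Good (<= 90)
    y = (df["total_score"] > 90).astype(int)

    # standardize the cupping scores, then balance the classes with SMOTE
    X = StandardScaler().fit_transform(df[score_cols])
    X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)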
3 experimental setup 19

Figure 2: Check of balance of classes.

3.5 Description of feature importance algorithms

Feature selection and feature importance are connected at the level of
applied techniques, since both approaches may use the same methods;
nonetheless their main objectives differ. Feature selection methods help
to build a model with similar or higher accuracy by choosing features
that can explain a trained model, while the uttermost goal of feature
importance is to determine which features had the largest magnitude of
influence on the classification task.
There are 3 different types of approaches to feature selection: filter,
wrapper and embedded methods; each of the methods provides a ranking of
the most influential features between the independent variables and the
dependent variable. Such algorithms allow this project to create the
different subsets that will be exposed in the results section; the
importance of features in coffee classification will be analyzed in the
discussion section, but it is necessary to describe and understand the
feature selection methods in order to analyze the underlying working
structure of the subsets; a brief description can be found in the
following sections:

3.5.1 Filter Methods


The filter methods rank the importance of features independently of a
machine learning algorithm. Instead, these methods rely on the
characteristics of the data and their relation to the outcome through a
mathematical algorithm; among them one can find the following algorithms:
the Chi-squared test, the correlation coefficient, mutual information gain
and the Mean Absolute Difference. These methods have computational speed
as their main advantage; however, they do not account for feature
redundancy, feature interaction or feature multicollinearity (Chandrashekar
& Sahin, 2014). For the mentioned reasons, a filter method will be the
baseline model.

In this project the filter method that will be used is the Fisher Score;
it is calculated for each variable in relation to the target, and the main
formula is the following:
F(X^j) = \frac{\sum_{k=1}^{c} n_k (\mu_k^j - \mu^j)^2}{(\sigma^j)^2}    (1)
It is mainly used for the classification of binary classes. According to
the formula, the Fisher Score is defined by the overall mean (\mu^j) and
standard deviation (\sigma^j) of all the vectors for the jth feature, and
by the mean (\mu_k^j) and size (n_k) of each class k (Gu, Li, & Han, 2012);
in conclusion, it measures the distance between the class means for each
feature, divided by the feature's variance.
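As an illustration of Eq. 1, a minimal NumPy implementation of the Fisher
Score could look as follows (a sketch, not the exact code of this thesis):

    import numpy as np

    def fisher_score(X, y):
        """Fisher score of each column of X against class labels y (Eq. 1)."""
        scores = np.empty(X.shape[1])
        classes = np.unique(y)
        for j in range(X.shape[1]):
            mu, var = X[:, j].mean(), X[:, j].var()
            # sum over classes of n_k * (class mean - overall mean)^2
            num = sum((y == k).sum() * (X[y == k, j].mean() - mu) ** 2
                      for k in classes)
            scores[j] = num / var if var > 0 else 0.0
        return scores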

3.5.2 Wrapper Methods


The wrapper algorithms create a subset of the data and train a model on it;
depending on the outcome, the model retains or deletes certain variables.
The advantages of the wrapper methods are their accuracy in comparison to
filter methods and the use of feature interactions for finding the best
predictive subset; however, they can be computationally expensive and such
methods are also prone to overfitting (deleting one variable that is highly
correlated with another variable could create a biased model). Among them
the most notorious algorithms are: sequential forward selection, recursive
backward elimination, genetic algorithms and simulated annealing.
The wrapper algorithm chosen for this project is permutation feature
importance, as implemented in the SKLearn package. This algorithm is based
on the empirical review of the performance score of a trained model when
randomly shuffling the values of each variable.
The method is considered model-agnostic; the reason for such an attribution
is that it uses a classifier to understand the feature interactions without
assuming that the model is accurate by the nature of the data; this avoids
introducing a potential bias in the interpretation (Zhou & Hooker, 2021).
The logic of feature permutation is the following:
i_j = s - \frac{1}{K} \sum_{k=1}^{K} s_{k,j}    (2)

Here i_j is the importance of feature j, s is the accuracy score reached by
the trained model on the intact data, and s_{k,j} is the score obtained on
the k-th shuffling (corruption) of feature j; for each repetition a new
accuracy score is calculated and subtracted from the initial score. The
importance is then determined by ranking the average loss of accuracy
caused by shuffling each feature, where the shuffling that produces the
largest decrease in accuracy marks the most important feature.

The feature permutation algorithm has only one main parameter: the
classifier used for determining the accuracy loss per permutation of a
variable. In this research the random forest was chosen as the classifier;
the main reason for this choice is that it uses subsets to train each tree,
and this property reduces the risk of a biased classification. The random
forest classified with an accuracy of 98% on the test set, therefore it was
used for the feature permutation with the following parameters: 'bootstrap':
True, 'max_depth': 100, 'max_features': 'auto', 'min_samples_leaf': 2,
'min_samples_split': 2, 'n_estimators': 100.
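A sketch of this step with SKLearn's implementation; the split and the
random forest parameters mirror the description above, while X and y are
assumed to hold the preprocessed data:

    from sklearn.ensemble import RandomForestClassifier
    from sklearn.inspection import permutation_importance
    from sklearn.model_selection import train_test_split

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, stratify=y, random_state=0)

    rf = RandomForestClassifier(bootstrap=True, max_depth=100,
                                min_samples_leaf=2, min_samples_split=2,
                                n_estimators=100).fit(X_train, y_train)

    # repeatedly shuffle each feature and record the drop in accuracy (Eq. 2)
    result = permutation_importance(rf, X_test, y_test, scoring="accuracy",
                                    n_repeats=10, random_state=0)
    ranking = result.importances_mean.argsort()[::-1]  # most important first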

3.5.3 Embedded Methods


Embedded methods make use of a combination of the wrapper and filter
methods, and feature importance is an integral part of the classification
model. Many of them use a tree-based classifier, which means that to find
feature importance an impurity measure (either Gini or entropy) is checked
for each of the variables to find the lowest value; embedded methods can
also be a regularization algorithm or have one built in, which contributes
by avoiding overfitting (Hua, Tembe, & Dougherty, 2009). Besides the
mentioned advantages, due to their efficiency and generalization
capabilities embedded methods are considered to be the standard in feature
selection. eXtreme Gradient Boosting (XGB) is a recent advanced algorithm
that can achieve highly accurate predictions and classifications by means
of gradient boosting. It belongs to the embedded feature selection methods
and works as a decision-tree-based ensemble; it was selected for this
thesis research due to its novel use for classification and the built-in
regularization penalty it can use. XGB, as its name reveals, is a boosting
algorithm that creates new weak learners (simple trees that only predict
above chance) and sequentially merges the predictions to improve the
general performance of the model. For any wrong classification, larger
weights are assigned to the wrongly classified samples and lower weights
to samples that are correctly classified. Weak learner models that perform
better have higher weights in the final model (T. Chen & Guestrin, 2016b).
Since boosting is a greedy learning algorithm, it is necessary to set a
stopping step, such as a threshold on the model performance (in that case
it is called early stopping) or another stopping criterion, e.g. once a
given depth of the tree leaves is reached; the purpose of setting a
stopping criterion is to prevent overfitting on the training data.
L(\phi) = \sum_i l(\hat{y}_i, y_i) + \sum_k \Omega(f_k)    (3)

XGBoost uses an objective function with two parts. The first part is a
loss function that allows a more accurate predictive model; the residuals
are calculated using the predicted value (\hat{y}_i) and the real value
(y_i), and they are calculated for each leaf.

\text{where } \Omega(f) = \gamma T + \frac{1}{2} \lambda \lVert W \rVert^2    (4)
Omega contains the number of tree leaves (T), and gamma (\gamma) is the
penalty value for pruning the nodes; to it is added half of lambda
(\lambda) multiplied by the squared leaf weights W. In the case of XGB,
the pruning of the trees occurs by replacing nodes that do not contribute
to improving the classification on the leaves (C. Chen et al., 2020).
XGBoost splits a tree up to the max_depth specified in the parameters and
prunes backwards until the loss is below a threshold; it will only keep
the leaves where the outcome was positive. Among the reasons for training
the data with XGBoost the following grounds have been enumerated:

• 1. Standard gradient boosting machines do not have regularization; in
the case of XGBoost there are two parameters (L1 and L2) for setting
regularization in what is known as boosting regularization.

• 2. It has a built-in cross validation, which allows for an easier
construction of a model with a small data set without overfitting.

• 3. Speed can be mentioned as one of the advantages of using XGBoost:
training on the datasets took less than a minute.

The parameters were searched through a grid search with a stratified cross
validation of 10 folds. Please note that there is a specific built-in
feature in the XGB regressor for plotting feature importance, and there is
as well a separate model for XGB cross validation; however, the
XGBRegressor does not contain a k-fold cross validation, therefore the
cross validation used belongs to the SKLearn package. There are 18
different parameters for model tuning in XGB: 3 general parameters of the
model, 12 different parameters for tuning the booster and 3 different
parameters for the learning task itself; the settings that produced the
best results in the grid search are described here (a sketch follows the
list):

• max_depth [4] was left at that value, as the accuracy was the same
with higher values and a deeper level would overfit.

• gamma [1]: grid search suggested this value, which specifies the
minimum loss reduction required to make a split; the higher the
number, the more conservative the model is.

• subsample/colsample_bytree [0.6] was chosen below the default
value, as it is partially a standard used by other data scientists and
grid search found it to be the best value.

• lambda/Ridge regularization [0.5] is the regularization term on the
weights; it also allows the algorithm to run faster.

• alpha/Lasso [1.5] was set to this value by grid search suggestion; as
XGB was used due to its embedded regularization capacity, this is
the feature that makes the model more reliable.
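A sketch of the described tuning; the use of XGBClassifier and the
candidate grids other than the values named above are assumptions for
illustration:

    from xgboost import XGBClassifier
    from sklearn.model_selection import GridSearchCV, StratifiedKFold

    param_grid = {
        "max_depth": [3, 4, 6],
        "gamma": [0, 1, 5],
        "subsample": [0.6, 0.8, 1.0],
        "colsample_bytree": [0.6, 0.8, 1.0],
        "reg_lambda": [0.5, 1.0],   # L2 (Ridge) penalty
        "reg_alpha": [0.0, 1.5],    # L1 (Lasso) penalty
    }
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
    search = GridSearchCV(XGBClassifier(eval_metric="logloss"),
                          param_grid, cv=cv, scoring="accuracy")
    search.fit(X, y)  # X, y: the preprocessed data
    importance = search.best_estimator_.feature_importances_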

3.5.4 Hierarchical Clustering


Unfortunately, the literature suggests that calculating feature importance
using correlated/collinear features could lead to a biased understanding
of the features (Ghosh & Ghattas, 2015); this occurs because a correlated
feature can work as a proxy for the target when the other feature is
permuted.
The author of this thesis acknowledges that hierarchical clustering is not
feature importance in the direct sense, but to understand the coffee taste
phenomena a hierarchical clustering was performed, in order to find other
influential features (different from the coffee cupping scores) and
consequently help this research to address all the research questions.
Hierarchical clustering analysis works by adjoining different features into
clusters; this method uses a dissimilarity measure in order to cluster
certain features together or leave them apart; this dissimilarity is known
as the linkage method. The linkage method utilizes all the variables,
finds commonalities between clusters and joins them; in a sense it will
force the creation of relations between variables, which in the case of
coffee cupping will point to variables different from the coffee tasting
itself. In ScikitLearn there is a euclidean distance metric method called
Ward linkage; it is similar to the nearest neighbor approach in the sense
that it finds a centroid and then establishes the distance between
clusters; however, it is more similar to what an ANOVA approach would do,
by using variance (Odong, van Heerwaarden, Jansen, van Hintum, & van
Eeuwijk, 2011), because it first needs to find a centroid of each cluster
by taking all observations, and then it uses the following statistical
algorithms:
• Squared Sum of Errors:
SSE = \sum_{i=1}^{n} (y_i - f(x_i))^2    (5)
Where y_i is the value of the dependent variable and x_i is the value
of the independent variable, while f(x_i) is the function that predicts
the value to subtract from y_i.

• Linkage Function:

D(X, Y) = SSE(XY) - [SSE(X) + SSE(Y)]    (6)

Here D is the dissimilarity and XY is the merging of the clusters; the
function is based on the theory that with the minimum variance method two
clusters X and Y can be joined such that the sum of squared errors (SSE)
is minimized; the process is a recursive function that allows clustering a
full dataset. The clustering makes use of a correlation algorithm for
finding the features that may predict each other; in this thesis Spearman
correlation was chosen, as it is a standard choice (a sketch of the
procedure follows this list).
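A minimal sketch of such a clustering on the Spearman correlation matrix,
using SciPy's Ward linkage; X and feature_names are assumed to hold the
feature matrix and its column labels:

    import numpy as np
    from scipy.cluster import hierarchy
    from scipy.spatial.distance import squareform
    from scipy.stats import spearmanr

    corr, _ = spearmanr(X)             # pairwise Spearman correlations
    dist = 1 - np.abs(corr)            # turn correlation into a distance
    np.fill_diagonal(dist, 0)          # self-distance must be exactly zero
    linkage = hierarchy.ward(squareform(dist))  # Ward on condensed input
    hierarchy.dendrogram(linkage, labels=list(feature_names),
                         leaf_rotation=90)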

3.6 Evaluation Algorithm: K Nearest Neighbors

In principle, this thesis project has used other classifiers to analyze the
feature importance of each of the variables in the datasets: XGBoost uses
decision trees to classify, and for the feature permutation and the
hierarchical clustering a random forest was used for testing the importance
of each chosen and/or permuted feature. Nonetheless, it is necessary to
test each of the selected feature sets compiled across the different
feature importance and selection models. The main reason for choosing kNN
as the test algorithm for the selected features is that kNN requires only
one hyperparameter to be set in order to classify. Since the other
algorithms should have chosen the most important features for
classification, the algorithm should classify with high accuracy.
One of the disadvantages of the kNN algorithm is that it cannot handle
high dimensional datasets, but the datasets for this thesis were not
massively populated with features and were shrunk into subsets of the
important features, so this disadvantage is not an inconvenience for this
research project.
kNN, as mentioned, is the lazy classifier used for the model evaluation
task; lazy means that it will not create a model in order to evaluate new
cases, but every case is a new model. It is based on a distance metric and
a voting system (K); the most common approach to distance in the kNN
algorithm is the euclidean distance, also called the pythagorean distance,
represented by:
d(p, q) = \sqrt{\sum_{i=1}^{n} (q_i - p_i)^2}    (7)

The algorithm checks the distance between two points, q_i and p_i, which
contain the coordinates of each vector relative to the origin point of an
n-dimensional plane; traditionally the distance was accounted in a two
dimensional plane, however in practice it can be calculated between two
points in any number of dimensions. For the purpose of the estimation of
distance in this project, each dimension is a variable of the datasets and
each sample of coffee is placed by the algorithm in an n-dimensional
space; after that, the test vector is classified by checking which are the
closest neighbors from the training subset. One neighbor would be enough
for determining the class of a case, by taking the smallest distance
between the test vector and a train vector; however, a higher number of
neighbors (k) can be used, with a risk of overfitting, and it can also
increase the accuracy. The algorithm uses a majority voting system to
decide the label of a case; therefore it is common practice to use odd
numbers of neighbors, so that the vote always has a tie-breaker and there
will be no ties. There are different parameters that can be tuned in the
kNN algorithm; Bhatia and Vandana (2010) suggested the use of many
different standards for classification, therefore three different numbers
of neighbors will be used, k = [3, 5, 7], while the chosen algorithm for
calculating the distances is called 'brute': it computes the distance
between points every time, which is not efficient in terms of processing,
but remains imperceptible due to the size of the data sets used for this
thesis.
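A sketch of this evaluation; X_train, X_test, y_train, y_test and subset
(the feature subset under evaluation) are assumed to come from the earlier
steps:

    from sklearn.metrics import accuracy_score
    from sklearn.neighbors import KNeighborsClassifier

    for k in (3, 5, 7):
        knn = KNeighborsClassifier(n_neighbors=k, algorithm="brute")
        knn.fit(X_train[subset], y_train)
        acc = accuracy_score(y_test, knn.predict(X_test[subset]))
        print(f"k={k}: accuracy={acc:.4f}")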

3.6.1 Subset Evaluation


One of the objectives of this project is to determine whether there are
influential variables other than the coffee cupping scores; however, due
to the correlation between the scores and the target label, the evaluation
will be done in 3 different ways (a sketch of the ablation loop follows
this list):

• The complete set of variables will be evaluated by performing
feature ablation from the most important to the least important.

• An evaluation of a subset that excludes all of the coffee cupping
variables.

• A final evaluation that excludes the variables that encompass other
variables of the coffee cupping itself: 'Aftertaste' and 'Balance'
contain a description of other coffee cupping score variables.
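The first evaluation can be sketched as a simple ablation loop; ranking is
assumed to list the features from most to least important:

    from sklearn.metrics import accuracy_score
    from sklearn.neighbors import KNeighborsClassifier

    # drop the i most important features and re-evaluate the kNN each time
    for i in range(len(ranking)):
        remaining = ranking[i:]
        knn = KNeighborsClassifier(n_neighbors=3, algorithm="brute")
        knn.fit(X_train[remaining], y_train)
        acc = accuracy_score(y_test, knn.predict(X_test[remaining]))
        print(f"dropping top {i} features: accuracy={acc:.4f}")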

3.7 Cross Validation

In order to perform an appropriate training of the models, cross validation
was performed in two different ways:

• Train (70%) / Test (30%) split: the wrapper importance method will use
this data stratification; the classifier used for training the feature
permutation model is a random forest that should generalize by the
very characteristics of the random forest.

• 10-fold cross validation: the embedded method will use this data
split; as XGBoost uses additive trees with optimizers, the use of all
the information for training should avoid bias and find the features
with the largest magnitude in the classification.

The filter method does not need any data split, as it does not use any
classifier for finding the correlations among the features and the target;
for that reason all the data should be used.

3.8 Evaluation Metric

The choice of evaluation metrics in Machine Learning depends on the type
of task that is executed; in the case of classification there are four
main basic metrics, based on the evaluation of true positives (TP), false
positives (FP), false negatives (FN) and true negatives (TN):

\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}    (8)

\text{Precision} = \frac{TP}{TP + FP}    (9)

\text{Recall} = \frac{TP}{TP + FN}    (10)

F1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} = \frac{2\,TP}{2\,TP + FP + FN}    (11)
Sensitivity, specificity and AUC are also commonly used metrics for
evaluating a model. For this thesis, accuracy was selected for evaluating
all of the models; it is broadly used and easy to comprehend; its only
disadvantage is that it misrepresents a model's performance when there
are imbalanced classes in a dataframe (Hossin & Sulaiman, 2015), however
this has been addressed in the preprocessing of the data sets.
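A tiny worked example of Eqs. 8-11 on made-up predictions (purely
illustrative):

    from sklearn.metrics import accuracy_score, precision_recall_fscore_support

    y_true = [1, 0, 1, 1, 0, 1]
    y_pred = [1, 0, 0, 1, 1, 1]  # TP=3, TN=1, FP=1, FN=1
    prec, rec, f1, _ = precision_recall_fscore_support(y_true, y_pred,
                                                       average="binary")
    print(accuracy_score(y_true, y_pred))  # 4/6 = 0.667
    print(prec, rec, f1)                   # 0.75, 0.75, 0.75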

3.9 Repository

The data sets and python notebook are stored in GitHub:


Coffee Cupping Feature Importance Repository

4 results

4.1 Specialty Coffee Cupping Most Important Features

The rankings used for evaluation are based on the 10 most important
features; feature ablation was then performed on each of the subsets.
There is a rationale behind the use of 10 features: specialty coffee
quality is already determined by 10 features, therefore checking more
variables for classification would go against the intention of
scrutinizing the performance after dimensionality reduction.
The different feature importance methods (filter, wrapper and embedded)
were executed on the datasets to determine whether there was a considerable
difference between the rankings created with the different methods.

4.1.1 All-variables set


The Fisher scores ranked the features of the CQI dataset; among the 10
most important it placed seven features of the coffee cupping itself and
three others that are not used for the specialty coffee quality
classification. The full ranking can be found in Appendix C (page 55).

F-Score              XGBoost         Feat. Perm.
Uniformity           Flavor          Flavor
clean cup            Aftertaste      Uniformity
Aroma                cupper points   clean cup
altitude low         clean cup       Aftertaste
Sweetness            Balance         cupper points
cupper points        Uniformity      Balance
Semi-washed/pulped   Aroma           Acidity
Flavor               Acidity         Aroma
Species              Sweetness       Body
color green          Body            cat2 def

Table 6: Top 10 features according to the filter, embedded and wrapper
methods: F-Score, XGBoost and Feature Permutation respectively.

It is important to mention that altitude low, color green and the method
of processing are currently not used for determining coffee as specialty;
this proved to be of great interest, as it could have helped to address
RQ3. Nonetheless, the Fisher score was established as the baseline due to
the basic algorithm it uses, a fact that was confirmed by its low
performance in the testing phase. The highest accuracy of 84.2% was
reached with three neighbors using the full data set, and dropping nine of
the most important variables did not decrease the accuracy abruptly
(77.02%); on the other hand, using the subset of the 10 variables gave a
better accuracy of 85.5% with three neighbors, and dropping each variable
in the ranking when classifying exposed no consistency in the accuracy:
when dropping altitude high meters the accuracy increased, and dropping
flavor decreased the accuracy by 42.3%; further details can be found in
Appendix C (page 55).
On the other hand, the XGBoost algorithm found that the 10 most important
variables are variables that belong only to the coffee cupping scores; its
ranking does not match any of the positions placed by the Fisher scores
feature importance.
According to the accuracy scores in the evaluation of the subset built
with the XGBoost algorithm, the use of the whole data set of 26 variables
creates a lot of noise that cannot be handled by the kNN algorithm; when
using the top 10 features and five neighbors the accuracy was 99.56% on
the test set, and dropping eight of the most important variables showed a
steady decrease of the metric to 83.4%. However, running the ablation
backwards, dropping from the least important to the most important,
exposed the importance of "clean cup", "aftertaste" and "flavor".

Figure 3: Accuracy decrease when discarding 8 of the 10 important features
from least important to most important.

Figure 3 shows that after dropping 7 variables the accuracy does not change
when ablating Clean cup, Cupper points and Aftertaste. Aftertaste and
Flavor alone hold 88% of the accuracy reached by the kNN, while Body and
Sweetness hold 86% of the accuracy. This exposes that the performance of
kNN for classification using the 9th and the 10th important features
yields a result just 2% lower than the use of Flavor and Aftertaste.

Figure 4: Feature importance figure created with XGBoost

Nonetheless, it is important to highlight that when evaluating the subset
created with the Fisher scores, the ablation of flavor accounted for a
42.3% decrease in the accuracy of the model; this partially confirms the
importance of flavor as exposed by XGBoost, where it accounts for more
than 40% of the accuracy lost by the classifier.

Figure 5: Difference between impurity importance and permutation based
importance.

Figure 5 shows the importance attributed to each feature by the classifier
used for feature permutation importance and by the permutation importance
itself. The random forest trained for the feature permutation importance
yielded a feature ranking similar to the XGBoost subset; the impurity-based
feature importance, however, differs greatly from the other algorithms used
for feature importance. Both XGBoost and feature permutation importance placed
Flavor as the most important feature for specialty coffee quality
classification. Despite this commonality on the first feature, the other
features varied without an established pattern across the different models;
it is XGBoost that produced the better ranking of the features. The feature
permutation importance algorithm selected a feature that does not belong to
the coffee cupping scores as important, namely the category 2 defects, which
could help to answer one of the research questions; however, when testing the
subset of 10 variables it performed on average with a lower accuracy (97.96%)
than the subset chosen by the XGBoost feature importance (98.09%). Besides,
the classifier used for evaluation showed a steady decline in accuracy with
the XGBoost subset, while the subset created by the feature permutation
importance showed an increase in performance when dropping some variables from
the least important to the most important. Further information with accuracy
scores for specialty coffee for the XGBoost feature importance and the feature
permutation importance can be found in Appendix D (page 56) and Appendix E
(page 58) respectively.
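
A minimal sketch of how the two importances contrasted in Figure 5 can be
obtained with scikit-learn follows; the random forest configuration is an
assumption.

# Impurity-based importance is computed on the training data and is known
# to favor high-cardinality features; permutation importance measures the
# accuracy drop on held-out data when one feature's values are shuffled.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)

impurity = pd.Series(rf.feature_importances_, index=X_train.columns)

perm = permutation_importance(rf, X_test, y_test, n_repeats=10, random_state=42)
permutation = pd.Series(perm.importances_mean, index=X_test.columns)
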

4.1.2 Non-cupping scores subset

Feat. Perm               XGBoost

cat2 def                 cat2 def
Moisture                 Moisture
altitude high            Species
altitude low             Natural / Dry
Natural / Dry            altitude low
cat1 def                 color Green
Washed / Wet             Washed / Wet
Semi-washed/pulped       altitude high
quakers                  cat1 def
method Other             method Other

Table 7: Ranking of top 10 feature importance for specialty coffee excluding the
coffee cupping scores

The two subsets contain a similar set of the 10 most important features. The
specifics of the evaluation of both subsets can be found in Tables 18
(XGBoost) and 23 (Feature Permutation). The model with all of the variables of
the subset had a lower accuracy in comparison to the model with coffee cupping
scores (XGBoost subset: 76.42%, Feature Permutation subset: 75.69%). The
'category 2 defects' and 'moisture' are the most important variables for both
models; beyond these, the rankings only share the 7th ('Processing Method:
Washed/Wet') and 10th ('Processing Method: Other') positions, and the
remaining positions hold no consistent order between the two subsets.

4.1.3 Non-proxy scores subset

Feat. Perm XGBoost


Flavor Flavor
Uniformity cupper_points
clean_cup clean_cup
cupper_points Uniformity
Acidity Aroma
Aroma Acidity
Body Sweetness
cat2_def Body
Sweetness cat2_def
Moisture Moisture

Table 8: Ranking of top 10 feature importance for specialty coffee excluding the
proxy features 'Aftertaste' and 'Balance'

The two proxy variables that contain other variables are 'Aftertaste' and
'Balance'; these were excluded in order to evaluate the resulting subsets.
Both subsets contain the same 10 variables: 8 of them are coffee cupping
scores, while the other 2 are extrinsic factors of the coffee bean, 'Moisture'
and 'Category 2 defects'. The evaluation of the subsets resulted in both cases
in a lower accuracy; Tables 19 and 24 expose a loss of about 1% of accuracy in
both cases (XGBoost subset: 98.40%, Feature Permutation subset: 98.18%).

4.2 Outstanding specialty coffee classification

The analysis of the Sweet Maria’s Coffee data set used all of the methods
described in the previous section; however, no extrinsic variables were taken
into account. The feature permutation and XGBoost feature importance suggested
that the variables for outstanding specialty quality coffee rank in the
following order:

XGBoost Importance     Feature Permutation     Position

Flavor                 Flavor                  1
Fragrance              Finish                  2
Aroma                  Complexity              3
Finish                 Fragrance               4
Complexity             Clean Cup               5
Sweetness              Uniformity              6
Uniformity             Brightness              7
Brightness             Body                    8
Clean Cup              Cuppers Correction      9
Cuppers Correction     Sweetness               10
Body                   Aroma                   11

Table 9: Ranking of feature importance for outstanding specialty coffee

While all of the rankings show that Flavor is the most important feature for
specialty coffee, there was no clear distinction among the rest of the most
important variables, as the rankings share no established patterns. In the
evaluation of the rankings with the kNN algorithm, feature ablation in the
respective order of the rankings, from the least important to the most
important and backwards, proved that the feature order chosen by XGBoost
performed better than that of the feature permutation algorithm: while the
ablation of 9 variables from the most important to the least important showed
an average decline in accuracy of 43.12%, the ablation from the least
important to the most important showed a decrease of only 2.14%, meaning that
even the last two variables hold 87.15% of the accuracy. The complete data is
available in the Appendix (page 61).

Figure 6: Unstable decrease with feature ablation of the ranking created with
feature permutation importance.

On the other hand, the feature permutation importance subset showed a
different decline when dropping variables from the most important to the least
important: as can be seen in Figure 6, the feature ablation did not decrease
the accuracy steadily, and the accuracy actually increased when dropping the
third (Finish) and fourth (Complexity) most important variables; further
information can be seen in Appendix H (page 62). For these reasons, the
feature importance ranking built with XGBoost is considered the final one for
the classification of outstanding specialty coffee.

4.3 Hierarchical Clustering for Coffee Classification

As both of the previous sections only considered the coffee cupping fea-
tures, the hierarchical clustering aimed to determine coffee quality by
adding other extrinsic variables.

Figure 7: Hierarchical clustering figure of Coffee Quality Dataset.

The Ward's linkage algorithm found 6 macro-clusters on the second level, and
from them two sets of variables were arbitrarily created: one taking the first
item of each cluster and the other taking the second item of each cluster (a
sketch of the clustering step follows Table 10):

Subset 1 Subset 2
method_Natural / Dry method_Other
method_Washed / Wet color_Green
Aroma Flavor
Uniformity clean_cup
Moisture cat1_def
altitude_low_meters altitude_high_meters

Table 10: Subsets of features created with hierarchical clustering
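
A minimal sketch of this clustering step follows; clustering the features on a
distance derived from their pairwise Pearson correlations (1 - |r|) is an
assumption made for illustration.

# Sketch: Ward linkage over a correlation-based distance between features,
# cut into six macro-clusters as in Figure 7.
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

corr = np.corrcoef(X_train.values, rowvar=False)  # feature-by-feature matrix
dist = 1.0 - np.abs(corr)                         # highly correlated -> close

Z = linkage(squareform(dist, checks=False), method="ward")
labels = fcluster(Z, t=6, criterion="maxclust")   # six macro-clusters

for cluster_id in range(1, 7):                    # list the members per cluster
    print(cluster_id, list(X_train.columns[labels == cluster_id]))
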



The random forest classifier scored 95% accuracy on the test set using all the
variables of the subsets; however, when the subsets were evaluated with kNN,
the performance with the full subset was lower for each of them, with 81.73%
accuracy for subset two and 78.03% for subset one. Feature ablation on each of
the subsets yielded a higher accuracy when features were dropped arbitrarily.

Figure 8: Hierarchical clustering subset 1 evaluation with kNN.

The intention of the feature ablation was not to evaluate the most important
feature of the subset, but to test the selected features in a different
setting. An increase in model performance can be evidenced in the graphic
above when 'altitude high meters' and 'category 1 defects' were dropped for
the kNN classification.

5 discussion

The results showed that specialty coffee classification can be highly biased
in different aspects. The taste prints of a coffee should be reflected in the
coffee cupping, as many scientists have exposed previously; altitude, moisture
or processing method should influence the coffee taste, as shown by
Bravo-Moncayo et al. (2020), J. Li et al. (2019) and Spence and Carvalho
(2019); however, none of this was reflected in the feature importance results.
In the first place, this can be explained by the way the quality label is
created: a linear model uses the sum of all cupping scores, this sum makes up
a final score between 1-100 or 1-110, and the total determines the type of
specialty coffee evaluated (a sketch of this derivation is given after this
paragraph). In the second place, there is a strong impact of symbolic
influence on the coffee rating; the trademark and geographic origin of the
coffee can bias the evaluation. The research of Traore et al. (2018) proved
that not only tasting but also pricing relies on the symbolic assumptions
given to the coffee; e.g., a robusta coffee from Ethiopia or an arabica coffee
from Colombia will be rated and priced better than a coffee harvested in
Morocco due to those symbolic assumptions.
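
For illustration, the label derivation can be sketched as below; the SCA score
bands used (80 points and above for specialty, with higher bands for excellent
and outstanding) are the standard ones and are stated here as an assumption
rather than taken from the data sets.

# Illustrative sketch: the quality label follows linearly from the sum of
# the cupping scores; the band thresholds are assumed standard SCA values.
def quality_label(cupping_scores: dict) -> str:
    total = sum(cupping_scores.values())  # final score on the 100-point scale
    if total >= 90:
        return "outstanding"
    if total >= 85:
        return "excellent"
    if total >= 80:
        return "very good"
    return "below specialty grade"
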
The validity of the coffee cupping scores in the international trade market
should be reconsidered, as those scores may represent the preference of the
Q-Cup grader rather than the actual objective features of the coffee, as
mentioned above. A deeper explanation of these assumptions can be found in the
following paragraphs:

5.1 RQ1

5.1.1 Specialty Coffee Feature Importance

Features Position
Flavor 1
Aftertaste 2
cupper points 3
clean cup 4
Balance 5
Uniformity 6
Aroma 7
Acidity 8
Sweetness 9
Body 10

Table 11: Specialty Coffee classification most important features in descending
order. It contains the variables identified by the XGBoost algorithm.

The first research question was addressed with different algorithms; the
features that influence the coffee classification the most were found using
the XGBoost algorithm, as it offered a more reliable and consistent decrease
in accuracy when performing the feature ablation from the most important to
the least important feature and backwards. The feature permutation confirmed
the information given by the first model by placing Flavor as the most
important variable for all types of specialty coffee.
The other features that influence the specialty coffee classification the most
are Aftertaste and Cupper Points. The second most important feature is in line
with the first, as Aftertaste is "the length of positive flavor (taste and
aroma) qualities emanating from the back of the palate" (Specialty Coffee
Association of America, 2015, p. 3). The classification scores on the test set
indicated that Cupper Points have a high influence on the coffee cupping, and
this is indeed an issue, as it suggests that the personal taste preference of
the professional grader plays a huge role in the coffee classification; this
confirms the research of Carvalho, Moksunova, and Spence (2020) and Carvalho
and Spence (2018), which showed that coffee tasting can be greatly influenced
by the environment and instruments used for the coffee cupping. Nonetheless,
when performing feature ablation from the least important feature to the most
important, the ablation of 'Clean Cup' held the same accuracy as the
subsequent drop of 'Cupper Points', which could suggest that one variable
works as a proxy for the other; it was expected that Cupper Points would work
as a proxy for other variables, as it is per se an abstract of all cupping
scores.
Balance was then also expected to work as a proxy for other variables, as it
reflects the balance between 'Flavor', 'Aftertaste', 'Acidity' and 'Body'.
Nevertheless, it is remarkable that 'Sweetness' and 'Body' alone could still
be used for classification with 83% accuracy, while the two most important
features reach 88%; this proves that the mentioned variables are indeed the
most important ones. The small difference between the two most important
variables and the 9th and 10th variables, however, cannot be explained with
the kNN algorithm, as coffee samples with similar specialty qualities may be
placed close to each other among their neighbours. Perhaps the use of a linear
classifier such as an SVM could better explain the differences by separating
the samples with a hyperplane, but the results so far show that, with the
exception of 'Aftertaste' and 'Balance', all features of coffee play a major
role in the specialty coffee classification with the current evaluation
methods.

5.1.2 Outstanding Specialty Coffee Feature Importance

Outstanding Specialty Coffee Position


Flavor 1
Fragrance 2
Wet Aroma 3
Finish 4
Complexity 5
Sweetness 6
Uniformity 7
Brightness 8
Clean cup 9
Cuppers correction 10
Body 11

Table 12: Ranking of feature importance for outstanding specialty coffee

On the other side, the classification of outstanding coffee quality differs in
its first variables, as the data set differs not only in the features
presented but also in the evaluation method performed by Sweet Maria's Coffee
Inc. As with very good specialty coffee and excellent specialty coffee, the
most important variable for outstanding specialty coffee is Flavor; however,
there is a main difference between the two data sets, as the evaluation method
of Sweet Maria's breaks the coffee cupping into more variables and assigns
scores to variables that are specific to each trait of coffee, whereas the
SCAA method evaluates different features within one constrained feature:
Balance is one of them, as 4 other variables are contained in the criteria of
its scoring, or Cupper Points, which contains 9 variables in it.

The comparison of the feature permutation importance versus the XGBoost
feature importance showed that XGBoost is more reliable in both cases; the
literature indicates that this can happen when variables are highly
correlated, which is the case of the coffee cupping variables, as seen in
Appendix A (page 48). Therefore, the use of XGBoost was accurate in the sense
that regularization, stratified cross-validation and the use of multiple
decision-tree-like learners created a subset with a higher accuracy when
evaluated with kNN.

The labeling of coffee cupping for outstanding specialty coffee does not
differ from excellent and very good specialty coffee in the sense that Flavor
is still the variable that influences the classification the most. The coffee
evaluation of Sweet Maria's has a low influence from the grader's personal
evaluation and, as with standard specialty coffee, the most important
variables are primary taste-like sensations, followed by texture-like
impressions.

5.2 RQ2

As for the second research question, the results confirm that classification
with less variables of the data set prompted to a smaller accuracy; that
applies when using only the coffee cupping scores as the feature ablation
even by discarding the least important features showed a decrease in
the performance; however the feasibility of creating a coffee evaluation
method that evaluates coffee not only with the scores, but also other
variables was checked directly with two approaches; the use of kNN
algorithm with of all the variables (including extrinsic information) showed
an 84% of accuracy. A basic random forest (used as classifier in the feature
permutation) outperformed the KNN and it reached a 98% on the test set
using all of the variables; this outcome reflects the phenomena exposed
in the scientific literature: kNN is very sensible to noise and it does not
perform well with high dimensional data (Kouiroukidis & Evangelidis,
2011). However, none of the described previous data reached a 99% of
accuracy with only the cupping scores.
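
A hedged sketch of the comparison underlying this point follows; the default
model settings are assumptions.

# Sketch: the same full feature set (including extrinsic variables) scored
# with kNN and with a basic random forest on the held-out test set.
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier

for name, clf in [("kNN (k=3)", KNeighborsClassifier(n_neighbors=3)),
                  ("random forest", RandomForestClassifier(random_state=42))]:
    clf.fit(X_train, y_train)
    print(name, round(clf.score(X_test, y_test), 4))  # test-set accuracy
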
As exposed in Tables 19 and 24 in the Appendix section, the subset models
without the proxy variables ('Aftertaste' and 'Balance') had a lower
performance in comparison to the model with all cupping scores; however, the
difference was minimal, and it promotes the use of feature selection. Due to
the correlated nature of the features in the SCA coffee cupping method,
feature selection could remove redundancy; in this case the redundancy is not
directly reflected in the accuracy scores, but the evaluation method should
reevaluate whether variables such as 'Aftertaste' and 'Balance' should still
be included in the coffee cupping scores, as additional variables that rely on
human perception could bias a coffee evaluation to a deeper level.

5.3 RQ3

Finally, the third research question was addressed with hierarchical
clustering and Ward's linkage; the feature subsets performed with a maximum of
81% accuracy with the kNN, yet the ablation of the extrinsic variables
increased the accuracy. Perhaps the random forest initially used for
classification tended to overfit, as impurity-based algorithms such as the
random forest can be biased in the presence of high-cardinality features, as
exposed by Zhou and Hooker (2021). The use of XGBoost and the feature
permutation importance proved that other variables, such as category 1 and
category 2 defects, moisture and the altitude of the crop, could contribute to
the model; in fact, the subset that excluded the coffee cupping scores
performed with up to 76.42% accuracy, a difference of only 6.55% from the
baseline (F-Score: 82.97%), although it is the model with the lowest accuracy
in comparison to the rest of the models.

In conclusion, the prices of coffee are currently influenced by a supply and
demand market, where the demand is determined by the scores given by Q-Cup
graders, who may print their personal preferences onto the evaluation method.
The price of the coffee cupping itself may pose another obstacle for coffee
producers in developing countries, and in the long term it may not help them
sell in the international market; in fact, recent investigations have
evidenced that there is a discrepancy between the scores created by Q-Cup
graders and the final coffee processors (Worku, Duchateau, & Boeckx, 2016).
Future studies should therefore focus on the use of objective data for coffee
quality classification; however, the lack of massive data for coffee quality
evaluation is a current limitation, not only for general evaluation, but also
because the lack of big data may disqualify any type of deep learning
classification of quality.

conclusion

Coffee quality is measured with between 10 and 11 features in the data sets
available to this study, and it was practical to determine the impact of the
variables that influence the specialty coffee classification the most;
personal judgement plays a big role when classifying a sample. Nonetheless,
this thesis also aimed to determine the possibility of finding extrinsic
features that could contribute to the coffee quality classification; none of
them contributed to a model with a better accuracy than the baseline, and the
model with only the cupping scores was necessary for reaching the 99%
accuracy. Future studies should focus on the standardization of the
instruments used in the evaluation of the extrinsic variables and on the use
of neural networks for reducing the bias of the human impact in the
classification of coffee quality, without discarding the hedonic judgement in
the coffee market.

references

Aich, S., Al-Absi, A. A., Lee Hui, K., & Sain, M. (2019). Prediction of
quality for different type of wine based on different feature sets using
supervised machine learning techniques. In 2019 21st international
conference on advanced communication technology (icact) (p. 1122-1127).
doi: 10.23919/ICACT.2019.8702017
Bertrand, B., Vaast, P., Alpizar, E., Etienne, H., Davrieux, F., & Charmetant,
P. (2006, sep). Comparison of bean biochemical composition and
beverage quality of Arabica hybrids involving Sudanese-Ethiopian
origins with traditional varieties at various elevations in Central
America. Tree Physiology, 26(9), 1239–1248. doi: 10.1093/treephys/
26.9.1239
Bhatia, N., & Vandana. (2010, July). Survey of Nearest Neighbor Techniques.
arXiv e-prints, arXiv:1007.0085.
Bravo-Moncayo, L., Reinoso-Carvalho, F., & Velasco, C. (2020). The effects
of noise control in coffee tasting experiences. Food Quality and Pref-
erence, 86, 104020. Retrieved from https://www.sciencedirect.com/
science/article/pii/S0950329320302895 doi: https://doi.org/
10.1016/j.foodqual.2020.104020
Buitinck, L., Louppe, G., Blondel, M., Pedregosa, F., Mueller, A., Grisel, O.,
. . . Varoquaux, G. (2013). API design for machine learning software:
experiences from the scikit-learn project. In Ecml pkdd workshop:
Languages for data mining and machine learning (pp. 108–122).
Carvalho, F. M., Moksunova, V., & Spence, C. (2020). Cup tex-
ture influences taste and tactile judgments in the evaluation of
specialty coffee. Food Quality and Preference, 81, 103841. Re-
trieved from https://www.sciencedirect.com/science/article/
pii/S0950329319306652 doi: https://doi.org/10.1016/j.foodqual
.2019.103841
Carvalho, F. M., & Spence, C. (2018). The shape of the cup influences aroma,
taste, and hedonic judgements of specialty coffee. Food Quality and
Preference, 68, 315-321. Retrieved from https://www.sciencedirect
.com/science/article/pii/S0950329318300855 doi: https://doi
.org/10.1016/j.foodqual.2018.04.003
Chambers, E., & Koppel, K. (2013). Associations of volatile compounds with
sensory aroma and flavor: The complex nature of flavor. Molecules,
18(5), 4887–4905. Retrieved from https://www.mdpi.com/1420-3049/
18/5/4887 doi: 10.3390/molecules18054887
Chandrashekar, G., & Sahin, F. (2014). A survey on feature selection methods.
Computers & Electrical Engineering, 40(1), 16–28. Retrieved from
https://www.sciencedirect.com/science/article/pii/S0045790613003066 doi:
10.1016/j.compeleceng.2013.11.024
Chang, Y.-T., Hsueh, M.-C., Hung, S.-P., Lu, J.-M., Peng, J.-H., & Chen, S.-F.
(2021). Prediction of specialty coffee flavors based on near-infrared
spectra using machine- and deep-learning methods. Journal of the
Science of Food and Agriculture, n/a.
Chen, C., Zhang, Q., Yu, B., Yu, Z., Lawrence, P. J., Ma, Q., &
Zhang, Y. (2020). Improving protein-protein interactions pre-
diction accuracy using XGBoost feature selection and stacked
ensemble classifier. Computers in Biology and Medicine, 123,
103899. Retrieved from https://www.sciencedirect.com/science/
article/pii/S0010482520302481 doi: https://doi.org/10.1016/
j.compbiomed.2020.103899
Chen, T., & Guestrin, C. (2016a). XGBoost: A scalable tree boosting system.
In Proceedings of the 22nd acm sigkdd international conference on knowledge
discovery and data mining (pp. 785–794). New York, NY, USA: ACM.
Retrieved from http://doi.acm.org/10.1145/2939672.2939785 doi:
10.1145/2939672.2939785
Chen, T., & Guestrin, C. (2016b). Xgboost: A scalable tree boosting system.
In Proceedings of the 22nd acm sigkdd international conference on knowledge
discovery and data mining (p. 785–794). New York, NY, USA: Associ-
ation for Computing Machinery. Retrieved from https://doi-org
.tilburguniversity.idm.oclc.org/10.1145/2939672.2939785 doi:
10.1145/2939672.2939785
Conley, J., & Wilson, B. (2020). Coffee terroir: cupping description profiles
and their impact upon prices in Central American coffees. GeoJournal,
85(1), 67–79. Retrieved from https://doi.org/10.1007/s10708-018
-9949-1 doi: 10.1007/s10708-018-9949-1
Demidova, L., & Klyueva, I. (2017). SVM classification: Optimization with
the SMOTE algorithm for the class imbalance problem. In 2017 6th
mediterranean conference on embedded computing (meco) (pp. 1–4). doi:
10.1109/MECO.2017.7977136
Donnet, M. L., & Weatherspoon, D. D. (2006). Effect of Sensory and Rep-
utation Quality Attributes on Specialty Coffee Prices [2006 Annual
meeting, July 23-26, Long Beach, CA].
doi: 10.22004/ag.econ.21388
Fernández, A., García, S., Herrera, F., & Chawla, N. (2018). Smote for
learning from imbalanced data: Progress and challenges, marking
the 15-year anniversary. J. Artif. Intell. Res., 61, 863-905.
Ghosh, J., & Ghattas, A. E. (2015, jul). Bayesian Variable Selection Under
Collinearity. The American Statistician, 69(3), 165–173. Retrieved from
https://doi.org/10.1080/00031305.2015.1031827 doi:
10.1080/00031305.2015.1031827
Ginsburg, S. B., Lee, G., Ali, S., & Madabhushi, A. (2016). Feature im-
portance in nonlinear embeddings (fine): Applications in digital
pathology. IEEE Transactions on Medical Imaging, 35(1), 76-88. doi:
10.1109/TMI.2015.2456188
Gu, Q., Li, Z., & Han, J. (2012). Generalized fisher score for feature selection.
arXiv preprint arXiv:1202.3725.
Gupta, Y. (2018). Selection of important features and predicting wine qual-
ity using machine learning techniques. Procedia Computer Science, 125,
305-312. Retrieved from https://www.sciencedirect.com/science/
article/pii/S1877050917328053 (The 6th International Conference
on Smart Computing and Communications) doi: https://doi.org/
10.1016/j.procs.2017.12.041
Harris, C. R., Millman, K. J., van der Walt, S. J., Gommers, R., Virtanen, P.,
Cournapeau, D., . . . Oliphant, T. E. (2020). Array programming with
NumPy. Nature, 585(7825), 357–362. Retrieved from https://doi
.org/10.1038/s41586-020-2649-2 doi: 10.1038/s41586-020-2649-2
Hossin, M., & Sulaiman, M. (2015). A review on evaluation metrics for
data classification evaluations. International Journal of Data Mining &
Knowledge Management Process, 5(2), 1.
Hu, G., Xi, T., Mohammed, F., & Miao, H. (2016). Classification of wine
quality with imbalanced data. In 2016 ieee international conference
on industrial technology (icit) (p. 1712-1217). doi: 10.1109/ICIT.2016
.7475021
Hua, J., Tembe, W. D., & Dougherty, E. R. (2009). Perfor-
mance of feature-selection methods in the classification of high-
dimension data. Pattern Recognition, 42(3), 409–424. Re-
trieved from https://www.sciencedirect.com/science/article/
pii/S0031320308003142 doi: https://doi.org/10.1016/j.patcog.2008
.08.001
Hunter, J. D. (2007). Matplotlib: A 2d graphics environment. Computing in
Science & Engineering, 9(3), 90–95. doi: 10.1109/MCSE.2007.55
International Coffee Organization. (2021, March). World coffee con-
sumption. https://www.ico.org/prices/new-consumption-table
.pdf. Retrieved April 7, 2021, from https://www.ico.org/prices/
new-consumption-table.pdf (Last checked on April 09, 2021)
Kluyver, T., Ragan-Kelley, B., Pérez, F., Granger, B., Bussonnier, M., Frederic,
J., . . . Willing, C. (2016). Jupyter notebooks – a publishing format for
reproducible computational workflows. In F. Loizides & B. Schmidt
(Eds.), Positioning and power in academic publishing: Players, agents and
agendas (p. 87 - 90).
Kouiroukidis, N., & Evangelidis, G. (2011). The effects of dimensionality
curse in high dimensional knn search. In 2011 15th panhellenic conference on
informatics (p. 41-45). doi: 10.1109/PCI.2011.45
Lannigan, J. (2020). Making a space for taste: Context and discourse in
the specialty coffee scene. International Journal of Information Manage-
ment, 51, 101987. Retrieved from https://www.sciencedirect.com/
science/article/pii/S0268401218313744 doi: https://doi.org/
10.1016/j.ijinfomgt.2019.07.013
Lashermes, P., Andrade, A. C., & Etienne, H. (2008). Genomics of coffee
one of the world’s largest traded commodities. In P. H. Moore &
R. Ming (Eds.), Genomics of tropical crop plants (pp. 203–226). New
York, NY: Springer New York. Retrieved from https://doi.org/
10.1007/978-0-387-71219-2_9 doi: 10.1007/978-0-387-71219-2_9
Laughter, A., & Omari, S. (2020). A study of modeling techniques for
prediction of wine quality. In K. Arai, S. Kapoor, & R. Bhatia (Eds.),
Intelligent computing (pp. 373–399). Cham: Springer International
Publishing.
LeDoux, J. (2018). Coffee quality institute database. https://github.com/
jldbc/coffee-quality-database. GitHub.
Lemaître, G., Nogueira, F., & Aridas, C. K. (2017, January). Imbalanced-
learn: A python toolbox to tackle the curse of imbalanced datasets in
machine learning. J. Mach. Learn. Res., 18(1), 559–563.
Li, J., Streletskaya, N. A., & Gómez, M. I. (2019). Does taste sensitivity
matter? The effect of coffee sensory tasting information and taste
sensitivity on consumer preferences. Food Quality and Preference, 71,
447–451. Retrieved from https://www.sciencedirect.com/science/
article/pii/S0950329318302878 doi: https://doi.org/10.1016/j
.foodqual.2018.08.006
Li, Z., & Suslick, K. (2017, 12). A hand-held optoelectronic nose for the
identification of liquors. ACS Sensors, 3. doi: 10.1021/acssensors
.7b00709
Lin, W.-C., Tsai, C.-F., Hu, Y.-H., & Jhang, J.-S. (2017). Clustering-based
undersampling in class-imbalanced data. Information Sciences, 409-410,
17–26. Retrieved from https://www.sciencedirect.com/science/
article/pii/S0020025517307235 doi: https://doi.org/10.1016/j
.ins.2017.05.008
Lingle, T. R., & Menon, S. N. (2017). Chapter 8 - Cupping and Grading—
Discovering Character and Quality. In B. Folmer (Ed.), The craft and science
of coffee (pp. 181–203). Academic Press. Retrieved from
https://www.sciencedirect.com/science/article/pii/B9780128035207000086 doi:
10.1016/B978-0-12-803520-7.00008-6
Maldonado, S., & López, J. (2018). Dealing with high-dimensional
class-imbalanced datasets: Embedded feature selection for SVM classification.
Applied Soft Computing, 67, 94–105. Retrieved from
https://www.sciencedirect.com/science/article/pii/S1568494618301108 doi:
10.1016/j.asoc.2018.02.051
Mazzafera, P. (1999). Chemical composition of defective coffee
beans. Food Chemistry, 64(4), 547-554. Retrieved from https://
www.sciencedirect.com/science/article/pii/S0308814698001678
doi: https://doi.org/10.1016/S0308-8146(98)00167-8
McKeon, R. (2020). Sweet maria’s coffee dataset. https://
towardsdatascience.com/analyzing-sweet-marias-coffee
-cupping-metrics-3be460884bb1. Towards Data Science.
McKinney, W., et al. (2010). Data structures for statistical computing in
python. In Proceedings of the 9th python in science conference (Vol. 445,
pp. 51–56).
Michel, V., Eger, E., Keribin, C., & Thirion, B. (2011). Multiclass Sparse
Bayesian Regression for fMRI-Based Prediction. International journal of
biomedical imaging, 2011, 350838. Retrieved from
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3132985/ doi: 10.1155/2011/350838
Mussatto, S. I., Machado, E. M. S., Martins, S., & Teixeira, J. A. (2011).
Production, Composition, and Application of Coffee and Its Industrial
Residues. Food and Bioprocess Technology, 4(5), 661. Retrieved from
https://doi.org/10.1007/s11947-011-0565-z doi: 10.1007/s11947
-011-0565-z
Odong, T. L., van Heerwaarden, J., Jansen, J., van Hintum, T. J. L., &
van Eeuwijk, F. A. (2011). Determination of genetic structure of
germplasm collections: are traditional hierarchical clustering meth-
ods appropriate for molecular marker data? Theoretical and Applied
Genetics, 123(2), 195–205. Retrieved from https://doi.org/10.1007/
s00122-011-1576-x doi: 10.1007/s00122-011-1576-x
Peng, Y., & Yao, J. (2010). Adaouboost: Adaptive over-sampling and under-
sampling to boost the concept learning in large scale imbalanced
data sets. In Proceedings of the international conference on multime-
dia information retrieval (p. 111–118). New York, NY, USA: Associ-
ation for Computing Machinery. Retrieved from https://doi-org
.tilburguniversity.idm.oclc.org/10.1145/1743384.1743408 doi:
10.1145/1743384.1743408
Reis Guimarães, E., Carlos dos Santos, A., Montagnana Vicente Leme, P. H., &
da Silva Azevedo, A. (2020, sep). Direct trade in the specialty coffee market:
Contributions, limitations and new lines of research. In (Vol. 15, pp. 34–62).
Associacao Escola Superior de Propaganda e Marketing. doi:
10.18568/internext.v15i3.588
Rodríguez, J., Durán, C., & Reyes, A. (2010). Electronic nose for quality
control of Colombian coffee through the detection of defects in "Cup Tests".
Sensors (Basel, Switzerland), 10(1), 36–46. Retrieved from
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3270826/ doi: 10.3390/s100100036
Specialty Coffee Association of America. (2015). SCAA cupping pro-
tocols. https://www.scaa.org/PDF/resources/cupping-protocols
.pdf. Specialty Coffee Association of America.
Spence, C., & Carvalho, F. M. (2019). Assessing the influence of the
coffee cup on the multisensory tasting experience. Food Quality and
Preference, 75, 239-248. Retrieved from https://www.sciencedirect
.com/science/article/pii/S095032931831036X doi: https://doi
.org/10.1016/j.foodqual.2019.03.005
Suarez-Peña, J. A., Lobaton-García, H. F., Rodríguez-Molano, J. I., &
Rodriguez-Vazquez, W. C. (2020). Machine learning for cup cof-
fee quality prediction from green and roasted coffee beans features.
In J. C. Figueroa-García, F. S. Garay-Rairán, G. J. Hernández-Pérez, &
Y. Díaz-Gutierrez (Eds.), Applied computer sciences in engineering (pp.
48–59). Cham: Springer International Publishing.
Sweet Maria’s Coffee Inc. (2021, March). A primer to the charts and
numbers of our coffee cupping profiles. https://library.sweetmarias
.com/understanding-our-coffee-reviews/. Retrieved April 7,
2021, from https://library.sweetmarias.com/understanding-our
-coffee-reviews/ (Last checked on April 09, 2021)
Talwar, D., Mongia, A., Sengupta, D., & Majumdar, A. (2018, 11). Au-
toimpute: Autoencoder based imputation of single-cell rna-seq data.
Scientific Reports, 8. doi: 10.1038/s41598-018-34688-x
Traore, T. M., Wilson, N. L. W., & Fields, I., D. (2018). What explains
specialty coffee quality scores and prices: a case study from the cup
of excellence program. Journal of Agricultural and Applied Economics,
50(3), 349-368.
Valko, M., & Hauskrecht, M. (2010). Feature importance analysis for patient
management decisions. Studies in health technology and informatics, 160(Pt 2),
861–865. Retrieved from
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3058588/

Waskom, M., Botvinnik, O., O’Kane, D., Hobson, P., Lukauskas, S., Gemper-
line, D. C., . . . Qalieh, A. (2017, September). seaborn: v0.8.1 (septem-
ber 2017). Zenodo. Retrieved from https://doi.org/10.5281/
zenodo.883859 doi: 10.5281/zenodo.883859
Wikström, N., Kainulainen, K., Razafimandimbison, S. G., Smedmark,
J. E. E., & Bremer, B. (2015, may). A Revised Time Tree of the Asterids:
Establishing a Temporal Framework For Evolutionary Studies of the
Coffee Family (Rubiaceae). PLOS ONE, 10(5), e0126690. Retrieved
from https://doi.org/10.1371/journal.pone.0126690
Wollni, M., & Zeller, M. (2007). Do farmers benefit from participating
in specialty markets and cooperatives? the case of coffee marketing
in costa rica. Agricultural Economics, 37(2-3), 243-248. Retrieved
from https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1574
-0862.2007.00270.x doi: https://doi.org/10.1111/j.1574-0862.2007
.00270.x
Worku, M., Duchateau, L., & Boeckx, P. (2016, oct). Reproducibility of coffee
quality cupping scores delivered by cupping centers in Ethiopia.
Journal of Sensory Studies, 31(5), 423–429. Retrieved from https://doi
.org/10.1111/joss.12226 doi: https://doi.org/10.1111/joss.12226
Yuriko, C., & I Dewa, M. B. A. D. (2020). Specialty coffee cupping score
prediction with general regression neural network (grnn). JELIKU (Ju-
rnal Elektronik Ilmu Komputer Udayana), 9(2), 185–190. Retrieved from
https://ojs.unud.ac.id/index.php/JLK/article/view/64485
Zhou, Z., & Hooker, G. (2021, 01). Unbiased measurement of feature
importance in tree-based methods. ACM Transactions on Knowledge
Discovery from Data, 15, 1-21. doi: 10.1145/3429445

6 appendix

appendix a

Coffee Cupping Features Description


Description of Specialty Coffee Association of America (2015) cupping
variables, textually extracted from their website:

• Fragrance/Aroma | The aromatic aspects include Fragrance (defined as the
smell of the ground coffee when still dry) and Aroma (the smell of the coffee
when infused with hot water). One can evaluate this at three distinct steps in
the cupping process: (1) sniffing the grounds placed into the cup before
pouring water onto the coffee; (2) sniffing the aromas released while breaking
the crust; and (3) sniffing the aromas released as the coffee steeps. Specific
aromas can be noted under “qualities” and the intensity of the dry, break, and
wet aroma aspects noted on the 5-point vertical scales. The score finally
given should reflect the preference of all three aspects of a sample’s
Fragrance/Aroma.

• Flavor | Flavor represents the coffee’s principal character, the "mid-range"
notes, in between the first impressions given by the coffee’s first aroma and
acidity to its final aftertaste. It is a combined impression of all the
gustatory (taste bud) sensations and retro-nasal aromas that go from the mouth
to nose. The score given for Flavor should account for the intensity, quality
and complexity of its combined taste and aroma, experienced when the coffee is
slurped into the mouth vigorously so as to involve the entire palate in the
evaluation.

• Aftertaste | Aftertaste is defined as the length of positive flavor (taste
and aroma) qualities emanating from the back of the palate and remaining after
the coffee is expectorated or swallowed. If they were short or unpleasant, a
lower score would be given.

• Acidity | Acidity is often described as "brightness" when favorable or
“sour” when unfavorable. At its best, acidity contributes to a coffee’s
liveliness, sweetness, and fresh-fruit character and is almost immediately
experienced and evaluated when the coffee is first slurped into the mouth.
Acidity that is overly intense or dominating may be unpleasant, however, and
excessive acidity may not be appropriate to the flavor profile of the sample.
The final score marked on the horizontal tick-mark scale should reflect the
panelist’s perceived quality for the Acidity relative to the expected flavor
profile based on origin characteristics and/or other factors (degree of roast,
intended use, etc.). Coffees expected to be high in Acidity, such as a Kenya
coffee, or coffees expected to be low in Acidity, such as a Sumatra coffee,
can receive equally high preference scores although their intensity rankings
will be quite different.

• Body | The quality of Body is based upon the tactile feeling of the
liquid in the mouth, especially as perceived between the tongue
and roof of the mouth. Most samples with heavy Body may also
receive a high score in terms of quality due to the presence of brew
colloids and sucrose. Some samples with lighter Body may also have
a pleasant feeling in the mouth, however. Coffees expected to be high
in Body, such as a Sumatra coffee, or coffees expected to be low in
Body, such as a Mexican coffee, can receive equally high preference
scores although their intensity rankings will be quite different.

• Balance | How all the various aspects of Flavor, Aftertaste, Acidity and
Body of the sample work together and complement or contrast to each other is
Balance. If the sample is lacking in certain aroma or taste attributes or if
some attributes are overpowering, the Balance score would be reduced.

• Sweetness | Sweetness refers to a pleasing fullness of flavor as well as any
obvious sweetness and its perception is the result of the presence of certain
carbohydrates. The opposite of sweetness in this context is sour, astringency
or “green” flavors. This quality may not be directly perceived as in
sucrose-laden products such as soft drinks, but will affect other flavor
attributes. Two points are awarded for each cup displaying this attribute for
a maximum score of 10 points.

• Clean Cup | Clean Cup refers to a lack of interfering negative impressions
from first ingestion to final aftertaste, a “transparency” of cup. In
evaluating this attribute, notice the total flavor experience from the time of
the initial ingestion to final swallowing or expectoration. Any non-coffee
like tastes or aromas will disqualify an individual cup. Two points are
awarded for each cup displaying the attribute of Clean Cup.

• Uniformity | Uniformity refers to consistency of flavor of the different
cups of the sample tasted. If the cups taste different, the rating of this
aspect would not be as high. Two points are awarded for each cup displaying
this attribute, with a maximum of ten points if all five cups are the same.

• Cupper Points | Overall score is determined by the cupper and given to the
sample as “Cupper’s Points” based on ALL of the combined attributes.

Description of Sweet Maria’s Coffee Inc. (2021) cupping variables, textually
extracted from their website:

• Fragrance – Refers to the fragrance of the dry ground coffee before hot
water is added.

• Wet Aroma – Refers to the smell of wet coffee grinds that form a crust at
the top of the glass after adding hot water.

• Flavor – This is the overall impression in the mouth, including the above
ratings as well as tastes that come from the roast.

• Body – Often called “Mouthfeel”, body is the sense of weight and thickness
of the brew, caused by the percentage of soluble solids in the cup, including
all organic compounds that are extracted from coffee in brewing and end up in
the cup.

• Finish – The lingering tastes or emerging tastes that come after the
mouth is cleared. Often called “aftertaste”.

• Sweetness – Simply put, this defines the level of perceived sweetness in
contrast to the bittering coffee qualities.

• Clean Cup – This does not literally mean dirt on the coffee. It’s just
about flavor. And raw, funky coffees that are “unclean” in the flavor
can also be quite desirable, such a wet-hulled Indonesia coffee from
Sumatra, or dry-processed Ethiopia and Yemeni types.

• Complexity – Complexity complements flavor and aftertaste scores, to
communicate a multitude or layering of many flavors. It means that there is a
lot to discover in the cup.

• Uniformity – Uniformity refers to cup-to-cup differences. Dry-process
coffees can be less uniform than wet-process coffees by nature. We would never
avoid a lot that has fantastic flavors if occasionally it wavers. This is
scored during the cupping protocol, where multiple cups are made of each lot
being reviewed.

• Brightness – Acidity is the taste of sharp high notes in the coffee caused
by a set of Chlorogenic, Citric Acid, Quinic Acid, and others, sensed mostly
in the front of the mouth and tongue.

• Cupper’s Correction – The Cupper’s Correction gives us the option to boost
the overall score up to 10 points more (totaling 110 possible points) in order
to help express how good we think a coffee is regardless of how the flavor
attributes add up.

appendix b

Exploratory Data Analysis: Coffee Quality Institute Dataset:

Figure 9: Outlier detection boxplot of CQI dataset



Figure 10: Pearson correlation heatmap of CQI dataset after categorical variables
encoding

Sweet Maria’s Coffee Dataset:

Figure 11: Outlier detection boxplot of Sweet Maria’s Coffee



Figure 12: Pearson correlation heatmap of Sweet Maria’s dataset



appendix c

Fisher Scores Ranking Position


clean_cup 1
Sweetness 2
Aroma 3
altitude_high_meters 4
Uniformity 5
cupper_points 6
method_Pulped natural / honey 7
Flavor 8
Species 9
color_Green 10
method_Semi-washed / Semi-pulped 11
color_Bluish-Green 12
color_Blue-Green 13
method_Other 14
method_Natural / Dry 15
method_Washed / Wet 16
Moisture 17
cat2_def 18
Acidity 19
Body 20
altitude_low_meters 21
Balance 22
quakers 23
cat1_def 24
Aftertaste 25

Full set Uniformity clean cup Aroma Altitude Low Sweetness cupper points washed-pulped Flavor
7 Neighbors 81.00 81.66 81.66 80.57 91.48 88.65 90.83 90.61 52.40
5 Neighbors 82.75 82.75 81.66 80.79 92.14 89.30 91.70 91.27 52.84
3 Neighbors 85.15 84.28 82.53 82.10 93.01 87.99 92.36 91.70 52.84
mean 82.97 82.90 81.95 81.15 92.21 88.65 91.63 91.19 52.69
SE 0.85 0.54 0.21 0.34 0.31 0.27 0.31 0.22 0.10

Table 13: Accuracy of classification of specialty coffee quality with kNN and
Fisher scores using top 10 features

Full set Uniformity clean_cup Aroma Altitude Low Sweetness cupper points washed-pulped Flavor
7 Neighbors 81.66 80.35 81.22 81.00 79.26 78.60 78.38 77.95 76.86
5 Neighbors 81.88 81.44 81.22 80.79 79.91 79.26 78.60 78.60 77.95
3 Neighbors 84.06 84.06 82.97 83.19 80.35 79.69 78.38 78.38 78.17
mean 82.53 81.95 81.80 81.66 79.84 79.18 78.46 78.31 77.66
SE 0.54 0.78 0.41 0.54 0.22 0.22 0.05 0.14 0.29

Table 14: Accuracy of classification of specialty coffee quality with kNN and
Fisher scores using all features

appendix d

Features Position
Flavor 1
Aftertaste 2
cupper_points 3
clean_cup 4
Balance 5
Uniformity 6
Aroma 7
Acidity 8
Sweetness 9
Body 10
cat2_def 11
Moisture 12
Species 13
method_Natural / Dry 14
altitude_low_meters 15
color_Green 16
method_Washed / Wet 17
altitude_high_meters 18
cat1_def 19
method_Other 20
color_Blue-Green 21
color_Bluish-Green 22
quakers 23
method_Semi-washed / Semi-pulped 24
method_Pulped natural / honey 25

Table 15: XGBoost feature importance ranking

Full set Flavor Aftertaste cupper points clean cup Balance Uniformity Aroma Acidity
7 Neighbors 98.69 98.25 97.82 96.72 95.63 92.58 87.55 86.90 88.43
5 Neighbors 99.56 98.91 97.82 96.94 95.63 92.79 88.21 89.30 88.43
3 Neighbors 98.91 99.13 98.03 97.38 96.72 92.79 89.52 89.74 83.41
mean 99.05 98.76 97.89 97.02 96.00 92.72 88.43 88.65 86.75
SE 0.19 0.19 0.05 0.14 0.26 0.05 0.41 0.62 1.18

Table 16: Accuracy of classification of specialty coffee quality with kNN and
XGB importance top 10 features feature ablation from most important to least
important

Full set Body Sweetness Acidity Aroma Uniformity Balance clean cup cupper points
7 Neighbors 98.69 98.47 98.03 97.60 97.38 94.54 94.98 87.55 87.55
5 Neighbors 99.56 98.47 98.25 97.38 96.94 94.98 94.76 86.68 88.65
3 Neighbors 98.91 98.69 98.47 97.38 97.82 95.41 95.20 88.65 88.65
mean 99.05 98.54 98.25 97.45 97.38 94.98 94.98 87.63 88.28
SE 0.19 0.05 0.09 0.05 0.18 0.18 0.09 0.40 0.26

Table 17: Accuracy of classification of specialty coffee quality with kNN and XGB
importance top 10 features and feature ablation from least important to most
important

Full set cat2 def Moisture Species Natural / Dry altitude low Green Washed / Wet altitude high
7 Neighbors 76.42 75.33 72.93 72.93 71.83 70.09 70.52 69.43 63.10
5 Neighbors 75.76 76.20 73.36 73.58 73.58 71.40 70.09 69.00 65.94
3 Neighbors 77.07 77.29 73.80 73.80 74.02 72.05 70.96 67.47 65.72
mean 76.42 76.27 73.36 73.44 73.14 71.18 70.52 68.63 64.92
SE 0.27 0.40 0.18 0.19 0.47 0.41 0.18 0.42 0.64

Table 18: Accuracy of classification of specialty coffee quality with kNN and XGB
importance without coffee cupping scores top 10 features ablation

Full set Flavor cupper points clean cup Uniformity Aroma Acidity Sweetness Body
7 Neighbors 97.60 96.94 93.67 93.01 90.83 88.86 86.68 84.93 83.62
5 Neighbors 98.69 97.38 94.10 93.01 90.83 90.17 87.12 86.03 84.72
3 Neighbors 98.91 98.03 94.76 93.89 89.74 89.30 89.08 86.03 80.13
mean 98.40 97.45 94.18 93.30 90.47 89.45 87.63 85.66 82.82
SE 0.29 0.22 0.22 0.21 0.26 0.27 0.52 0.26 0.98

Table 19: Accuracy of classification of specialty coffee quality with kNN and XGB
importance with coffee cupping scores top 10 features ablation and excluding
proxy features: ’Aftertaste’ and ’Balance’

appendix e

Feat Permutation Rank Position


Flavor 1
Uniformity 2
clean_cup 3
Aftertaste 4
cupper_points 5
Balance 6
Acidity 7
Aroma 8
Body 9
cat2_def 10
Sweetness 11
Moisture 12
altitude_high_meters 13
altitude_low_meters 14
method_Natural / Dry 15
cat1_def 16
method_Washed / Wet 17
method_Semi-washed / Semi-pulped 18
quakers 19
method_Other 20
Species 21
color_Green 22
color_Bluish-Green 23
color_Blue-Green 24
method_Pulped natural / honey 25

Table 20: Ranking of feature importance according to feature permutation

Full set Flavor Uniformity clean cup Aftertaste cupper points Balance Acidity Aroma
7 Neighbors 97.82 97.38 96.29 91.92 91.05 87.77 85.15 85.37 86.03
5 Neighbors 97.82 97.82 96.29 92.58 92.14 89.08 85.81 86.46 86.03
3 Neighbors 98.25 98.25 96.29 93.01 91.92 90.39 86.90 87.34 87.55
mean 97.96 97.82 96.29 92.50 91.70 89.08 85.95 86.39 86.54
SE 0.10 0.18 0.00 0.22 0.24 0.53 0.36 0.40 0.36

Table 21: Accuracy of classification of specialty coffee quality with feature permu-
tation importance and top 10 features from most important to least important

Full set cat2 def Body Aroma Acidity Balance cupper points Aftertaste clean cup
7 Neighbors 97.82 98.03 98.03 98.03 97.38 96.29 96.29 95.63 94.98
5 Neighbors 97.82 98.69 98.25 98.25 96.94 96.51 96.72 95.20 95.20
3 Neighbors 98.25 98.25 98.47 98.25 97.82 97.60 96.51 94.76 94.76
mean 97.96 98.33 98.25 98.18 97.38 96.80 96.51 95.20 94.98
SE 0.10 0.14 0.09 0.05 0.18 0.29 0.09 0.18 0.09

Table 22: Accuracy of classification of specialty coffee quality with feature permu-
tation importance and top 10 features from least important to most important

Full set cat2 def Moisture altitude high altitude low Natural / Dry cat1 def Washed / Wet washed/pulped
7 Neighbors 75.11 74.67 70.74 70.52 65.94 65.72 56.55 56.55 48.69
5 Neighbors 75.33 75.11 73.80 70.52 66.16 65.28 56.55 56.99 49.13
3 Neighbors 76.64 77.07 72.49 70.74 66.59 66.16 53.28 57.86 49.78
mean 75.69 75.62 72.34 70.60 66.23 65.72 55.46 57.13 49.20
SE 0.34 0.52 0.63 0.05 0.14 0.18 0.77 0.27 0.22

Table 23: Accuracy of classification of specialty coffee quality with kNN and
feature permutation importance with coffee cupping scores top 10 features ablation
and excluding Coffee cupping scores.

Full set Flavor Uniformity clean cup cupper points Acidity Aroma Body cat2 def
7 Neighbors 97.82 96.51 95.20 92.79 89.30 89.08 88.65 84.93 82.75
5 Neighbors 98.25 96.94 95.63 93.23 89.96 89.08 89.08 83.41 79.69
3 Neighbors 98.47 98.03 95.20 93.67 91.48 90.17 89.30 85.37 83.19
mean 98.18 97.16 95.34 93.23 90.25 89.45 89.01 84.57 81.88
SE 0.14 0.32 0.10 0.18 0.46 0.26 0.14 0.42 0.78

Table 24: Accuracy of classification of specialty coffee quality with kNN and
feature permutation importance with coffee cupping scores top 10 features ablation
and excluding proxy features: ’Aftertaste’ and ’Balance’.

appendix f

Full set Flavor Fragrance Aroma Finish Complexity Sweetness Uniformity Brightness Clean cup
7 Neighbors 96.33 96.33 96.33 97.25 96.33 93.58 92.66 88.99 81.65 52.29
5 Neighbors 96.33 96.33 96.33 96.33 96.33 94.50 92.66 89.91 82.57 55.96
3 Neighbors 97.25 97.25 96.33 96.33 96.33 95.41 93.58 90.83 81.65 52.29
mean 96.64 96.64 96.33 96.64 96.33 94.50 92.97 89.91 81.96 53.52
SE 0.22 0.22 0.00 0.22 0.00 0.37 0.22 0.37 0.22 0.86

Table 25: Accuracy of classification of outstanding specialty coffee quality
with XGBoost feature importance and feature ablation from most important to
least important

Full set Body Cup correction Clean cup Brightness Uniformity Sweetness Complexity Finish Aroma
7 Neighbors 96.33 96.33 95.41 95.41 96.33 96.33 97.25 96.33 94.50 94.50
5 Neighbors 96.33 96.33 97.25 97.25 96.33 96.33 97.25 96.33 94.50 95.41
3 Neighbors 96.33 97.25 97.25 98.17 97.25 96.33 97.25 96.33 95.41 93.58
mean 96.33 96.64 96.64 96.94 96.64 96.33 97.25 96.33 94.80 94.50
SE 0.00 0.22 0.43 0.57 0.22 0.00 0.00 0.00 0.22 0.37

Table 26: Accuracy of classification of outstanding specialty coffee quality
with XGBoost feature importance and feature ablation from least important to
most important

appendix g

Full set Flavor Finish Complexity Fragrance Clean cup Uniformity Brightness Body Cuppers correction
7 Neighbors 97.25 96.33 97.25 96.33 95.41 95.41 94.50 90.83 89.91 88.99
5 Neighbors 97.25 96.33 97.25 97.25 96.33 95.41 95.41 88.07 89.91 86.24
3 Neighbors 97.25 97.25 98.17 98.17 96.33 94.50 95.41 87.16 90.83 86.24
mean 97.25 96.64 97.55 97.25 96.02 95.11 95.11 88.69 90.21 87.16
SE 0.00 0.22 0.22 0.37 0.22 0.22 0.22 0.78 0.22 0.65

Table 27: Accuracy of classification of outstanding specialty coffee quality
with feature permutation and feature ablation from most important to least
important.

Full set Aroma Sweetness Cuppers correction Body Brightness Uniformity Clean cup Fragrance Complexity
7 Neighbors 96.33 96.33 96.33 96.33 96.33 96.33 96.33 97.25 96.33 92.66
5 Neighbors 96.33 96.33 96.33 96.33 96.33 96.33 96.33 97.25 97.25 91.74
3 Neighbors 96.33 97.25 96.33 96.33 96.33 96.33 97.25 97.25 97.25 92.66
mean 96.33 96.64 96.33 96.33 96.33 96.33 96.64 97.25 96.94 92.35
SE 0.00 0.22 0.00 0.00 0.00 0.00 0.22 0.00 0.22 0.22

Table 28: Accuracy of classification of outstanding specialty coffee quality
with feature permutation and feature ablation from least important to most
important.

appendix h

Full set Other color green Flavor clean cup


7 Neighbors 80.35 80.35 81.66 74.02 65.94
5 Neighbors 82.31 82.31 82.75 75.76 65.72
3 Neighbors 83.19 83.19 83.41 77.29 68.12
mean 81.95 81.95 82.61 75.69 66.59
SE 0.59 0.59 0.36 0.67 0.54

Table 29: Accuracy of kNN using hierarchical clustering subset 1

Full set Natural / Dry Washed / Wet Aroma Uniformity


7 Neighbors 77.95 76.42 78.17 73.80 69.65
5 Neighbors 77.51 76.86 79.26 75.11 70.09
3 Neighbors 79.91 79.26 81.22 76.20 68.56
mean 78.46 77.51 79.55 75.04 69.43
SE 0.52 0.62 0.63 0.49 0.32

Table 30: Accuracy of kNN using hierarchical clustering subset 2
