Classification and Determination of pH Value: A Decision Tree Learning Approach


Francheska B. Chioson, Francisco Emmanuel T. Munsayac Jr. III, Raphael Benedict G. Luta, Renann G. Baldovino and Nilo T. Bugtai
Manufacturing Engineering and Management (MEM) Department
Gokongwei College of Engineering, De La Salle University (DLSU)
2401 Taft Avenue, 0922 Manila, Philippines
Corresponding author: *renann.baldovino@dlsu.edu.ph

Abstract— In chemistry, the potential of hydrogen (pH) level is the measure of the acidity or basicity of a substance. Generally, this level is determined by dipping an indicator into an aqueous solution, and a standard pH scale is then used to classify the liquid as either acidic or basic. In this paper, decision tree (DT) learning was implemented using the classification and regression trees (CART) algorithm to classify whether a substance is acidic or basic. The input data used in this study are values in the HSV (hue, saturation, value) color space together with their corresponding pH levels, ranging from 0 to 14. A total of 1,410 data samples were used, of which 70% were assigned for training and 30% for testing. Results showed a high accuracy of 95.3%. Thus, the DT algorithm is an effective choice for classifying the pH level of a substance.

Index Terms— classification, decision tree, machine learning, pH value

I. INTRODUCTION

A decision tree (DT) classifier is a non-metric pattern recognition technique that develops rules that are easy to interpret. The advantages of DT classifiers include their ease of use and flexibility, and they can be applied to both classification and regression [1]. However, DT algorithms can be complex as well as memory- and computation-intensive, and they sometimes overfit the training data. In terms of applications, DTs have been used in breast cancer detection [1], EEG signal classification [2], precipitation prediction [3] and gas detection [4].

A Danish chemist, Søren Peder Lauritz Sørensen, first introduced the concept of the potential of hydrogen (pH) in 1909. The pH level is determined by the concentration of hydrogen ions in a substance. The simplest way of obtaining the pH level is through a pH indicator such as litmus paper. The indicator is dipped into a liquid, and the color of the paper changes according to the liquid's pH level. The resulting color is then compared to a scale similar to the one shown in Figure 1.

Fig. 1 The pH scale

For more precise measurements, a colorimeter or spectrophotometer can be used. In addition, there are other methods of measuring pH levels, including the use of a pH probe or sensor.

Chemistry, medicine, food and agriculture are some of the important application areas of pH level indicators. In one study, Luta et al. [5] proposed a noncontact pH level indicator that uses a computer vision technique based on HSV values to determine the pH level. Their approach differs from other pH level indicators in that the measured HSV values are compared against a knowledge-based system. Building on that work, this research applies the decision tree algorithm to the noncontact pH level indicator to determine the pH level of a substance.

II. DECISION TREES (DT)
A. Decision Tree (DT) Learning
DT is a supervised learning technique that classifies data recursively. It is used for inductive inference and is widely used in data mining because it can handle large volumes of data [6]. Normally, a DT is composed of nodes that branch out from a root node, as seen in Figure 2.
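The recursive, node-based structure of a DT can be sketched in a few lines of Python, the language used later in this study. This is a minimal illustration under stated assumptions, not the tree learned in this study: the `Node` class, the `predict` and `predict_if_else` functions and the toy tree are hypothetical, the single input feature is assumed to be a pH reading, and the thresholds 6.95 and 7.05 are chosen by hand only so that values on a 0.1 grid fall into acidic, neutral and basic classes. The second function writes the same tree as a chain of if-else statements, which is the view of a DT taken in [9].

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    """A node of a binary decision tree (hypothetical minimal structure).

    Internal nodes test one feature against a threshold; leaf nodes carry a label.
    """
    feature: int = 0                # index of the feature tested at this node
    threshold: float = 0.0          # go left if x[feature] <= threshold
    left: Optional["Node"] = None   # subtree for x[feature] <= threshold
    right: Optional["Node"] = None  # subtree for x[feature] > threshold
    label: Optional[str] = None     # class label; set only on leaf nodes


def predict(node: Node, x: Sequence[float]) -> str:
    """Classify x by recursing from the root down to a leaf."""
    if node.label is not None:      # leaf reached: return its class
        return node.label
    child = node.left if x[node.feature] <= node.threshold else node.right
    return predict(child, x)


# Toy tree on one feature (assumed to be a pH reading); thresholds are illustrative only.
toy_tree = Node(
    feature=0,
    threshold=6.95,
    left=Node(label="acidic"),
    right=Node(
        feature=0,
        threshold=7.05,
        left=Node(label="neutral"),
        right=Node(label="basic"),
    ),
)


def predict_if_else(x: Sequence[float]) -> str:
    """The same toy tree written as a chain of if-else statements."""
    if x[0] <= 6.95:
        return "acidic"
    elif x[0] <= 7.05:
        return "neutral"
    return "basic"


for ph in (3.2, 7.0, 12.4):
    print(ph, predict(toy_tree, [ph]), predict_if_else([ph]))
```

A tree grown by a learning algorithm has the same shape; the difference is that the tested features and thresholds are selected from the training data rather than set by hand.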

978-1-5386-7767-4/18/$31.00 ©2018 IEEE

Authorized licensed use limited to: Badan Riset Dan Inovasi Nasional. Downloaded on July 12,2022 at 03:46:20 UTC from IEEE Xplore. Restrictions apply.
Fig. 2 Sample of a DT visual representation

Tree models that predict a class output are called classification trees, while tree models that predict a real-valued output are called regression trees [7, 8]. Likewise, a DT is similar to a long, continuous list of if-else statements [9]. The process can be visualized through the sample tree diagram in Figure 3.

Fig. 3 Sample of a DT in purchasing either an apartment, office or warehouse

The problem presented is an example of a decision tree for determining which commercial building to purchase based on economic conditions. Each decision is multiplied by a weight based on its probability. The decision tree shown is similar to a flowchart that uses if-else statements to reach a desired output, in this case a decision on what to purchase.

B. CART Algorithm
The classification and regression trees (CART) algorithm was first introduced by Breiman et al. in 1984. Its classification tree is binary and splits an attribute using either the Gini index or the twoing criterion [3]. The Gini index is a measure of node impurity, computed as 1 − Σ_k p_k², where p_k is the proportion of samples at the node that belong to class k; the twoing criterion is employed instead of the Gini index when the target attribute has many classes. Meanwhile, its regression tree splits the attributes by minimizing the squared prediction error. The advantages of this algorithm include its flexibility, variable selection and handling of interactions among variables. Its disadvantages include splitting on only one variable at a time and the instability of the tree.

III. METHODOLOGY
A. Data Acquisition and Pre-processing
The data acquisition method used in this study was adapted from [5]. The block diagram in Figure 4 shows the procedure for the noncontact pH level indicator.

Fig. 4 Noncontact pH level indicator procedure

OpenCV was used to record the RGB values of the images. First, an image of the pH indicator was taken and converted from RGB into HSV. Next, the image was blurred to reduce noise. Then, the HSV range was compared to a knowledge-based system to help create a mask. Masking isolated the image color, while the contours were determined and drawn on the original frame. Finally, the color spectrum of
the pH level is stored through a knowledge-based system. In this study, pH values were acquired using a universal pH scale tester similar to the one in Figure 5.

Fig. 5 Sample of a universal pH scale tester

The data gathered from the noncontact pH level indicator provided the HSV values as input, together with their corresponding pH values as output. The pH level ranges from 0 to 14 in increments of 0.1 [10, 11]. In this study, Table I provides a sample of the training dataset for the DT, while Table II provides a sample of the test dataset.

TABLE I. TRAINING DATASET

H       S       V       pH
179.5   222.1   238.9   0.0
179.2   222.8   238.3   0.0
179.7   223.3   238.8   0.0
179.0   224.6   238.2   0.0
179.1   222.6   239.4   0.0
179.3   223.1   238.7   0.0
179.8   223.1   238.6   0.0
0.4     223.6   239.4   0.1
0.4     224.2   239.0   0.1
0.2     223.3   238.9   0.1
0.3     224.5   238.7   0.1
0.5     221.4   238.0   0.1
0.3     221.9   239.5   0.1
0.1     222.1   239.2   0.1
1.3     223.1   239.7   0.2
1.6     221.7   239.1   0.2
0.8     223.5   238.7   0.2
1.7     224.5   238.3   0.2

TABLE II. TEST DATASET

H       S       V       pH
179.0   225.0   238.0   0.0
179.6   221.0   239.7   0.0
179.5   224.8   239.2   0.0
0.1     224.2   238.4   0.1
0.5     223.2   239.9   0.1
0.2     224.4   238.8   0.1
1.2     223.4   239.9   0.2
1.4     224.7   239.3   0.2
1.1     224.3   239.3   0.2
2.3     222.6   239.2   0.3
2.6     224.0   239.0   0.3
2.5     224.1   238.1   0.3

B. Classification Method
As the aim of the study was to determine the effectiveness of a pH level indicator using HSV, a DT classifier, along with a graphical representation of the final DT, was implemented in Python. The scikit-learn library was used to generate the DT through the CART algorithm, while the tree visualization was generated using the graphviz package. In the data acquisition stage, 10 HSV readings were collected for every pH value. After acquiring the data, the samples were split into two sets: training and test samples. The training set comprised 7 values per pH level, while the test set comprised 3 values per pH level. Overall, the entire data set has a total of 1,410 samples (141 pH levels from 0 to 14 in steps of 0.1, with 10 samples each), giving 987 training and 423 test samples.

IV. RESULTS AND DISCUSSION
In this study, a total of 1,410 samples were used, with 10 HSV readings per pH level. The feature data were divided into two sets: training and testing. The recorded data were used to evaluate the system's accuracy. Moreover, the DT model was exported to graphviz, a segment of which is shown in Figure 6.

Fig. 6 Sample of a segment of the DT pH level indicator

The DT classifier achieved an accuracy score of 95.3%, a recall score of 95.27%, a precision score of 96.6% and an F-measure of 95.04%. The accuracy score indicates how accurately the system can predict. The recall, or sensitivity, score measures the system's ability to find all the positive samples; a low recall score indicates more false negatives. Meanwhile, the precision score is a measure of the classifier's exactness. A low precision score indicates
more false positives. Lastly, the F-measure shows the balance between precision and recall, since it accounts for both false positives and false negatives.

V. CONCLUSION AND RECOMMENDATIONS

This paper presented a pH level classification system that employs supervised decision tree learning. HSV values were extracted through machine vision, and the DT was implemented and visualized in Python using the Spyder IDE. Acceptable results were attained using the DT classification algorithm, with a measured accuracy of 95.3%. These results indicate that the DT classification technique is an effective approach for classifying different pH levels.

Future work may involve comparing the decision tree with other machine learning techniques such as artificial neural networks (ANN) or support vector machines (SVM).

ACKNOWLEDGEMENTS
The authors would like to thank the Engineering Research and Development for Technology (ERDT) of the Department of Science and Technology (DOST) for the funding and dissemination support.

REFERENCES
[1] P. Hamsagayathri and P. Sampath, “Priority based decision tree classifier for breast cancer detection,” Adv. Comput. Commun. Syst. (ICACCS), 2017 4th Int. Conf., pp. 1–6, 2017.
[2] H. Rajaguru, “Epilepsy classification from EEG signals,” pp. 581–584, 2017.
[3] N. Prasad, K. R. Patro, and M. M. Naidu, “A gini index based elegant decision tree classifier to predict precipitation,” Asia Model. Symp. 2013 7th Asia Int. Conf. Math. Model. Comput. Simulation, AMS 2013, pp. 46–54, 2013.
[4] M. Hassan and A. Bermak, “Gas classification using binary decision tree classifier,” pp. 2579–2582, 2014.
[5] R. B. G. Luta, A. C. L. Ong, S. J. C. Lao, R. G. Baldovino, N. T. Bugtai, and E. P. Dadios, “A noncontact pH level sensing indicator using computer vision and knowledge-based systems,” IEEE 9th Int. Conf. Humanoid, Nanotechnology, Inf. Technol. Commun. Control. Environ. Manag., pp. 1–5, 2017.
[6] P. H. Swain and H. Hauska, “The decision tree classifier: design and potential,” IEEE Trans. Geosci. Electron., vol. 15, no. 3, pp. 142–147, 1977.
[7] Z. Wang, Y. Liu, and L. Liu, “A new way to choose splitting attribute in ID3 algorithm,” pp. 659–663, 2017.
[8] Y. Dandotiya, “Moment method for YCbCr and HSV color space,” vol. 2, no. 1, pp. 662–668, 2017.
[9] J. Van De Weijer, T. Gevers, and A. D. Bagdanov, “Boosting color saliency in image feature detection,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 28, no. 1, pp. 150–156, 2006.
[10] J. C. Puno, E. Sybingco, E. Dadios, I. Valenzuela, and J. Cuello, “Determination of soil nutrients and pH level using image processing and artificial neural network,” IEEE 9th Int. Conf. Humanoid, Nanotechnology, Inf. Technol. Commun. Control. Environ. Manag., pp. 1–6, 2017.
[11] B. Building, L. Terrace, U. Kingdom, and I. Systems, “Decision tree learning based feature evaluation and.”