Assessment of the State of Quality in Garments Applying Data Mining Mechanisms: A Case Study in the Apparel Industry

Article · January 2020


DOI: 10.2139/ssrn.3862467

3 authors:

Pavithra Basnayake, University of Peradeniya
Anuradha Hewaarachchi, University of Kelaniya
Vasana Chandrasekara, University of Kelaniya


11th International Conference
on
Business and Information
ICBI - 2020

"Transforming Business Strategies for Economic


Resilience"

Conference Proceeding

Faculty of Commerce and Management Studies


University of Kelaniya
Sri Lanka

19th November 2020

© 2020 - Faculty of Commerce and Management Studies

Responsibility of the content of the Full papers included in this publication remains
with the respective authors.

Web: http://conf.kln.ac.lk/icbi/
E-Mail: icbi@kln.ac.lk
Telephone: +94112903502
Fax: +94112917708

ISSN 2465-6399

Editor in Chief: Dr. S. C. Thushara


Production Credits: Mr. R. K. H. S. Wimalasiri
Cover Design: Mr. Dinuka Kannangara

Assessment of the State of Quality in Garments Applying Data Mining Mechanisms: A Case Study in the Apparel Industry

Basnayake, B. R. P. M.1 , Hewaarachchi, A. P.2 and Chandrasekara, N. V.3

Forecasting the quality of sewn garments is an important area in the apparel industry. This paper
presents a case study of a high-ranking apparel manufacturing plant in Sri Lanka.
Quality is measured using the First Time Through (FTT) state, which is a measure of production
competence and capacity. The factory's target is an FTT of 98% or above, which defines the high state
category; the low state consists of FTT values below 98%. Data mining methods have recently been
used to extract insights from data and to make fast decisions. The main objective of the study is to
identify the better model for predicting the FTT state with data mining mechanisms. Classification tree
and Probabilistic Neural Network (PNN) models were used to forecast the FTT state, with the
under-sampling method applied because of class imbalance in the original dataset. True positive
(TP) rate, false positive (FP) rate, precision, recall, accuracy and F-measure were used as the
performance measurements. In the classification tree, the FP rate was zero and precision was one,
while in the PNN model the FP rate was 0.0649 and precision was 0.9348. Both models had high
F-measure values, 0.9745 and 0.9287 respectively. Therefore, both models can be used in prediction.
The outcomes of the study will help to find the optimum allocation of
a style to a relevant team to achieve the highest FTT state, to recognize the training requirements
of the employees and to improve customer satisfaction.

Keywords: Apparel, Decision Tree, First Time Through State, F-measure, Probabilistic Neural
Network.

Cite this paper as;


Basnayake, B. R. P. M., Hewaarachchi, A. P. & Chandrasekara, N. V. (2020). Assessment of
the State of Quality in garments applying Data mining mechanisms: A Case Study in the
Apparel Industry. The Conference Proceedings of 11th International Conference on Business
& Information ICBI, University of Kelaniya, Sri Lanka. ISSN 2465-6399, (pp. 659-670)

1 Department of Statistics & Computer Science, Faculty of Science, University of Kelaniya, Sri
Lanka. (pavithramalkibasnayake@gmail.com)
2 Department of Statistics & Computer Science, Faculty of Science, University of Kelaniya, Sri
Lanka. (anuradhah@kln.ac.lk)
3 Department of Statistics & Computer Science, Faculty of Science, University of Kelaniya, Sri
Lanka. (nvchandrasekara@kln.ac.lk)
International Conference on Business and Information (ICBI) 2020
Faculty of Commerce and Management Studies, University of Kelaniya, Sri Lanka
Electronic copy available at: https://ssrn.com/abstract=3862467
ISSN 2465-6399

Introduction
Quality in the textile industry is defined as a measurement of the acceptable level of
materials. It is vital at every stage, from the inspection of raw materials to the final inspection
of goods (Kurniati et al., 2015). As a leading apparel solutions provider in Sri Lanka, the company
in this study supplies famous brands across the world. The plant manufactures products such as
thongs, briefs, boxers and camisoles. The quality of the
finished goods is measured using the First Time Through (FTT) state value for a top brand. The top
brand has common types of operating activities for each type of garment, and the
same style runs in a team for at least a week. Other brands are more complex because of critical
activities and a wider variety of garments, and they do not have a long run in a team because buyers
order small quantities. FTT is a measure of production competence and capacity. It is the percentage
of units without defects out of the total units produced in a production process, as given
in Equation 1. The factory's target is an FTT of 98% or above, which defines the high state category.
The low state consists of daily FTT values below 98%.
\[ \text{FTT Percentage} = \frac{\text{Daily Production} - \text{Number of reworks per day}}{\text{Daily Production}} \times 100\% \tag{1} \]
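
Equation 1 and the 98% threshold can be sketched in a few lines of Python. The function names and the example counts below are illustrative assumptions, not part of the factory's system.

```python
def ftt_percentage(daily_production: int, reworks: int) -> float:
    """First Time Through percentage, following Equation 1."""
    if daily_production <= 0:
        raise ValueError("daily_production must be positive")
    # Multiply before dividing so whole-number percentages stay exact.
    return 100.0 * (daily_production - reworks) / daily_production


def ftt_state(daily_production: int, reworks: int, threshold: float = 98.0) -> str:
    """Label a day 'high' when FTT is at or above the threshold, else 'low'."""
    ftt = ftt_percentage(daily_production, reworks)
    return "high" if ftt >= threshold else "low"


# Hypothetical day: 1000 units produced, 15 reworks.
print(ftt_percentage(1000, 15), ftt_state(1000, 15))
# Hypothetical day: 1000 units produced, 30 reworks.
print(ftt_percentage(1000, 30), ftt_state(1000, 30))
```

A day with exactly 98% FTT falls in the high state, in line with the "98% or above" definition.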

The work floor consists of 60 teams, each assigned a style and 24 machine operators. The factors that
affect the FTT state, which are the independent variables for the selected top brand, were the 60
teams, 9 garments, 19 sewing activities, 51 types of reworks, daily production and the total number
of reworks. Every quality checker in a team maintained a document known as the End-line rework
report. For the relevant operation activity in the style, each defect type was marked with a coded
letter. The defects were recorded, and their sum was used to calculate the daily FTT state relative
to the daily production. All 51 types of damages were listed in the Defect type sewing document.
The main aim of the Quality Department in the factory is to maintain the quality level in the
high state category by avoiding sewing damages. In addition, the department needs to identify the
skills of the sewing operators in order to allocate them to the necessary improvements or teams in
the factory. Therefore, it is important to assess the quality of the products and identify whether
the finished garments will be in the high or low state category. This is a classification problem,
which deals with the categorization of data into predetermined classes.
The main purpose of the study is to build suitable models for this classification problem using data
mining methods and then to select the better model to forecast the FTT state. The results of
the study mainly serve as an indicator for identifying the optimum allocation of a team to a style
(the garment type of the top brand) to achieve the highest FTT state. They also help to recognize the
training requirements of employees in the low state category and to improve customer satisfaction
by improving the state of quality in garments.

Literature Review
Quality has many implications and indications. Sizwe and Charles (2017) stated that quality should be
maintained for all the resources in a garment factory, such as machines, accuracy in measurements,
fabric, storage and labor skills. Choudhary et al. (2018) described the sewing defects in garment
production and analyzed them to select parameters optimally and minimize faults. They found
that sewing damages such as needle cuts and other sewing defects occurred mostly in woven fabric.
The sewing damage problem has no direct solution capable of removing the damage
in the fabric; the parameters related to fiber and sewing machines must be examined to design
appropriate remedial measures related to machine design. Different studies have been carried out
on quality and production efficiency in the apparel sector. However, only a few studies
have addressed FTT. Mohan et al. (2012) used First Pass Yield (FPY) and reduced the defect rate of
a product with Pareto analysis and cause and effect diagrams. They
concluded that the statistical process control approach is an effective means of controlling and
improving process quality. Uddin et al. (2014) reduced defects in the sewing division of a garment factory
using the Define, Measure, Analyze, Improve and Control (DMAIC) methodology of Six Sigma. The
objective of their study was to reduce the number of defects to a minimum level and thereby reduce
production cost and increase quality and productivity. Quality was measured based on the output of
the process. Pareto analysis was used to identify the major defects, and cause and effect diagrams
were then drawn for the identified defects.
Beyond the traditional statistical methods, several studies have used data mining
techniques. Goebel (1994) applied the Probabilistic Neural Network (PNN) technique to
monitor and diagnose tool wear in manufacturing milling machines, as the demand for quality products
from automated manufacturing systems had increased. The PNN model was advantageous because it allowed
rigorous probabilistic analysis with Bayes optimal classification. To measure the sewing
performance of fabric, Hui et al. (2007) used an Artificial Neural Network (ANN) with the
backpropagation algorithm; the model was trained for 10 000 iterations and converged to the
minimum error. The inputs were the physical and mechanical characteristics of the fabric, and the
outputs were the control levels of sewing performance: puckering, needle damages, hoax
and overfeeding. A validation set was used to assess the accuracy and effectiveness of the neural
network model, and the prediction accuracy was high, at 93%. Hsu and Wang (2005) used
a decision tree to recognize the sizes of soldiers' pants by classifying the important patterns in
body shapes, which was advantageous for effective production. Jain and Kumar
(2020) built classification models using Naïve Bayes, Random Forest, Bayesian Forest and Decision
trees to forecast garment types (lower, upper, whole-body) and subtypes (blouse, dress, etc.).
They stated that the random forest method had higher accuracy than the other techniques, as it
handled the unbalanced data and created a large number of uncorrelated trees. Xing et al.
(2019) suggested principal component analysis with PNN to cluster the shapes of the human body in
a garment factory, which gave a more accurate model.

Methodology
Data were gathered for January, February and March 2019 from the End-line
rework reports provided by the quality checkers. The dataset had no missing values. A randomly
selected 80% of the data was used to train the models and the remaining 20% to test them.
Initially, descriptive analysis and cross-tabulations were conducted to identify the basic features of
the data and to compare and analyze the relationships between variables.
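
The 80/20 random split can be sketched as follows. The paper does not state how the randomization was carried out, so the function name, the fixed seed and the rounding rule are assumptions made for illustration. With the 5999 observations reported in the Findings, a rounded 80% share gives 4799 training records, which matches the pattern-layer size stated there for the PNN.

```python
import random


def train_test_split(records, train_fraction=0.8, seed=0):
    """Shuffle a copy of the records and cut it into train and test parts."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded only for reproducibility
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


# Stand-in record identifiers; the real rows come from the End-line rework reports.
train, test = train_test_split(range(5999))
print(len(train), len(test))  # 4799 1200
```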
Association tests
The Pearson correlation coefficient (r) is a measure of the linear correlation between two variables,
as given in Equation 2. In this study, r is used to identify the strength of the linear relationship
between variables. By the Cauchy–Schwarz inequality, it ranges between negative one and positive one.
A value close to positive one indicates a strong positive linear correlation, zero indicates no
linear correlation, and a value close to negative one indicates a strong negative linear correlation.
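\[ r_{xy} = \frac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{\sqrt{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}\;\sqrt{n\sum_{i=1}^{n} y_i^2 - \left(\sum_{i=1}^{n} y_i\right)^2}} \tag{2} \]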
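
As a check on the notation, the computational form of the Pearson coefficient in Equation 2 can be written directly in Python. This is a minimal sketch with made-up data; the function name is an assumption.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation using the sums-of-products form of Equation 2."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(a * b for a, b in zip(x, y))
    sum_x2 = sum(a * a for a in x)
    sum_y2 = sum(b * b for b in y)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    if denominator == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return numerator / denominator


# Toy data: a perfectly increasing and a perfectly decreasing relationship.
print(pearson_r([1, 2, 3], [2, 4, 6]))
print(pearson_r([1, 2, 3], [6, 4, 2]))
```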

The Chi–Square test of independence is a non-parametric test used to determine whether there is a
significant relationship between two categorical variables.
The hypotheses of the Chi–Square test are as follows:
H0: There is no significant relationship between the two categorical variables.
H1: There is a significant relationship between the two categorical variables.
The acceptance or rejection of the null hypothesis depends on the Chi–Square statistic and the critical
value from the Chi–Square distribution.
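
The decision rule can be illustrated with a small sketch that computes the Chi–Square statistic from a contingency table and compares it with a critical value. The counts in the example are hypothetical and are not taken from the study; 3.841 is the standard 5% critical value for one degree of freedom.

```python
def chi_square_statistic(table):
    """Return (statistic, degrees of freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence (assumed to be non-zero).
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Hypothetical 2x2 cross-tabulation of two categorical variables.
stat, dof = chi_square_statistic([[30, 10], [10, 30]])
CRITICAL_5_PERCENT_DF1 = 3.841
print(stat, dof, "reject H0" if stat > CRITICAL_5_PERCENT_DF1 else "do not reject H0")
```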

Classification tree
A decision tree is a data mining algorithm with a tree-like structure that starts at the root node. The
classification tree attempts to predict the values of a categorical dependent variable from one or more
continuous or categorical variables. The best attribute is identified using a top-down induction
algorithm that splits the learning sample and proceeds until each observation is correctly identified.
The Gini impurity measure in Equation 3 is used to identify how well two or more classes are
separated.
\[ \text{Gini impurity measure} = \sum_{i} P(i)\,\bigl(1 - P(i)\bigr) \tag{3} \]
where P(i) is the probability of a given classification i.
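
A short sketch shows how Equation 3 behaves on a node and how a candidate split can be scored. The size-weighted average of the child impurities is a common way to score a split and is used here as an assumption; the paper does not spell out this step, and the labels are toy data.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a node: sum over classes of P(i) * (1 - P(i))."""
    n = len(labels)
    if n == 0:
        return 0.0
    return sum((c / n) * (1 - c / n) for c in Counter(labels).values())


def split_impurity(left, right):
    """Size-weighted average impurity of the two children of a split."""
    n = len(left) + len(right)
    return (len(left) * gini_impurity(left) + len(right) * gini_impurity(right)) / n


print(gini_impurity(["high", "high", "low", "low"]))        # 0.5
print(gini_impurity(["high", "high", "high"]))              # 0.0
print(split_impurity(["high", "high"], ["low", "low"]))     # 0.0
```

A pure node has impurity zero, and a two-class node split evenly has the maximum value of 0.5, so lower values indicate better separated classes.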
Probabilistic Neural Network (PNN)
Probabilistic neural networks are used in classification problems. The network architecture consists
of four layers. The input layer has one neuron for each predictor variable. The next
layer, the pattern layer, contains one neuron for each observation in the training data set; each
neuron stores the values of the predictor variables together with the target value. When presented
with the input values from the input layer, a neuron in the pattern layer calculates the Euclidean
distance of the test case from the neuron's center point and then applies the radial basis function
kernel. Afterwards, the value is passed to the summation layer, which is the third layer and has one
neuron for each category of the target variable. The actual target category of each training case is
stored with each pattern neuron, and the weighted value of a pattern neuron is fed to the summation
neuron that corresponds to the pattern neuron's category. The summation neurons add the values for
the class they represent. Finally, the decision layer compares the weighted votes for each target
category and uses the largest vote to predict the target category.
The spread parameter is tuned to maximize classification accuracy. When its value is close to zero,
the network behaves like a nearest neighbor classifier, whereas for large spread values it takes
several nearby neighbors into account.
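
The four layers described above can be sketched in plain Python. This is a minimal illustration, not the implementation used in the study: the Gaussian kernel form exp(-d²/(2·spread²)), the function name and the toy data are all assumptions.

```python
from math import exp


def pnn_predict(train_x, train_y, query, spread):
    """Classify `query` with a minimal probabilistic neural network."""
    votes = {}
    for x, label in zip(train_x, train_y):
        # Pattern layer: squared Euclidean distance to the stored observation,
        # passed through a Gaussian radial basis kernel (assumed form).
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, query))
        activation = exp(-sq_dist / (2.0 * spread ** 2))
        # Summation layer: add the activation to the neuron of this category.
        votes[label] = votes.get(label, 0.0) + activation
    # Decision layer: the category with the largest summed vote wins.
    return max(votes, key=votes.get)


# Toy training set with two predictors and two states.
train_x = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.2, 4.9)]
train_y = ["low", "low", "high", "high"]

print(pnn_predict(train_x, train_y, (0.2, 0.1), spread=0.8))  # low
print(pnn_predict(train_x, train_y, (4.8, 5.1), spread=0.8))  # high
```

With a very small spread, only the closest stored observation contributes noticeably to the vote, which is the nearest neighbor behaviour mentioned above; a larger spread lets more distant observations contribute.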
Confusion matrix
A confusion matrix is a summary of prediction outcomes on a classification problem, as indicated in
Table 1. The numbers of correct and incorrect predictions are summarized with count values for each class.

Table 1: Confusion matrix

                  Class 1 Predicted   Class 2 Predicted   Row Total
Class 1 Actual    TP                  FN                  P
Class 2 Actual    FP                  TN                  N

Positive (P) is the total of True Positive (TP) and False Negative (FN) observations, whereas
Negative (N) is the total of False Positive (FP) and True Negative (TN) observations. A TP is an
observation that is positive and predicted as positive, while an FN occurs when the observation is
positive but predicted as negative. A TN occurs when the observation is negative and predicted as
negative, and an FP occurs when the observation is negative but predicted as positive.
The performance measurements are FP rate, TP rate, accuracy, recall, precision and F-measure. The FP
rate is FP divided by the negatives (N), while the TP rate is TP divided by the positives (P).
Accuracy is the ratio of the total number of correctly classified positives and negatives to the
total number of positives and negatives, as given in Equation 4:
\[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{4} \]

Equation 5 defines recall as the ratio of the number of correctly classified positives to the total
number of positives. High recall indicates that the class is correctly recognized.
\[ \text{Recall} = \frac{TP}{TP + FN} \tag{5} \]

Precision is the number of correctly classified positives divided by the
total number of predicted positives. High precision indicates that an observation labeled as positive
is truly positive. The calculation is given in Equation 6:
\[ \text{Precision} = \frac{TP}{TP + FP} \tag{6} \]

High recall with low precision indicates that most of the positives are correctly recognized but
there are many false positives. Low recall with high precision indicates that many positives are
missed, but those predicted as positive are indeed positive.
As given in Equation 7, the F-measure combines precision and recall into a single
measurement that represents both of them.
\[ F\text{-measure} = \frac{2 \times \text{recall} \times \text{precision}}{\text{recall} + \text{precision}} \tag{7} \]
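
Equations 4 to 7, together with the FP rate and TP rate, can be computed from the four counts of a confusion matrix. The sketch below uses the counts listed in Table 5 for the PNN with spread 0.8 after under-sampling (TP = 215, TN = 216, FP = 15, FN = 18); the function name is an assumption.

```python
def classification_metrics(tp, tn, fp, fn):
    """Performance measures from the four confusion-matrix counts."""
    precision = tp / (tp + fp)                 # Equation 6
    recall = tp / (tp + fn)                    # Equation 5, also the TP rate
    return {
        "fp_rate": fp / (fp + tn),             # FP over all negatives
        "tp_rate": recall,
        "precision": precision,
        "recall": recall,
        "accuracy": (tp + tn) / (tp + tn + fp + fn),                 # Equation 4
        "f_measure": 2 * recall * precision / (recall + precision),  # Equation 7
    }


m = classification_metrics(tp=215, tn=216, fp=15, fn=18)
for name in ("fp_rate", "tp_rate", "precision", "f_measure"):
    print(name, round(m[name], 4))
```

Rounded to four decimals, these counts give an FP rate of 0.0649, a TP rate of 0.9227, a precision of 0.9348 and an F-measure of 0.9287, the values reported for the PNN after under-sampling in Table 6.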

Class imbalance problem


Class imbalance is a common problem in real-world classification, where the classes have
unequal shares of the observations. In this study, the under-sampling method was
used because it is simple to apply to the majority class: it reduces the number of observations from
the majority class to make the data balanced. This method is most appropriate when the data set
is large, and reducing the size of the training sample also improves run time and storage
requirements.
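
Random under-sampling can be sketched as below. The paper does not state the exact sampling ratio, so reducing every class to the size of the smallest class is an assumption, as are the function name, the seed and the toy records.

```python
import random
from collections import defaultdict


def random_undersample(records, label_of, seed=0):
    """Randomly drop records so that every class keeps as many as the smallest class."""
    by_class = defaultdict(list)
    for record in records:
        by_class[label_of(record)].append(record)
    target = min(len(group) for group in by_class.values())
    rng = random.Random(seed)  # seeded only for reproducibility
    balanced = []
    for group in by_class.values():
        balanced.extend(rng.sample(group, target))
    rng.shuffle(balanced)
    return balanced


# Toy data: 82 'high' and 18 'low' records, each a (state, id) pair.
records = [("high", i) for i in range(82)] + [("low", i) for i in range(18)]
balanced = random_undersample(records, label_of=lambda r: r[0])
print(len(balanced))  # 36
```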
Findings

This section explains the basic features of the dataset and the results of the classification trees
and the PNN models, both with the class imbalance and after reducing it.
Descriptive analysis
The two states, high and low, are defined based on the factory standard, which aims to maintain the
FTT at or above 98% for the high state.

Fig 1: Pie chart of the daily FTT State

According to Figure 1, 82% of the data belongs to the high state category. Only 12% of the data
belongs to the low category. These results illustrate the presence of a class imbalance problem in the
dataset under study.

The bar chart in Figure 2 indicates that the thong was the garment sewn most often in the factory
within the three months considered, with a frequency of 1592.

Fig 2: Bar chart of the Garments


The cross-tabulation between FTT state and garments showed that the raw
cut was sewn with the highest frequency in the high state, while thong garments
were sewn with the highest frequency in the low state. The cross-tabulation between state and damage
type showed that high–low was the most frequent damage type in the high state, while in the
low state the most frequent damage type was measurement out.
The correlation between the total number of damages and the daily output was -0.0276, which implies
a weak negative correlation. The correlation between the team and the total number of damages was
weakly positive, with a value of +0.0949, and the correlation between the team and the daily
output was also weakly positive, at +0.0194. In the chi-square tests of
independence, all the p-values were smaller than 0.05. Therefore, H0 was rejected at the 5% level of
significance, indicating associations between the variables operating activities, garments
and rework types.
Classification tree
In the built classification tree, the root was the ‘number of reworks’ with the predicate
value 28.5. At the next level, ‘daily production’ appeared in the decision nodes. In
later splits, ‘types of reworks’, ‘garments’ and ‘operation activities’ were used as decision nodes.
Notably, the input variable ‘team’ did not appear in the built decision tree.
To check the accuracy of the model, the following confusion matrix was built:
\[ C = \begin{bmatrix} 972 & 7 \\ 5 & 216 \end{bmatrix} \tag{8} \]
The above confusion matrix shows that seven records that belong to the class high were
predicted as the class low by the classification tree, while five records that
belong to the class low were predicted as the class high.
Table 2: Performance measurements of the classification tree
Performance measurement Rate
FP rate 0.0463
TP rate 0.9949
Precision 0.9899
Recall 0.9949
Accuracy 0.9875
F measure 0.9925


Table 2 lists the performance measures of the classification tree. The FP rate was
0.0463, which is a low value, and the TP rate, precision, recall, accuracy and F measure values
were close to one.
In the confusion matrix in Equation 8, there were 972 TP records for the class high, while the class
low had 216 TN observations. This indicates the class imbalance problem, as the majority of the
observations belong to TP.
Classification tree with under-sampling technique
Under-sampling involved the removal of some of the data in the majority class, the high
state category, to obtain a distribution balanced with the class low.
After applying the under-sampling technique, the 5999 observations in the original dataset were
reduced to 2321 data points, and a randomly selected 80% of these were used to build the tree. In
the classification tree, the root was again the ‘number of reworks’, with the predicate
value changed to 20.5 compared to the original classification tree. At the next level, ‘daily
production’ appeared in the decision nodes. In later splits, ‘types of reworks’, ‘garments’ and
‘operation activities’ were used as decision nodes. Again, the input variable
‘team’ did not appear in the built decision tree as a node. The model accuracy was evaluated using the
following confusion matrix:
\[ C = \begin{bmatrix} 210 & 0 \\ 11 & 243 \end{bmatrix} \tag{9} \]
The class imbalance problem has been reduced in the above confusion matrix compared to the
confusion matrix in Equation 8. The TP and TN values were 210 and 243 respectively. Table 3 shows
that the FP rate was zero while precision was one. However, 11 values were predicted as the class
high although they originally belong to the class low, and no values were predicted as the class low
that belong to the class high.

Table 3: Performance measurements of the classification tree after applying the under-sampling technique
Performance measurement Rate
FP rate 0
TP rate 0.9502
Precision 1
Recall 0.9502
Accuracy 0.9763
F measure 0.9745

The TP rate, recall, accuracy and F measure values were close to one, while the FP rate was zero.
Probabilistic Neural Network (PNN)
In the PNN approach, the most suitable model was selected by adjusting the spread parameter between
zero and one. The performance of the PNN models with different values of the spread parameter is
displayed in Table 4.


Table 4: Performance of PNN with different spread values


Spread TP TN FP FN
0.1 989 9 202 0
0.5 969 134 77 20
0.7 967 159 52 22
0.8 966 162 49 23
0.85 966 165 46 23
0.9 966 165 46 21
1 965 167 44 24

As the spread value increased, TP decreased slightly while TN increased. With increasing spread
values, FN increased and FP decreased; however, FN dropped at
the spread value of 0.9. Therefore, the most suitable PNN model for prediction had a spread value of
0.9, as it combines high TP and TN values. Figure 3 illustrates the PNN model.

Fig 3: PNN model


The input layer contains one neuron for each independent variable. The first hidden layer, the
pattern layer, consists of 4799 hidden neurons, equal to the number of observations in the training
set. The second hidden layer, the summation layer, consists of two hidden neurons with the weighted
values relevant to the pattern neurons' categories. The category with the largest vote is given as
the target category in the output layer.
PNN with under-sampling technique
To overcome the class imbalance problem, the under-sampling mechanism was applied to balance the
data by daily FTT state. The same process as in Table 4 was carried out to select the best spread
parameter, and the results are given in Table 5.

Table 5: Performance of PNN with different spread values after applying the under-sampling technique

Spread TP TN FP FN
0.1 233 18 213 18
0.5 218 198 33 15
0.7 215 213 18 18
0.75 215 214 17 18
0.8 215 216 15 18
0.85 215 216 15 18
0.9 215 216 15 18
1 214 219 12 19

Comparing the spread values, FP and FN were lowest, with high TP and TN values, for spread values
from 0.8 to 0.9. Therefore, the better performing model had a spread value between 0.8 and 0.9.
The model with the spread value of 0.8 was selected for prediction, since a smaller spread
is more selective. Figure 4 shows the PNN model after the application of the
under-sampling technique to the original data. The first hidden layer was reduced to 1857 hidden
neurons, equal to the number of observations in the training set. The summation hidden layer consisted
of two hidden neurons, one for each state, which hold the weighted votes of the observations
belonging to that target category.

Fig 4: PNN model after application of the under-sampling technique

Table 6 lists the FP rate, TP rate, precision, recall, accuracy and F-measure of the PNN models before
and after applying the under-sampling technique.
Table 6: Performance measures of PNN models before and after applying the under-sampling
technique
Before under-sampling After under-sampling
FP rate 0.2180 0.0649
TP rate 0.9767 0.9227
Precision 0.9545 0.9348
Recall 0.9767 0.9227
Accuracy 0.9425 0.9291
F measure 0.9655 0.9287

The FP rate decreased from 0.218 to 0.0649 with the application of the under-sampling method. The
TP rate, precision, recall, accuracy and F-measure were higher for the original data than for the
under-sampled data.

Discussion and Recommendation


This work was carried out to predict the FTT states in an apparel factory with data mining
mechanisms. The cross-tabulation between the state and damage type indicated that the high–low
damage type occurred with the highest frequency in the high state. In the manufacturing process, the
high-low damage type occurs due to unbalanced sewing operations in the panels. This may be due to the
improper training of the machine operator, failure in the machine, or the nature of the fabric.
Therefore, special attention should be given to reducing this defect. In addition, measurement out was
the damage type with the highest frequency in the low state. Dimensional problems are due to
incorrect patterns, cutting problems and incorrect sewing by the machine operator. The entire quantity
is affected by this damage type, which results in a production loss and wasted time. If the
correct measurements are properly given by the technicians to the relevant machine operators with
an applicable measurement document, quality can be affected positively.
The variable ‘team’ had no effect on the daily FTT state; evidently, the process within a team was
captured by the other independent variables. The variables ‘number of reworks’
and ‘daily production’ directly affected the classification trees. This follows from Equation 1,
as the FTT state is calculated from those variables.
Table 7: Performance measures of PNN models and classification trees before and after applying the under-sampling technique

             Classification trees               PNN
             Before            After            Before            After
             under-sampling    under-sampling   under-sampling    under-sampling
FP rate      0.0463            0                0.2180            0.0649
TP rate      0.9949            0.9502           0.9767            0.9227
Precision    0.9899            1                0.9545            0.9348
Recall       0.9949            0.9502           0.9767            0.9227
Accuracy     0.9875            0.9763           0.9425            0.9291
F-measure    0.9925            0.9745           0.9655            0.9287

Comparing the models in Table 7, the FP rate was zero in the classification tree with the
under-sampling technique, and this tree had the highest precision, one. Its accuracy, F-measure and
recall were close to one. The FP rate of the PNN model was high before applying under-sampling.
Because of the class imbalance problem, the models are compared after under-sampling, where the
classification tree performs better than the PNN model on every measure. Therefore,
the better model to predict the state of quality of the garments is the classification tree with the
under-sampling technique. Nevertheless, both models built with the under-sampling
method can be used in prediction because of their high accuracy, recall and F-measure. Even though the
classification tree and PNN models built with the class imbalance problem had high accuracy, they
may predict the wrong state of daily FTT for new data.
Different studies have been carried out on quality and production efficiency in the apparel field, as
discussed in the literature review section. However, very few studies have been conducted
on the FPY (Mohan et al., 2012), so this study adds value to existing knowledge. Different
quality measuring tools are used in each study; some of them are control charts, Pareto
analysis and fishbone diagrams, which are traditional statistical techniques. In this study, different
data mining classification techniques were applied and model performances were compared.
The rapid advancement of the country's textile and garment industry has turned Sri Lanka
into a regional apparel centre. The major gain from industrialization is the increase
in productivity, which results in the production of a wide variety of products and services. This in
turn provides higher standards of living for the whole society and the economy. The prediction
of the quality state is a promising area in the apparel industry. Indeed, it is very important to
identify the quality state before assigning a process to a set of machine operators. An appropriate
forecasting system is a suitable way to quantify quality. For example, when a style needs to be
assigned to a team, the models above can predict from the input variables which team and group of
people can sew the whole garment with high quality, that is, in the high state. This is a saving to
the factory, as the number of reworks and issues that would arise later can be reduced before the
work starts. Further, it helps to recognize
the training requirements of the employees, as particular attention can be given to the teams in the
low state category. A vital factor in the apparel industry is customer satisfaction. Through high
product quality, a manufacturer can secure loyal customers, earn a high profit and compete with its
competitors.
Considering the limitations of this study, data were not available for all styles in the factory,
so daily FTT might not represent the overall quality in the factory well. Furthermore, there can be
other influential factors for the daily FTT state which were not considered in the study. Examples
include machine condition, maintenance and the effect of departments such as mechanical, industrial
engineering, planning and cutting. In addition, data collection was done with the Endline Rework
Report maintained by the quality checker, so the respective daily FTT depends on the visual
inspection of each quality checker.
To improve the study further, the Naïve Bayes, Random Forest and Bayesian Forest methods can be
applied (Jain and Kumar, 2020) and their accuracy compared with that of the built classification
trees and PNN models. If the study could be extended to every brand within the garment factory over
a longer time period, it would provide more precise results. At the same time, consideration
of extra factors (Rahman and Al Amin, 2016) such as Standard Minute Value (time taken by workers to
finish the garment) (Rahman et al., 2014), hourly targets and daily targets may improve the
performance of the built models.

References
Aggarwal, C. C. (2015). Data mining: the textbook. Springer.
Ahmadlou, M., & Adeli, H. (2010). Enhanced probabilistic neural network with local decision
circles: A robust classifier. Integrated Computer-Aided Engineering, 17(3), 197-210.
Choudhary, A. K., Sikka, M. P., & Bansal, P. (2018). The study of sewing damages and defects in
garments. Research Journal of Textile and Apparel, 22(2), 109-125.
Drummond, C., & Holte, R. C. (2003, August). C4.5, class imbalance, and cost sensitivity: why
under-sampling beats over-sampling. In Workshop on learning from imbalanced datasets II (Vol.
11, pp. 1-8). Washington, DC: Citeseer.
Goebel, K., Wood, B., Agogino, A., & Jain, P. (1994, March). Comparing a neural-fuzzy scheme
with a probabilistic neural network for applications to monitoring and diagnostics in
manufacturing systems. In Spring symposium series of AAAI (pp. 45-50).
Grabmeier, J. L., & Lambe, L. A. (2007). Decision trees for binary classification variables grow
equally with the Gini impurity measure and Pearson's chi-square test. International journal of
business intelligence and data mining, 2(2), 213-226.
Hsu, C. H., & Wang, M. J. J. (2005). Using decision tree-based data mining to establish a sizing
system for the manufacture of garments. The International Journal of Advanced Manufacturing
Technology, 26(5-6), 669-674.
Hui, P. C., Chan, K. C., Yeung, K. W., & Ng, F. S. (2007). Application of artificial neural networks
to the prediction of sewing performance of fabrics. International Journal of Clothing Science and
Technology.
Jain, S., & Kumar, V. (2020). Garment Categorization Using Data Mining
Techniques. Symmetry, 12(6), 984.
Jerez-Aragonés, J. M., Gómez-Ruiz, J. A., Ramos-Jiménez, G., Muñoz-Pérez, J., & Alba-Conejo, E.
(2003). A combined neural network and decision trees model for prognosis of breast cancer
relapse. Artificial intelligence in medicine, 27(1), 45-63.

Kurniati, N., Yeh, R. H., & Lin, J. J. (2015). Quality inspection and maintenance: the framework of
interaction. Procedia manufacturing, 4, 244-251.
Loh, W. Y. (2011). Classification and regression trees. Wiley Interdisciplinary Reviews: Data Mining
and Knowledge Discovery, 1(1), 14-23.
Mohan, R. R., Thiruppathi, K., Venkatraman, R., & Raghuraman, S. (2012). Quality Improvement
through First Pass Yield using Statistical Process Control Approach. Journal of Applied
Sciences (Faisalabad), 12(10), 985-991.
Rahman, H., Roy, P. K., Karim, R., & Biswas, P. K. (2014). Effective Way to Estimate the Standard
Minute Value (SMV) of a T-Shirt by Work Study. European Scientific Journal, 10(30).
Rahman, M. H., & Al Amin, M. (2016). An empirical analysis of the effective factors of the
production efficiency in the garments sector of Bangladesh. European Journal of Advances in
Engineering and Technology, 3(3), 30-36.
Sizwe, M. M., & Charles, M. (2017). Quality Control in the Clothing Production Process of an Under-
Resourced Sewing Co-operative: Case Study. In Proceedings of the 2017 International
Symposium on Industrial Engineering and Operations Management (IEOM) (pp. 698-707).
Uddin, S. M., Hasan, R., & Hosen, S. (2014, December). Defects minimization through DMAIC
methodology of Six Sigma. In International Conference on Mechanical, Industrial and Energy
Engineering, Khulna-Bangladesh.
Xing, Y., Wang, Z., Wang, J., Kan, Y., Zhang, N., & Shi, X. (2019). Human Body Shape Clustering
for Apparel Industry Using PCA-Based Probabilistic Neural Network. In Advances in Computer
Communication and Computational Sciences (pp. 343-354). Springer, Singapore.
