
Prediction on Choosing Clothes by Review on E-Commerce Site Using Machine Learning


Abstract—People around the world now depend on the internet in nearly every part of their lives. Rather than roaming physical shops or malls, many prefer to shop online, where everything from food to shoes is available. Online shopping saves time and money, and purchases can be made at any hour of the day or night, which offline shopping does not allow. It also carries risks: fraudulent pages, websites that steal money, difficulty telling a genuine website from a fake one, and, above all, difficulty finding good-quality clothes. Reading reviews on e-commerce sites can help women find the clothes they want. We use the Women's Clothing E-Commerce dataset for this research. Four supervised algorithms, K-Nearest Neighbors, Random Forest, Stochastic Gradient Descent, and Decision Tree, are used to predict clothing choice. Stochastic Gradient Descent is the best performer, with 89% accuracy.

Keywords—machine learning, supervised classifiers, client review data, KNN, SGD, LR, NB.

I. INTRODUCTION
Fashion is a common way of demonstrating one's aesthetic preferences, and it follows whatever is in trend. Outfits, shoes, jewelry, beauty, hairdos, lifestyle, and hourglass shapes are all hallmarks of fashion. Of these, outfits are currently the most significant. People tend to buy outfits on online sites, and many consider online shopping far better than offline shopping. They depend on various online shopping websites because they do not want to waste their time in malls [1]. For this reason, reviews on a website play an important role, especially for clothing. Any keen online shopper can gain a better grasp of the available websites, and choose a suitable one for buying clothes, by consulting clothing reviews [2]. In our research, we consider a Women's Clothing E-Commerce dataset based on customer testimonials. We train our models on this dataset and use them to make predictions about clothes. The predictions are produced with several machine learning classifiers, and a comparison study shows which algorithm performs best on this task. K-Nearest Neighbors, Random Forest, Stochastic Gradient Descent, and Decision Tree are the classifiers we train.

II. PREVIOUS WORKS
Since we are proposing a classifier, we first look at how earlier work has used similar procedures to predict outcomes.
Product sales are one of the most important and popular activities on online sites, and customer expectations make them challenging. The work in [3] builds a system that predicts the price of online sales using multiple linear regression. In [4], Decision Tree, Random Forest, Extremely Randomized Tree, K-Nearest Neighbor, AdaBoost, Logistic Regression, and Stochastic Gradient Descent classifiers are used to reach the best decision from question-and-answer data about fashion.
The recommendation system reviewed in [5] is based on fuzzy logic. It is mainly app-based research in which pictures are used to assess how well clothes match according to the client's choices.
User-based collaborative filtering is used in [6]. The algorithm helps to surface negative feedback about top-rated items, and the final results show that it is capable of increasing the coverage of recommendations.

III. PROPOSED METHOD
The flow diagram of the proposed technique is shown in Fig. 1. First, we load the dataset and check it for missing values. If there are any, we clean the dataset. After cleaning, we rename the columns of the data frame and then analyze it.
Next, the review text is processed using NLP, as shown in Fig. 2. The processing is done in several steps. The first step is tokenization, which is important because it converts a long string into tokens that are placed in a list. The second step is stop-word removal, which removes noise from the dataset and leaves the words that matter. The third step is stemming, which further reduces noise and reduces the number of distinct words. The remaining tokens are then rejoined, and the result is passed to the final step, vectorization.
The dataset is split into a training part and a test part. The test part is 30 percent of the data and the remaining 70 percent is used for training. The training part is larger than the test part so that the models have more data to learn from. After the split, the text is vectorized. We then train the classifiers named above, predict on the test data, and evaluate each model. Finally, the performance of the four supervised classifiers is analyzed.
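The preprocessing and splitting steps described in Section III can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the stop-word list, the crude suffix-stripping `stem` function (a stand-in for a real stemmer), the bag-of-words vectorizer, and the sample reviews are all hypothetical. The vocabulary is built from the training part only, which is one possible reading of "after the split, the text is vectorized".

```python
import random
import re
from collections import Counter

# Hypothetical stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"the", "a", "an", "is", "it", "and", "this", "i", "to", "of", "in", "was"}


def tokenize(text):
    """Step 1: convert a review string into a list of lowercase tokens."""
    return re.findall(r"[a-z']+", text.lower())


def remove_stop_words(tokens):
    """Step 2: drop tokens that carry little meaning."""
    return [t for t in tokens if t not in STOP_WORDS]


def stem(token):
    """Step 3: crude suffix stripping (assumption: stands in for a real stemmer)."""
    for suffix in ("ing", "ly", "ed", "s"):
        long_enough = len(token) - len(suffix) >= 3
        if token.endswith(suffix) and not token.endswith("ss") and long_enough:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Steps 1-4: tokenize, remove stop words, stem, and rejoin the tokens."""
    return " ".join(stem(t) for t in remove_stop_words(tokenize(text)))


def train_test_split(rows, test_size=0.3, seed=0):
    """Shuffle the rows and split them into 70% train and 30% test."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = round(len(rows) * test_size)
    return rows[n_test:], rows[:n_test]


def build_vocab(documents):
    """Collect the sorted set of words seen in the training documents."""
    return sorted({word for doc in documents for word in doc.split()})


def vectorize(document, vocab):
    """Step 5: bag-of-words count vector over a fixed vocabulary."""
    counts = Counter(document.split())
    return [counts[word] for word in vocab]


# Hypothetical (review, label) pairs; 1 = positive review, 0 = negative review.
reviews = [
    ("This dress is lovely and fits perfectly", 1),
    ("The fabric was thin and it ripped", 0),
    ("Loved the colour of this top", 1),
    ("Sizing is wrong and the zipper broke", 0),
    ("Great sweater, very warm", 1),
    ("Pretty blouse and good quality", 1),
    ("Returned it, poor stitching", 0),
    ("Comfortable pants, nice length", 1),
    ("Looks cheap in person", 0),
    ("Beautiful jacket and flattering cut", 1),
]

cleaned = [(preprocess(text), label) for text, label in reviews]
train, test = train_test_split(cleaned, test_size=0.3)
vocab = build_vocab(doc for doc, _ in train)
x_train = [vectorize(doc, vocab) for doc, _ in train]
x_test = [vectorize(doc, vocab) for doc, _ in test]
print(preprocess("This dress is lovely and fits perfectly"))
```

The resulting `x_train` and `x_test` vectors, together with their labels, are what would be handed to the four classifiers; the training step itself is not shown here.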



Fig. 1. The methodology.

Fig. 2. Preprocessing steps.

A. Dataset Description
People nowadays prefer online shopping. They buy things on various online platforms such as Facebook pages, apps, and websites, and they choose among these platforms by looking at reviews. People, especially women, love to shop, but it is not always possible to go to the mall. When time for shopping is limited, online shopping is the best option. Women who want to buy clothes online choose the product and the platform by reading or watching reviews. Reviews save money and help to identify fraud, fake pages, and fake websites. We use the Women's Clothing E-Commerce dataset, which is based on customer testimonials. The dataset contains IDs, ratings, review texts, and review titles for 23486 clients, with 10 features in total.
i. Data Exploration
As stated above, 10 features are used here. Some features are directly related to the prediction of clothing choice and some are less related.
Age Structure: Fig. 3 shows that most of the clients in the dataset are middle-aged women, so women of around 40 years old are the main customers of this site.

Fig. 3. Age structure.

From Fig. 4, we can see that most clients gave the products on this site a rating of five. This suggests that the website offers good-quality clothes, so customers can buy from it with little hesitation.

Fig. 4. Five ratings of each product.

If we compare the clothing classes with one another, we see that the 3-layering class is at the top, as shown in Fig. 5. Anyone looking online for 3-layering clothes would therefore do well to choose this website.

Fig. 5. Top class (3-layering class) compared to other classes.

From Fig. 6, we can see that customers older than 40 are more likely to buy sweaters, whereas casual bottoms are more attractive to customers in their twenties.

Fig. 6. Clothes (class) according to age.
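The kind of counting behind the age, rating, and class plots can be sketched with the standard library. The rows below are hypothetical and are not taken from the dataset; the tuple layout `(age, rating, class_name)` and the `age_band` helper are assumptions made only for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical rows (age, rating, class_name); not taken from the dataset.
rows = [
    (34, 5, "Dresses"),
    (41, 5, "Sweaters"),
    (27, 4, "Pants"),
    (45, 5, "Sweaters"),
    (23, 3, "Pants"),
    (52, 5, "Knits"),
]


def age_band(age):
    """Bucket an age into its decade, e.g. 34 -> '30s'."""
    return f"{age // 10 * 10}s"


# How often each rating value occurs (the quantity plotted in a rating histogram).
rating_counts = Counter(rating for _, rating, _ in rows)

# For each age band, how often each clothing class was reviewed.
class_by_band = defaultdict(Counter)
for age, _, class_name in rows:
    class_by_band[age_band(age)][class_name] += 1

print(rating_counts.most_common(1))
print({band: dict(counts) for band, counts in class_by_band.items()})
```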

By department, Dresses, Bottoms, Tops, Intimates, Jackets, and Trends are the most popular among customers aged roughly 20 to 80.

Fig. 7. Structure of positive ratings.

In Fig. 8, Trend has a lower rating than the other departments, whereas Tops are highly rated. Buying tops on this site is therefore a good decision.

Fig. 8. Top-ranked and high-ranked products according to department.

Fig. 9 helps anyone who wants to know which clothes are most recommended on this site by class. In this plot, 1 represents strongly recommended clothes, which include Dresses, Pants, Blouses, Knits, and so on.

Fig. 9. Recommended clothes according to class.

Fig. 10 shows the most and least recommended clothes by department, which helps customers choose their clothes.

Fig. 10. Recommended clothes according to department.

IV. RESULT ANALYSIS
The dataset contains reviews from 23486 customers. Our proposed model is assessed using the performance metrics described below.

A. Performance Metrics
If the actual result and the predicted result are both yes, the outcome is a True Positive (TP). For example, the actual review says the clothes are good and the prediction says the same. A True Negative (TN) is the opposite case, where the actual and the predicted results are both no; that is, negative reviews are predicted as negative. A False Positive (FP) occurs when the actual review is bad but the prediction says the clothes are good, so the predicted value contradicts the actual value [7].
A False Negative (FN) is the other side of the coin from a False Positive: the actual value is yes but the predicted value is no. With these four quantities defined, we calculate the F1 score, accuracy, precision, and recall [7].
Precision is the ratio of correctly predicted positive samples to all samples predicted as positive [11].

$$\text{Precision} = \frac{TP}{TP + FP} \tag{1}$$

Accuracy is a statistic that summarizes how well a method works across all classes. It is determined by dividing the number of correct predictions by the total number of predictions [7].

$$\text{Accuracy} = \frac{TP + TN}{TP + FP + FN + TN} \tag{2}$$

Recall measures how well the system recognizes positive cases. The more positive samples are detected, the higher the recall [7].

$$\text{Recall} = \frac{TP}{TP + FN} \tag{3}$$

The F1 score was created to work effectively with imbalanced data. Its main aim is to combine the precision and recall statistics into a single number [7].

$$F1\ \text{Score} = \frac{2 \cdot (\text{Recall} \cdot \text{Precision})}{\text{Recall} + \text{Precision}} \tag{4}$$

B. Performance Metrics
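As a worked check on Eqs. (1) to (4), the sketch below computes the four metrics from confusion-matrix counts and then evaluates accuracy on the counts reported in Table 1. The function name `confusion_metrics` and the zero-division guards are assumptions added for illustration. Only accuracy is printed for the Table 1 rows, because accuracy does not depend on which class is treated as positive.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Return precision, recall, F1 score and accuracy following Eqs. (1)-(4)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if recall + precision:
        f1 = 2 * (recall * precision) / (recall + precision)
    else:
        f1 = 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Counts copied from Table 1, in the order (TP, FP, FN, TN).
TABLE_1 = {
    "SGD": (617, 455, 202, 4625),
    "RF": (267, 805, 55, 4772),
    "KNN": (325, 747, 164, 4663),
    "DT": (450, 620, 578, 4249),
}

accuracies = {
    name: round(confusion_metrics(*counts)["accuracy"], 2)
    for name, counts in TABLE_1.items()
}
print(accuracies)
```

Rounded to two decimals, these values agree with the accuracy column of Table 2.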
Table 1 shows the confusion-matrix values for each classifier: True Positive, False Positive, False Negative, and True Negative. Precision and recall are evaluated from these values. Table 2 shows that SGD has the highest precision for class 1 (positive reviews), followed by RF, KNN, and DT. For class 0 (negative reviews), the highest precision is achieved by RF, followed by SGD, KNN, and DT. However, RF pairs its high precision on negative reviews with a low recall, so precision and recall alone do not make clear which result is more desirable. The F1 score resolves this, because it expresses the balance between precision and recall. A high F1 score means that both false positives and false negatives are low, which indicates a reliable evaluation. From Table 2 we can see that, according to the F1 score, SGD is the best performer for both classes, followed by RF and KNN, with DT performing worst. SGD also has the highest accuracy, at 89%.

Classifier   TP    FP    FN    TN
SGD          617   455   202   4625
RF           267   805   55    4772
KNN          325   747   164   4663
DT           450   620   578   4249
Table 1. TP, FP, FN, and TN for various algorithms.

             Class = 0                        Class = 1
Classifier   Precision  Recall  F1-score     Precision  Recall  F1-score     Accuracy
SGD          0.76       0.58    0.65         0.91       0.96    0.93         0.89
RF           0.83       0.24    0.39         0.86       0.99    0.92         0.85
KNN          0.66       0.30    0.42         0.86       0.97    0.91         0.85
DT           0.45       0.43    0.44         0.75       0.88    0.88         0.80
Table 2. Precision, Recall, F1-score, and Accuracy for various algorithms.

V. CONCLUSION AND FUTURE WORK
In this research, we compare and contrast the four classifier algorithms mentioned above. Stochastic Gradient Descent has the best performance of the four. The research is based on the reviews of 23486 customers who visited this online site and bought clothes from it. In the future, datasets covering jewelry and other women's products, as well as additional classifiers, will be considered.

REFERENCES
[1] https://www.toppr.com/guides/essays/essay-on-fashion/
[2] https://www.123helpme.com/essay/Essay-About-Wearing-Clothes-435350
[3] S. Chavan, S. Panchal, T. Sawant, and J. Shinde, "Predicting Online Product Sales using Machine Learning," International Journal of Engineering Research & Technology (IJERT), vol. 09, no. 04, April 2020.
[4] D. Mona, "Predicting fashion using machine learning techniques," 2017.
[5] A. Pandit, K. Goel, M. Jain, and N. Katre, "A Review on Clothes Matching and Recommendation Systems based on User Attributes," International Journal of Engineering Research & Technology, vol. 9, no. 8, 2020.
[6] Y. Liu, J. Nie, L. Xu, Y. Chen, and B. Xu, "Clothing recommendation system based on advanced user-based collaborative filtering algorithm," in International Conference on Signal and Information Processing, Networking and Computers, Springer, Singapore, September 2017, pp. 436-443.
[7] https://towardsdatascience.com/the-f1-score-bec2bbc38aa6
