Project Report
Group No. 6
Ajay Pratap Singh Tomar 1801005
Md Danish Zafar 1801101
Sanket Somkuwar 1801173
Shubham Gupta 1801178
Overview
This is a Women’s Clothing E-Commerce dataset revolving around the reviews written by
customers. Its nine supportive features offer a great environment to parse out the text through its
multiple dimensions. Because this is real commercial data, it has been anonymized, and
references to the company in the review text and body have been replaced with “retailer”.
Business objective
Organizations today encounter textual data (both semi-structured and unstructured) while
running their day-to-day business. This data may be accessible but remains untapped, either because
the organization is unaware of the information wealth it possesses or because it lacks the
methodology or technology to analyze the data and extract useful insight.
This dataset contains reviews of the products available on a women's clothing e-commerce platform.
The reviews may be about size, fit, colour and other aspects of the clothes. By analysing them, we
can get an idea of what customers are talking about and how they feel about the products.
These insights could show the platform how to provide better service to its
customers and increase its customer base.
Mining/analytics objectives
Our objective was to analyse the data based on the words used. We formed a word cloud
representing the most frequently used words. From it, we can gain insight into whether customers are
talking positively or negatively about the platform's products.
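The word counts behind such a word cloud can be sketched in base R. This is a minimal illustration, not the script used in this project: the three reviews and the stop-word list below are made-up assumptions standing in for the Review Text column.

```r
# Toy reviews standing in for the Review Text column (illustrative, not the real data)
reviews <- c(
  "I love this dress, the fit is perfect",
  "Love the colour but the size runs small",
  "The fit is great and I love the fabric"
)

# Hypothetical minimal stop-word list; a real analysis would use a fuller one
stop_words <- c("i", "the", "is", "this", "but", "and")

# Lower-case, strip punctuation, then split each review on whitespace
tokens <- unlist(strsplit(gsub("[[:punct:]]", "", tolower(reviews)), "\\s+"))

# Drop stop words and any empty strings left over from the split
tokens <- tokens[!(tokens %in% stop_words) & nzchar(tokens)]

# Count each word and sort from most to least frequent
word_freq <- sort(table(tokens), decreasing = TRUE)
print(head(word_freq, 5))
```

A word cloud simply draws each word at a size proportional to these counts, so the most frequent words stand out.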
Specific questions that you seek to answer using analytics techniques
As the word cloud shows, customers frequently use words such as ‘Love’ and ‘Like’, so we can
reasonably say that customers are happy with the products.
We also find that ‘Size’ and ‘Fit’ are mentioned often. This suggests that size is the
factor that most affects customers when they decide whether to buy a product.
The recommendation for the platform is to provide a detailed size chart that helps
customers decide on their fit easily. The chart should also be standardised, because products
across different brands vary in size and labels like M, L, XL alone can
be confusing.
2. What are the important factors in choosing clothes when buying online?
Size and fit remain the most important factor, followed by colour and fabric.
While not much can be done about the fabric itself, we can give customers as close a
look at the fabric as possible by providing close-up shots of the product.
For colour, product images should be taken under proper lighting so that they represent
the true colour of the product.
Clothing ID: Integer Categorical variable that refers to the specific piece being reviewed.
Age: Positive Integer variable of the reviewer's age.
Title: String variable for the title of the review.
Review Text: String variable for the review body.
Rating: Positive Ordinal Integer variable for the product score granted by the customer, from 1
(worst) to 5 (best).
Recommended IND: Binary variable stating whether the customer recommends the product, where
1 is recommended and 0 is not recommended.
Positive Feedback Count: Positive Integer documenting the number of other customers who
found this review positive.
Division Name: Categorical name of the product's high-level division.
Department Name: Categorical name of the product department.
Class Name: Categorical name of the product class.
Source: https://www.kaggle.com/vsridhar7/customer-review-analysis-text-mining/data
Clothing ID | Age | Title | Review Text | Rating | Recommended IND | Positive Feedback Count | Division Name | Department Name | Class Name
1077 | 60 | Some major design flaws | I had such high hopes for this dress and really wanted it to work for me. i initially ordered the petite small (my u... | 3 | 0 | 0 | General | Dresses | Dresses
1049 | 50 | My favorite buy! | I love, love, love this jumpsuit. it's fun, flirty, and fabulous! every time i wear it, i get nothing but great complim... | 5 | 1 | 0 | General Petite | Bottoms | Pants
847 | 47 | Flattering shirt | This shirt is very flattering to all due to the adjustable front tie. it is the perfect length to wear with leggings and... | 5 | 1 | 6 | General | Tops | Blouses
1080 | 49 | Not for the very petite | I love tracy reese dresses, but this one is not for the very petite. i am just under 5 feet tall and usually wear a 0p... | 2 | 0 | 4 | General | Dresses | Dresses
What is your analytical methodology? Which techniques do you plan to use? Why?
Text Analysis
Text analysis is the technique we used to produce the word cloud depicted above.
For this, we used RStudio and installed the packages tidyverse, ggplot2 and ggthemes.
Rscript
Sentiment analysis
For sentiment analysis, we formed a corpus of words with positive meaning and a corpus of words with negative meaning.
After that, we formed a word cloud representing the positive words in green and the negative ones in red.
Rscript
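# comparison.cloud() is provided by the wordcloud package
library(wordcloud)
# all_tdm_m: term-document matrix (as a plain matrix), assumed to hold one
# column of positive-word counts followed by one column of negative-word counts
comparison.cloud(
  all_tdm_m,
  max.words = 100,
  colors = c("darkgreen", "darkred"))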
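The idea of matching reviews against positive and negative word lists can be illustrated with a small base R sketch. The word lists, the sample reviews and the helper name score_review are all hypothetical; the lists actually used in this project are not reproduced here.

```r
# Hypothetical mini-lexicons; the report's actual word lists are not shown
positive_words <- c("love", "great", "perfect", "flattering")
negative_words <- c("small", "flaws", "disappointed", "itchy")

# Made-up reviews for illustration only
reviews <- c(
  "I love this jumpsuit, great fit",
  "Major design flaws and it runs small",
  "Flattering shirt but a little itchy"
)

# Score = (number of positive words) - (number of negative words) in one review
score_review <- function(text) {
  words <- unlist(strsplit(gsub("[[:punct:]]", "", tolower(text)), "\\s+"))
  sum(words %in% positive_words) - sum(words %in% negative_words)
}

# One integer score per review
scores <- vapply(reviews, score_review, integer(1), USE.NAMES = FALSE)
print(data.frame(review = reviews, score = scores))
```

A score above zero marks a review as mostly positive and a score below zero as mostly negative. Simple counting like this ignores negation ("not great"), so it is only a rough first pass.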
Word clustering is used to identify groups of words that are used together, based on frequency distance. This
is a dimension reduction technique. It helps in grouping words into related clusters. Word clusters
are visualized with dendrograms.
Rscript
# hc is assumed to be an hclust object built from the distances between terms
# Plot a dendrogram
plot(hc)
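How an object like hc might be built can be sketched in base R as follows. The term-document matrix here is a toy with invented counts, and the choices of Euclidean distance and complete linkage are assumptions for illustration, not necessarily the settings used in this project.

```r
# Toy term-document matrix: rows are words, columns are reviews (counts are invented)
tdm_m <- matrix(
  c(3, 2, 0, 0,
    2, 3, 0, 1,
    0, 0, 4, 3,
    0, 1, 3, 4),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("size", "fit", "colour", "fabric"), paste0("review", 1:4))
)

# Euclidean distance between the words' frequency profiles
word_dist <- dist(tdm_m)

# Agglomerative hierarchical clustering with complete linkage
hc <- hclust(word_dist, method = "complete")

# Cut the tree into two groups and inspect which words fall together
groups <- cutree(hc, k = 2)
print(groups)

# plot(hc) would draw the dendrogram for this object
```

In this toy example, "size" and "fit" end up in one cluster and "colour" and "fabric" in the other, because each pair appears in the same reviews with similar counts.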
How will results from your analytics plan help you solve your Business Problem?
If you are running an e-commerce website, it can be a huge challenge to discover what is being
said about your site, your brand or the products you sell. Even harder is to sort and make sense of
the mountain of feedback that exists online. Harder still is to draw out granular
feedback about specific product aspects from that data.
Sentiment analysis allows retailers and brands alike to understand the opinions expressed in consumer
feedback and user-generated content. Is it negative or positive? What is the context of the
opinion? Are customers talking about the product or just a feature within the product?
Here are the key benefits for online retailers when using sentiment analysis: