
Project Report

Group No. 6
Ajay Pratap Singh Tomar 1801005
Md Danish Zafar 1801101
Sanket Somkuwar 1801173
Shubham Gupta 1801178

Overview
This is a Women’s Clothing E-Commerce dataset revolving around the reviews written by
customers. Its nine supportive features offer a great environment to parse out the text through its
multiple dimensions. Because this is real commercial data, it has been anonymized, and
references to the company in the review text and body have been replaced with “retailer”.

Business objective

Organizations today encounter textual data (both semi-structured and unstructured) while
running their day-to-day business. This data may be accessible but remains untapped, either
because the organization is unaware of the wealth of information it possesses or because it
lacks the methodology or technology to analyze the data and extract useful insight.
This dataset contains reviews of the products available on a women's clothing e-commerce
platform. The reviews may be about size, fit, colour and other aspects of the clothes. By
analysing them, we can get an idea of what customers are talking about and how they feel about
the product.

Such data about customers could give the retailer insight into how to provide better service to
its customers and grow its customer base.

Mining/analytics objectives

Our objective was to analyse the reviews based on the words used in them. We formed a word
cloud of the most frequently used words, from which we can judge whether customers are
talking positively or negatively about us.

Specific questions that you seek to answer using analytics techniques

1. What are customers saying about us?

• Customers use words such as ‘Love’ and ‘Like’ frequently, which suggests that
they are generally happy with the product.
• ‘Size’ and ‘Fit’ are also mentioned often, which suggests that size is the factor
that most affects the decision to buy the product.
Our recommendation for the platform is to use a detailed size chart that helps
customers determine their fit easily, and to provide a standard size chart, because
sizes vary across brands and labels such as M, L, XL etc. alone can be confusing.

2. What are the important factors when choosing clothes online?

• Size and fit remain the most important factor, followed by colour and fabric.
• While not much can be done about the fabric itself, we can give customers as close
a look at it as possible by taking close-up shots of the product.
• For colour, product images should be taken under proper lighting so that they
represent the true colour of the product.
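
The ranking of factors above can be checked by counting how many reviews mention each aspect. The following is a minimal sketch in base R; the three sample reviews and the keyword patterns are illustrative assumptions and are not taken from the dataset or from our script.

```r
# Hypothetical sample reviews (not from the dataset)
sample_reviews <- c(
  "Love the color but the size runs small",
  "Perfect fit and the fabric feels soft",
  "The fit is off, ordered a size up"
)

# Assumed keyword patterns for each aspect
aspect_patterns <- c(
  size_fit = "\\b(size|sizes|fit|fits)\\b",
  colour   = "\\b(color|colour|colors|colours)\\b",
  fabric   = "\\b(fabric|material)\\b"
)

# Count the reviews that mention each aspect at least once
aspect_counts <- sapply(
  aspect_patterns,
  function(p) sum(grepl(p, sample_reviews, ignore.case = TRUE, perl = TRUE))
)

# Most-mentioned aspects first
aspect_counts <- sort(aspect_counts, decreasing = TRUE)
print(aspect_counts)
```

On the full data, sample_reviews would be replaced by the Review Text column; the keyword lists would likely need tuning.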

Description of the data set – source, no. of records, fields etc.


This dataset includes 23486 rows and 10 feature variables. Each row corresponds to a customer
review, and includes the variables:

• Clothing ID: Integer categorical variable that refers to the specific piece being reviewed.
• Age: Positive integer variable of the reviewer's age.
• Title: String variable for the title of the review.
• Review Text: String variable for the review body.
• Rating: Positive ordinal integer variable for the product score granted by the customer, from
1 (worst) to 5 (best).
• Recommended IND: Binary variable stating whether the customer recommends the product, where
1 is recommended and 0 is not recommended.
• Positive Feedback Count: Positive integer documenting the number of other customers who
found this review positive.
• Division Name: Categorical name of the product's high-level division.
• Department Name: Categorical name of the product department.
• Class Name: Categorical name of the product class.
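
The variable definitions above can be sanity-checked once the data is in a data frame. The sketch below is only an illustration: it types in the numeric fields of the four sample records listed in this section by hand, and the dotted column names are an assumption about how read.csv renames headers that contain spaces.

```r
# Numeric fields of the four sample records, entered by hand
reviews_sample <- data.frame(
  Clothing.ID             = c(1077L, 1049L, 847L, 1080L),
  Age                     = c(60L, 50L, 47L, 49L),
  Rating                  = c(3L, 5L, 5L, 2L),
  Recommended.IND         = c(0L, 1L, 1L, 0L),
  Positive.Feedback.Count = c(0L, 0L, 6L, 4L)
)

# Inspect column types
str(reviews_sample)

# Rating must lie between 1 and 5, Recommended IND must be 0 or 1
rating_ok      <- all(reviews_sample$Rating %in% 1:5)
recommended_ok <- all(reviews_sample$Recommended.IND %in% c(0, 1))

# Mean rating for non-recommended (0) and recommended (1) reviews
mean_rating <- tapply(reviews_sample$Rating, reviews_sample$Recommended.IND, mean)
print(mean_rating)
```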

Source: https://www.kaggle.com/vsridhar7/customer-review-analysis-text-mining/data

Sample records (the review text is cut off in the source):

Clothing ID | Age | Title | Review Text | Rating | Recommended IND | Positive Feedback Count | Division Name | Department Name | Class Name
1077 | 60 | Some major design flaws | I had such high hopes for this dress and really wanted it to work for me. i initially ordered the petite small (my u | 3 | 0 | 0 | General | Dresses | Dresses
1049 | 50 | My favorite buy! | I love, love, love this jumpsuit. it's fun, flirty, and fabulous! every time i wear it, i get nothing but great complim | 5 | 1 | 0 | General Petite | Bottoms | Pants
847 | 47 | Flattering shirt | This shirt is very flattering to all due to the adjustable front tie. it is the perfect length to wear with leggings and | 5 | 1 | 6 | General | Tops | Blouses
1080 | 49 | Not for the very petite | I love tracy reese dresses, but this one is not for the very petite. i am just under 5 feet tall and usually wear a 0p | 2 | 0 | 4 | General | Dresses | Dresses

What is your analytical methodology? Which techniques do you plan to use? Why?

Text Analysis

Text analysis is the tool we used to produce the word cloud discussed above.

For this, we used RStudio and installed the required packages, including tidyverse, ggplot2 and
ggthemes.
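
The word frequencies behind a word cloud can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the script we ran: the three sample reviews and the short stop word list are hypothetical, and the plot is drawn only if the wordcloud package is installed.

```r
# Hypothetical sample reviews (not from the dataset)
sample_reviews <- c(
  "I love this dress",
  "Love the fit, great size",
  "The size is too small"
)

# Lowercase, replace everything except letters and spaces, split on whitespace
clean <- gsub("[^a-z ]", " ", tolower(sample_reviews))
words <- unlist(strsplit(clean, "\\s+"))
words <- words[nchar(words) > 0]

# Assumed, deliberately tiny stop word list
stop_words <- c("the", "and", "a", "is", "it", "i", "this", "but")
words <- words[!words %in% stop_words]

# Count each word and sort by frequency
counts    <- table(words)
word_freq <- sort(setNames(as.integer(counts), names(counts)), decreasing = TRUE)
print(word_freq)

# Draw the cloud only when the wordcloud package is available
if (requireNamespace("wordcloud", quietly = TRUE)) {
  wordcloud::wordcloud(names(word_freq), word_freq, min.freq = 1)
}
```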

Rscript

# Load the packages used in the analysis
library(tidyverse)
library(qdap)
library(dplyr)
library(tm)
library(wordcloud)
library(plotrix)
library(dendextend)
library(ggplot2)
library(ggthemes)
library(RWeka)
library(reshape2)
library(quanteda)

# Set the working directory to the project folder (not to the CSV file itself)
setwd("G:/BADM Project")

# Read the reviews and keep text columns as character strings
review <- read.csv("Womens Clothing E-Commerce Reviews.csv", stringsAsFactors = FALSE)

Sentiment analysis

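For sentiment analysis, we formed a corpus with two documents: the text of all reviews that
recommend the product (positive) and the text of all reviews that do not (negative). We then
formed a word cloud showing the words typical of positive reviews in green and those typical of
negative reviews in red.
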
Rscript

pos_comments <- subset(review$Review.Text , review$Recommended.IND==1)


neg_comments <- subset(review$Review.Text , review$Recommended.IND==0)

#paste and collapse positive and negative comments

pos_terms <- paste(pos_comments , collapse =" ")


neg_terms <- paste(neg_comments , collapse =" ")
#Combine both positive and negative terms

all_terms <- c(pos_terms, neg_terms)


#Creating corpus for all the terms

all_corpus <- VCorpus(VectorSource(all_terms))


all_tdm <- TermDocumentMatrix(
  # Use all_corpus
  all_corpus,
  control = list(
    # TF-IDF weighting is left disabled
    # weighting = weightTfIdf,
    # Remove the punctuation
    removePunctuation = TRUE,
    # Remove numbers
    removeNumbers = TRUE,
    # Stem the terms (the tm control option is named "stemming")
    stemming = TRUE,
    # Convert to lowercase
    tolower = TRUE,
    # Use English stopwords
    stopwords = stopwords("english")
  )
)

# Convert the term-document matrix to a plain matrix
all_tdm_m <- as.matrix(all_tdm)

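# Label the two documents
colnames(all_tdm_m) <- c("positive", "negative")

# Total frequency of each term across both documents, most frequent first
all_term_freq <- rowSums(all_tdm_m)
all_term_freq <- sort(all_term_freq, decreasing = TRUE)
all_term_freq[1:20]

# Comparison cloud: positive terms in green, negative terms in red
comparison.cloud(
  all_tdm_m,
  max.words = 100,
  colors = c("darkgreen", "darkred"))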

Simple word clustering

Word clustering is used to identify word groups used together, based on frequency distance. This
is a dimension reduction technique. It helps in grouping words into related clusters. Word clusters
are visualized with dendrograms
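Rscript

# review_tdm is assumed to be a TermDocumentMatrix built from the individual
# reviews; its construction is not shown in this report.
# Drop terms that are absent from more than 90% of the reviews
review_tdm2 <- removeSparseTerms(review_tdm, sparse = 0.9)

# Cluster the remaining terms on the Euclidean distance between their count vectors
hc <- hclust(d = dist(as.matrix(review_tdm2), method = "euclidean"), method = "complete")

# Plot a dendrogram
plot(hc)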
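
To show what the clustering step does on data small enough to check by hand, here is a minimal sketch. The term-by-review count matrix is hypothetical; it only illustrates that terms with similar count vectors end up in the same cluster.

```r
# Hypothetical term-by-review counts (rows = terms, columns = reviews)
term_counts <- matrix(
  c(2, 1, 0, 0,
    2, 1, 0, 0,
    0, 0, 3, 2,
    0, 0, 2, 3),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("size", "fit", "soft", "fabric"), paste0("review", 1:4))
)

# Euclidean distance between the count vectors of each pair of terms
term_dist <- dist(term_counts, method = "euclidean")

# Complete-linkage hierarchical clustering, as in the script above
hc_sketch <- hclust(term_dist, method = "complete")

# Cut the tree into two word groups
groups <- cutree(hc_sketch, k = 2)
print(groups)

# Dendrogram of the four terms
plot(hc_sketch)
```

Here "size" and "fit" have identical count vectors, so they merge first, while "soft" and "fabric" form the second group.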

How will results from your analytics plan help you solve your Business Problem?

If you are running an e-commerce website then it can be a huge challenge to discover what is being
said about your site, your brand or the products you sell. Even harder is to sort and make sense of
the mountain of feedback that exists online. Harder still is the ability to draw upon granular levels of
feedback about specific product aspects within the data.

Online retailers should be looking at sentiment analysis to solve this challenge.


Sentiment analysis is the process of determining the emotional tone behind a series of words. It
is used to understand the attitudes, opinions and emotions expressed within an online
mention.
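
One simple way to determine tone is to count words from positive and negative word lists. The sketch below illustrates that idea only; the two word lists and the sample reviews are hypothetical, and this lexicon approach is a different technique from the recommendation-based split used in our script.

```r
# Hypothetical, deliberately tiny sentiment lexicons
positive_words <- c("love", "great", "perfect", "flattering")
negative_words <- c("poor", "flaws", "tight", "disappointed")

# Score = number of positive words minus number of negative words
score_review <- function(text) {
  tokens <- unlist(strsplit(tolower(gsub("[^A-Za-z ]", " ", text)), "\\s+"))
  sum(tokens %in% positive_words) - sum(tokens %in% negative_words)
}

# Hypothetical sample reviews (not from the dataset)
sample_reviews <- c(
  "I love this dress, great fit",
  "Some major design flaws, too tight",
  "It arrived on time"
)

scores <- vapply(sample_reviews, score_review, numeric(1), USE.NAMES = FALSE)

# Map each score to a tone label
labels <- ifelse(scores > 0, "positive", ifelse(scores < 0, "negative", "neutral"))
print(data.frame(review = sample_reviews, score = scores, label = labels))
```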

Sentiment analysis allows retailers and brands alike to understand the opinions expressed in
consumer feedback and user-generated content (UGC). Is it negative or positive? What is the
context of the opinion? Are customers talking about the product or just a feature of the product?

Here are the key benefits for online retailers when using sentiment analysis:

• Process large amounts of online opinions automatically from numerous sources
• Display UGC on your retail website showing actionable insights for shoppers
• Provide a deeper level of understanding than just star ratings
• Increase the number of opinions that can be collected by analyzing and extracting multiple
opinions for each feedback source
