
End-Term Examination

(PGP 2019 – 21), Term - VI


Course Name: Social Media and Web Analytics

Name:
Regn. No.:
Total marks: 40 marks    Time: 1.5 hours

Open Book    Open Laptop    Internet Access    Calculator    Any Other Information
Yes / No     Yes / No       Yes / No           Yes / No

Instructions: 1. There are three Roman-numbered sections. Participants must answer all
questions in each section. Please write to-the-point answers.
I. Answer all questions (1 M*10=10 M)

1. To handle the imbalanced dataset problem, we use a method wherein we randomly select
samples from each of the minority classes. The method is ……………………
2. In Synthetic Minority Oversampling Technique (SMOTE), we ………. classes by creating
……… examples
3. Random forest tries to build multiple …………..models with different ………. and
different initial ……….
4. The formula for the F1 score, in terms of a confusion matrix, is ……………………
5. YouTube Analytics provides access to vital metrics about video performance including
……., ……………, ……………, and …………..
6. The seven most essential metrics in YouTube Analytics are: …………………..
7. “Lifetime People who have liked a Page and Engaged with a Post” in Facebook Page
insights refers to ………………………………………………………..
8. A Topic Model can be defined as an ………. technique to discover ……… across various
………………….
9. LDA is a ………………… model that assumes each …………….. is a mixture over an
underlying ………………., and each document is a mixture over a set of …………….
10. A Markov chain is a process which …….. the movement and gives a …….. for moving from
one …… to another …………..
II. Case 1 (5 M*3=15 M)
Sahini Dresses is a distributor of ethnic festival dresses with pan-India distribution. The
organization has also listed itself as a vendor on e-commerce portals like Flipkart, Myntra and Amazon.
Buyers on e-commerce portals have the option to write and post textual reviews on the products they
purchase. The manager of Sahini Dresses is interested in exploring which topics are normally discussed by
the customers of festival dresses for Diwali. The manager has run a web scraping program in
Python and retrieved all the reviews posted on the product page of “Saree” for the Diwali festival. Only
products listed below INR 2000 were considered in this analysis. Many reviews are
in Hinglish and broken English, so a substantial amount of text pre-processing was performed.

a. After the web scraping, the textual data is preprocessed and the following visualizations (Figures 1, 2
and 3) are obtained. You are asked to generate actionable insights from these visualizations obtained from
textual data analysis of online reviews posted by consumers.

Fig 1. Word Cloud for Negative reviews for Sarees purchased during Diwali period

Fig 2. Emotional word cloud for reviews on Sarees purchased during Diwali period
Fig 3. Relative emotional distribution for reviews on Sarees purchased during Diwali period

b. The topic model has been constructed using LDA. The optimum number of topics was found to be
12. The table below represents the topic model. You are asked to label the topics based on the model
given in Table 1. Also highlight how a retailer who has listed the sarees can use this topic model
for managerial decision making.

Table 1. Topic model on textual reviews on Sarees purchased during Diwali period
Sl. No.  Topic model

1.   '0.606*"price" + 0.047*"satisfy" + 0.027*"would" + 0.023*"fully" + '
     '0.022*"emblishe" + 0.022*"occasion" + 0.022*"birthday" + 0.022*"shop" + '
     '0.018*"line" + 0.017*"brightness"'

2.   '0.459*"show" + 0.312*"image" + 0.090*"come" + 0.020*"suggest" + '
     '0.010*"silver" + 0.009*"sew" + 0.007*"difference" + 0.004*"still" + '
     '0.004*"height" + 0.002*"darker"'

3.   '0.406*"expect" + 0.080*"fabricquality" + 0.044*"gift" + 0.027*"know" + '
     '0.026*"lace" + 0.024*"thre" + 0.021*"even" + 0.020*"satifie" + '
     '0.019*"approve" + 0.018*"ever"'

4.   '0.296*"design" + 0.121*"order" + 0.079*"receive" + 0.070*"think" + '
     '0.069*"type" + 0.061*"use" + 0.056*"photo" + 0.016*"vintage" + '
     '0.016*"recently" + 0.013*"confidence"'

5.   '0.279*"blouse" + 0.061*"piece" + 0.048*"cut" + 0.047*"exactly" + '
     '0.039*"want" + 0.032*"highly" + 0.030*"market" + 0.028*"pic" + '
     '0.026*"stain" + 0.025*"border"'

6.   '0.661*"color" + 0.058*"fade" + 0.043*"texture" + 0.039*"weight" + '
     '0.023*"bite" + 0.021*"compare" + 0.015*"thing" + 0.014*"property" + '
     '0.014*"change" + 0.011*"side"'

7.   '0.081*"work" + 0.072*"improve" + 0.071*"make" + 0.068*"nee" + 0.043*"tell" '
     '+ 0.042*"rate" + 0.036*"take" + 0.033*"produce" + 0.030*"shine" + '
     '0.025*"progress"'

8.   '0.508*"fabric" + 0.164*"love" + 0.082*"lady" + 0.042*"value" + 0.020*"cry" '
     '+ 0.011*"awesome" + 0.011*"synthetic" + 0.010*"bomb" + 0.006*"softer" + '
     '0.005*"simply"'

9.   '0.653*"site" + 0.038*"disappoint" + 0.028*"boy" + 0.016*"guy" + 0.015*"app" '
     '+ 0.015*"relative" + 0.015*"friend" + 0.012*"try" + 0.011*"deal" + '
     '0.010*"choose"'

10.  '0.094*"wear" + 0.087*"purchase" + 0.042*"day" + 0.041*"recommend" + '
     '0.035*"choice" + 0.035*"meter" + 0.034*"company" + 0.034*"go" + '
     '0.034*"book" + 0.022*"feel"'

11.  '0.835*"quality" + 0.071*"dt" + 0.017*"mark" + 0.010*"otherwise" + '
     '0.003*"amount" + 0.003*"prefer" + 0.003*"less" + 0.002*"brand" + '
     '0.002*"doubt" + 0.002*"better"'

12.  '0.098*"delivery" + 0.097*"return" + 0.073*"time" + 0.062*"send" + '
     '0.048*"party" + 0.046*"damage" + 0.046*"buy" + 0.039*"package" + '
     '0.034*"give" + 0.029*"get"'
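
Each row of Table 1 appears to follow the weight*"word" layout that topic-modelling libraries such as gensim print for an LDA topic. As an aid to reading the table, the minimal Python sketch below splits one such string into word and weight pairs and ranks them. The helper name parse_topic is hypothetical and not part of any library; the input is Topic 12 as copied from the table.

```python
import re

# One term in a topic string looks like: 0.098*"delivery"
TERM = re.compile(r'(\d*\.\d+)\*"([^"]+)"')

def parse_topic(topic_string):
    """Return (word, weight) pairs from a topic string, highest weight first."""
    pairs = [(word, float(weight)) for weight, word in TERM.findall(topic_string)]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)

# Topic 12 exactly as listed in Table 1
topic_12 = ('0.098*"delivery" + 0.097*"return" + 0.073*"time" + 0.062*"send" + '
            '0.048*"party" + 0.046*"damage" + 0.046*"buy" + 0.039*"package" + '
            '0.034*"give" + 0.029*"get"')

terms = parse_topic(topic_12)
top_words = [word for word, _ in terms[:3]]
# Share of the topic's probability mass carried by the ten listed words
listed_mass = sum(weight for _, weight in terms)
print(top_words, round(listed_mass, 3))
```

The listed weights need not add up to 1, because only the ten highest-weighted words of each topic are shown. A topic dominated by a single word (for example Topic 11) reads differently from one whose weight is spread over several words (for example Topic 12).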

c. The pre-processed textual data is used to retrieve features, which are then used to build a text
classification model with two classes: Recommended to Purchase (Class 0) and
Not-Recommended to Purchase (Class 1). A Random Forest model has been employed in this case and the model
is validated with ten-fold cross-validation. The evaluation measures of the model are tabulated below:

Table 2. Performance evaluation measures for the predictive model


Accuracy F1-Score Sensitivity Specificity ROC
(test set)
0.9710 0.97052 0.98567 0.96470 0.97518
The company is seeking your support in interpreting the results in the evaluation table. The
company is also seeking your suggestions on the appropriate use of this predictive model.
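
For reference, the measures named in Table 2 are all derived from the four cells of a two-class confusion matrix. The sketch below is a minimal illustration under stated assumptions: the counts are hypothetical and are not taken from the case, the function name classification_metrics is invented for this example, and the case does not state which class was treated as the positive class.

```python
def classification_metrics(tp, fn, fp, tn):
    """Compute common measures from the four cells of a 2x2 confusion matrix.

    tp/fn: actual positives predicted positive/negative
    fp/tn: actual negatives predicted positive/negative
    """
    sensitivity = tp / (tp + fn)   # share of actual positives that were caught
    specificity = tn / (tn + fp)   # share of actual negatives that were caught
    precision = tp / (tp + fp)     # share of predicted positives that were right
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }

# Hypothetical counts for illustration only (NOT the case data)
metrics = classification_metrics(tp=90, fn=10, fp=5, tn=95)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Swapping which class counts as positive swaps sensitivity and specificity, so the choice of positive class matters when reading a table such as Table 2.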
Ans Case 1 a)
Ans Case 1 b)
Ans Case 1 c)

III. Case 2 (7.5 M*2=15 M)

An IT services company has a Facebook page where it posts various types of content to engage
its target audience. Examples of Inspiration posts may be quotes, trivia / amazing facts,
gorgeous images and personal stories of your own or your clients’ triumphs. Product posts are about
the IT services products that the company offers to its clientele and customer base. Action
posts are ones where a call-to-action option is available for the targeted audience. Hypothesis
testing was performed to assess whether aspects such as type of content, theme of content, time
of the day, day of the week and month of the year have an impact on the Total Interactions of the posted
content. The following results have been obtained from the hypothesis testing.
a. Three categories of content are posted on the Facebook page: “Action”,
“Inspiration” and “Product”. The result of the statistical testing is tabulated below. As the social
media channel in-charge of this page, suggest what actionable insights can be derived and what
managerial decisions you would like to make.
Kruskal Wallis-H Result:
Hypothesis Test Summary

Null Hypothesis                                  Test                                       Sig. (a,b)
The distribution of Total Interactions is the    Independent-Samples Kruskal-Wallis Test    .001
same across categories of Category.

The significance level is .050

Ranks

Category categorical    N      Mean Rank
Action                  39     78.08
Inspiration             70     112.52
Product                 77     84.02
Total                   186
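
The Ranks table above is enough to approximate the Kruskal-Wallis H statistic by hand. The sketch below is a rough check under stated assumptions: it uses the textbook formula without a tie correction, so it will not exactly reproduce what the statistical software computed, and the function name kruskal_h_from_mean_ranks is invented for this example.

```python
import math

def kruskal_h_from_mean_ranks(groups):
    """Kruskal-Wallis H (no tie correction) from per-group (n, mean_rank)."""
    total_n = sum(n for n, _ in groups.values())
    grand_mean_rank = (total_n + 1) / 2   # mean rank under the null hypothesis
    weighted_sq_dev = sum(
        n * (mean_rank - grand_mean_rank) ** 2 for n, mean_rank in groups.values()
    )
    return 12 / (total_n * (total_n + 1)) * weighted_sq_dev

# (N, Mean Rank) per category, as given in the Ranks table
ranks = {
    "Action": (39, 78.08),
    "Inspiration": (70, 112.52),
    "Product": (77, 84.02),
}

h_stat = kruskal_h_from_mean_ranks(ranks)
# With 3 groups, df = 2, and the chi-square upper tail for df = 2 is exp(-H / 2)
p_value = math.exp(-h_stat / 2)
print(f"H = {h_stat:.2f}, approximate p = {p_value:.4f}")
```

The approximate p-value falls below the .050 significance level, in line with the reported Sig. of .001. The test itself only says that at least one category differs; it does not say which pairs of categories differ.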

b. Four types of content are posted on the Facebook page: “Link”, “Photo”, “Shared
Video” and “Video”. The result of the statistical testing is tabulated below. As the social media
channel in-charge of this page, suggest what actionable insights can be derived and what
managerial decisions you would like to make.

Kruskal Wallis-H Result:

Hypothesis Test Summary

Null Hypothesis                                  Test                                       Sig. (a,b)
The distribution of Total Interactions is the    Independent-Samples Kruskal-Wallis Test    .001
same across categories of Type.

The significance level is .050

Ranks

                      Type_categorical    N      Mean Rank
Total Interactions    Link                42     102.42
                      Photo               98     95.89
                      Shared Video        4      8.13
                      Video               40     82.23
                      Total               184

Ans Case 2 a)

Ans Case 2 b)
