
Emerging Business Opportunities
Varun Singhal
Swayam Prakash Pal
• Client (Manufacturer A) is a leading Food & Beverage manufacturer.
• Client wants to understand the growth patterns of consumer
preferences (themes) and evaluate positioning of their brand across
different themes.
• Client also wants to identify the sales drivers of their products.

Problem Statement
• Three major data sources have been provided to us
– Social Media Data
– Google Search Data
– Sales Data
• Number of themes in each data source
– Social Media: 194 unique themes
– Google Search: 160 unique themes
– Sales Data: 49 unique themes

Data Preparation
• Themes present in all three data sources
– Common themes: 30
– Examples: sea salt, low carb, crab, salmon, etc.
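The common themes can be found with a set intersection. The sketch below is only illustrative: the three sets are hypothetical stand-ins for the real theme lists (194, 160 and 49 unique themes), and it assumes theme names are already spelled consistently across sources.

```python
# Hypothetical theme sets; the real sources are far larger.
social_themes = {"sea salt", "low carb", "crab", "salmon", "boar", "rabbit"}
search_themes = {"sea salt", "low carb", "crab", "salmon", "shrimp", "honey"}
sales_themes = {"sea salt", "low carb", "crab", "salmon", "stroganoff"}

# A theme is "common" only if it appears in all three sources.
common_themes = social_themes & search_themes & sales_themes
print(sorted(common_themes))
```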

• Themes preferred across social media data
– health (passive), boar, rabbit, probiotic, pumpkin
• Themes preferred across Google search
– shrimp, honey, sugar-free, health (passive), ethical-environment
• Themes preferred across sales data
– low carb, no additives/preservatives, stroganoff, apple cinnamon, soy foods

Data Preparation
[Figure: three pie charts showing the share of each top 5 theme per data source]
Preferred-Sales Data: low carb, no additives/preservatives, stroganoff, apple cinnamon, soy foods
Preferred-Search Data: ethical-environment, shrimp, sugar free, honey, health (passive)
Preferred-Social Data: health (passive), boar, rabbit, probiotic, pumpkin

Top 5 preferred themes across all data sources


• Data provided is insufficient for certain themes in the google search
data and social media data
• No sparsity was found in the given data sources

Data sufficiency and sparsity


• To compute market share, the vendor column must first be added to the sales data.
• The data is merged in 3 major steps:
– The sales data is merged with the theme_product_list data on the product_id column to get a product_id to claim_id mapping.
– The resulting data set is merged with the theme_list data on claim_id to get a claim_id to claim_name mapping.
– The final merge is done with the product_manufacture_list data on product_id to append the vendors column.
• The result is a merged data set that maps each theme to its vendor, along with the sales_units_value and sales_dollars_value columns.
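The three merge steps can be sketched with pandas as follows. The table and key names (theme_product_list, theme_list, product_manufacture_list, product_id, claim_id, claim_name) come from the slide; the rows, the `vendor` column name and the join types are assumptions made for illustration.

```python
import pandas as pd

# Tiny hypothetical tables; only the table/key names are taken from the slide.
sales = pd.DataFrame({
    "product_id": [1, 2, 3],
    "sales_units_value": [100, 50, 75],
    "sales_dollars_value": [250.0, 200.0, 150.0],
})
theme_product_list = pd.DataFrame({"product_id": [1, 2, 3], "claim_id": [10, 10, 20]})
theme_list = pd.DataFrame({"claim_id": [10, 20], "claim_name": ["low carb", "salmon"]})
product_manufacture_list = pd.DataFrame(
    {"product_id": [1, 2, 3], "vendor": ["A", "B", "A"]}  # "vendor" is an assumed column name
)

merged = (
    sales
    .merge(theme_product_list, on="product_id", how="inner")       # step 1: product -> claim_id
    .merge(theme_list, on="claim_id", how="inner")                 # step 2: claim_id -> claim_name
    .merge(product_manufacture_list, on="product_id", how="left")  # step 3: append vendor
)
print(merged)
```

An inner join drops products without a theme mapping, while the final left join keeps rows even when the vendor is unknown; whether that matches the original analysis is not stated in the slides.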

Merging the data sources


Data set after merging
Overall market share of client (A) in comparison with other competitors, based on sales dollars value.
Overall market share of client (A) in comparison with other competitors, based on sales units value.
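Market shares of this kind could be computed from the merged data roughly as below. This is a minimal sketch with made-up numbers; the `vendor` column name is an assumption, while sales_dollars_value and sales_units_value follow the slides.

```python
import pandas as pd

# Hypothetical merged rows; the real data set has one row per product/theme/period.
merged = pd.DataFrame({
    "vendor": ["A", "A", "B", "C"],
    "sales_dollars_value": [300.0, 100.0, 400.0, 200.0],
    "sales_units_value": [30, 10, 50, 10],
})

# Total sales per vendor, then each vendor's percentage of the overall total.
totals = merged.groupby("vendor")[["sales_dollars_value", "sales_units_value"]].sum()
market_share = totals / totals.sum() * 100
print(market_share.round(1))
```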
• Potential competitors for vendor A for some common themes, based on sales dollars value

For the themes hfcs free, microwaveable and caramel, market share increases over the years, so these themes can be considered emerging.

Emerging Themes- Social Media


For the themes honey, health (passive), garden pea and hfcs free, market share increases over the years, so these themes can be considered emerging.

Emerging Themes- Google Search


For the themes pollock, American southwest style and gmo free, market share increases over the years, so these themes can be considered emerging.

Emerging Themes- Sales Data
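The rule used on these three slides (a theme is emerging if its share grows year over year) can be sketched as below. The years, volumes and the strict "increases every year" criterion are assumptions; the slides do not state the exact threshold used.

```python
def yearly_share(volume_by_theme_year):
    """Convert {theme: {year: volume}} into {theme: {year: share of that year's total}}."""
    totals = {}
    for by_year in volume_by_theme_year.values():
        for year, volume in by_year.items():
            totals[year] = totals.get(year, 0) + volume
    return {
        theme: {year: volume / totals[year] for year, volume in by_year.items()}
        for theme, by_year in volume_by_theme_year.items()
    }


def emerging_themes(volume_by_theme_year):
    """Return themes whose share rises strictly in every consecutive year."""
    result = []
    for theme, by_year in yearly_share(volume_by_theme_year).items():
        ordered = [by_year[year] for year in sorted(by_year)]
        if len(ordered) >= 2 and all(a < b for a, b in zip(ordered, ordered[1:])):
            result.append(theme)
    return sorted(result)


# Hypothetical yearly volumes (posts, searches or sales units).
volumes = {
    "gmo free": {2016: 10, 2017: 20, 2018: 40},
    "crab": {2016: 50, 2017: 45, 2018: 40},
    "salmon": {2016: 40, 2017: 35, 2018: 20},
}
emerging = emerging_themes(volumes)
print(emerging)
```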


• Assumption – If the total posts for a theme increase or decrease over a particular period, its search volume and sales units value change as well.
• Null Hypothesis – There is no relationship between the data sources, i.e. a change in total posts does not affect the change in search volume or in sales units for a particular theme.
• Alternative Hypothesis – There is a relationship between the data sources: total posts, search volume and sales units value for a particular theme are related to each other.

Hypothesis Testing
Themes that reject the null hypothesis | Themes that fail to reject the null hypothesis

Hypothesis Validation
Soy Foods Mackerel

Hypothesis Proof
• The latency period is calculated for the themes that are common to all 3 data sources, i.e. social media, Google search and sales data.
• Latency is calculated with a weighted average approach using total posts, search volume and sales units value.
• Mean latency from social media to search data is around 41 days.
• Mean latency from search data to sales data is around 52 days.
• Analysis of the data sources shows that some themes appear first on social media and then on Google search, and others the other way round. The same holds between search and sales data.
• This affects the latency period across different themes.
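The slides do not spell out the weighted average approach. One plausible reading, shown here purely as an assumption, is to take the volume-weighted average date of each signal and use the difference between the two averages as the latency. All dates and volumes are hypothetical.

```python
from datetime import date


def weighted_mean_day(series):
    """Volume-weighted average day (as a date ordinal) of a {date: volume} series."""
    total = sum(series.values())
    return sum(day.toordinal() * volume for day, volume in series.items()) / total


def latency_days(leading, lagging):
    """Days by which the lagging signal trails the leading one (negative if it leads)."""
    return weighted_mean_day(lagging) - weighted_mean_day(leading)


# Hypothetical monthly volumes for one theme.
total_posts = {date(2018, 1, 1): 10, date(2018, 2, 1): 30, date(2018, 3, 1): 10}
search_volume = {date(2018, 2, 1): 10, date(2018, 3, 1): 30, date(2018, 4, 1): 10}

latency = latency_days(total_posts, search_volume)
print(round(latency, 1))
```

A negative result would correspond to the reverse case described in the bullets, where the theme shows up in search data before social media.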

Latency observed during shifting of trend


• Themes appearing first in social media, then in search data (panel: Social-Search)
• Themes appearing first in search data, then in social media (panel: search-social)
[Figure: two bar charts of latency in days (y-axis) per theme (x-axis)]

Latency plots Social Media and Search Data


• Themes appearing first in sales data, then in search data (panel: sales-search)
• Themes appearing first in search data, then in sales data (panel: search-sales)
[Figure: two bar charts of latency in days (y-axis) per theme (x-axis)]

Latency plots Search Data and Sales Data


• Client A has sales data for 19 themes.
• During the data preparation stage, only themes specific to our client were considered.
• Total posts and search volume data were appended to the sales data of each theme for the corresponding year and month.
• A few new columns were derived from the existing data, as they are likely to help during the model testing phase.
• A few columns that were not helpful for model testing were dropped.
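The derived columns are not defined in the slides. Judging only by the feature names that appear later in the deck (per_unit_price_A, price_per_lbs_A, units_per_lbs_A), they may be simple ratios; the formulas below and the `sales_lbs_value` column name are assumptions for illustration.

```python
def add_ratio_features(row):
    """Return a copy of a sales row with three assumed ratio features added."""
    out = dict(row)
    out["per_unit_price_A"] = row["sales_dollars_value"] / row["sales_units_value"]
    out["price_per_lbs_A"] = row["sales_dollars_value"] / row["sales_lbs_value"]
    out["units_per_lbs_A"] = row["sales_units_value"] / row["sales_lbs_value"]
    return out


# Hypothetical monthly record for one of client A's themes.
record = {
    "theme": "low carb",
    "sales_dollars_value": 200.0,
    "sales_units_value": 50,
    "sales_lbs_value": 25.0,  # assumed column name
}
features = add_ratio_features(record)
print(features)
```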

Sales model building guidelines


Dataset after adding total posts and search volume in sales data set
Dataset after creating the dependent variable by aggregating the client's sales
• For modelling, we used multiple linear regression with a backward elimination approach.
• Model building was done in 2 stages:
– First, we created a separate model for each of the client's themes and identified the significant columns for each model.
– Using EDA and the insights from the separate models, certain columns were dropped before obtaining the dataset for the final model.
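Backward elimination for a multiple linear regression can be sketched with NumPy as below. This is a simplified illustration, not the authors' implementation: it drops the feature with the smallest absolute t-statistic until all remaining ones exceed an assumed threshold of 2.0 (roughly a 5% level for large samples), whereas the original analysis may have used p-values. The data is synthetic and `random_noise` is a hypothetical column.

```python
import numpy as np


def backward_elimination(X, y, names, t_threshold=2.0):
    """Drop the weakest feature (smallest |t|) until all remaining pass the threshold."""
    names = list(names)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    while names:
        design = np.column_stack([np.ones(len(y)), X])  # intercept + features
        beta, *_ = np.linalg.lstsq(design, y, rcond=None)
        residuals = y - design @ beta
        dof = len(y) - design.shape[1]
        sigma2 = residuals @ residuals / dof             # residual variance
        cov = sigma2 * np.linalg.inv(design.T @ design)  # coefficient covariance
        t_stats = np.abs(beta / np.sqrt(np.diag(cov)))[1:]  # skip the intercept
        weakest = int(np.argmin(t_stats))
        if t_stats[weakest] >= t_threshold:
            break
        X = np.delete(X, weakest, axis=1)
        names.pop(weakest)
    return names


# Synthetic data: sales depend on price and posts, not on the noise column.
rng = np.random.default_rng(0)
n = 200
price, posts, noise = rng.normal(size=(3, n))
sales_units = 5 - 3 * price + 2 * posts + rng.normal(scale=0.5, size=n)

selected = backward_elimination(
    np.column_stack([price, posts, noise]),
    sales_units,
    ["per_unit_price_A", "total_posts", "random_noise"],
)
print(selected)
```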

Modelling technique
low carb Salmon

Blueberry Soy Foods

Insights from models for each theme


• Features having high correlation with sales units value
• Features having high correlation between themselves
– units other vendors
– lbs value others

Insights from EDA


Final model results
• Blueberry
• Ethnic exotic
• Low carb
• No additives/preservatives
• Salmon
• Soy foods

Themes with high business values


• Beef Hamburger – price_per_lbs_A, units_per_lbs_A
• Blueberry – per_unit_price_A, units_per_lbs_A
• Chicken – price_per_lbs_A, units_per_lbs_A, per_unit_price_A
• Crab – price_per_lbs_A
• French bisque – per_unit_price_A
• Gmo_free – per_unit_price_A
• High Protein – per_unit_price_A, units_per_lbs_A
Controllable Factors for client across themes


Effect of per unit price on sales units value
Percent increase in sales for Vegetarian
Thank You
