IB9JB0 Marketing and R Analytics Assignment

Marketing and Strategy Analytics (IB9JB0)
Student ID: 5587045

Word count: 2998
TABLE OF CONTENTS
1. QUESTION 1
INTRODUCTION ABOUT THE OVERALL STRUCTURE AND METHODOLOGY
A. Examining whether the Voucher-Inclusive approach generates more profit
B. Ranking the platforms for the future Voucher-Inclusive approach (based on the profit from the first order of new customers)
C. Comparing the long-term profit of the Voucher-Inclusive approach and the Standard Advertising approach
D. Recommendations on the variables to collect in TastyBites' future data gathering and processing
2. QUESTION 2
A. Finding and interpreting the correlations between ‘T_speed_UK’ and ‘price’, and between ‘T_speed_UK’ and ‘speed_US’
B. Suggesting and applying a tree-based model of the EV features that affect willingness to pay
C. Evaluating the model’s performance
3. QUESTION 3
A. Identifying (and targeting) meaningful customer clusters, with a marketing strategy for each
B. Performing separate clustering for males and others
Question 1:
The analysis follows five steps. Step 1, “Preparing and loading the dataset”, loads “TastyBites_A.csv” and converts the variables ‘device’, ‘target’ and ‘platform’ to factors. Step 2, “Data inspection and cleaning”, checks for and removes any NA values, examines the data types, identifies missing values and explores outliers. Step 3, “Developing the linear model”, examines the combinations and interdependencies of the variables to generate key insights. Step 4 loads the dataset ‘TastyBites_B.csv’ for part C to calculate the long-term profit (LTP). Step 5, “Evaluating the model”, quantitatively evaluates the analysis and provides insights for future marketing investment and strategic decision-making.
A. Do consumers in the Voucher-Inclusive Advertising approach generate more profit (compared to the Standard Advertising approach)?
Step 1: Import the dataset “TastyBites_A.csv”, prepare it for data exploration, and complete the data inspection and cleaning.
Step 2: Develop a simple linear model to compare the profit of the Voucher-Inclusive tactic and the Standard Advertising approach, with ‘basket_value’ as the dependent variable and ‘target’ as the predictor. Next, print the summary of the model to evaluate the coefficients, the statistical significance (p-value) and the R-squared value.
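The model in Step 2 can be sketched in R as follows. The data here are simulated stand-ins, since the real “TastyBites_A.csv” is not reproduced; the group means and spread are arbitrary assumptions, so the printed numbers will not match those reported in this section.

```r
# Simulated stand-in for TastyBites_A.csv (values are assumptions for illustration)
set.seed(42)
n <- 2000
data <- data.frame(
  target = factor(rep(c(0, 1), each = n / 2)),   # 0 = Standard, 1 = Voucher-Inclusive
  basket_value = c(rnorm(n / 2, mean = 40, sd = 24),
                   rnorm(n / 2, mean = 43, sd = 24))
)

# Basket value as the dependent variable, advertising approach as the predictor
model <- lm(basket_value ~ target, data = data)
model_summary <- summary(model)

# Pull out the quantities discussed in the text
coef_table     <- model_summary$coefficients            # estimate, SE, t value, p-value
voucher_effect <- coef_table["target1", "Estimate"]     # mean difference, voucher minus standard
p_value        <- coef_table["target1", "Pr(>|t|)"]
adj_r2         <- model_summary$adj.r.squared
resid_se       <- model_summary$sigma                   # residual standard error

print(round(c(effect = voucher_effect, p = p_value,
              adj_r2 = adj_r2, resid_se = resid_se), 4))
```

With a single two-level predictor, the coefficient on ‘target’ is simply the difference between the two group means, which is why the regression answers the same question as a two-sample comparison.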
First, the coefficient on 'data$target' in the linear regression is positive, which indicates a positive relationship between the Voucher-Inclusive Advertising strategy and basket value: consumers in the Voucher-Inclusive Advertising approach do generate more profit than those in the Standard Advertising approach. This finding is consistent with marketing theories suggesting that economic incentives increase consumer participation (Kotler & Keller, 2016). However, the p-value of 0.04698368 is only just below the standard alpha level of 0.05, and the coefficient of 2.759 is small, so the effect is statistically significant but modest in size. This links to Lehmann, Gupta, & Steckel (1998), who show that it is crucial to analyse the economic viability of a voucher programme by weighing the incremental profit against the expenses of implementing the voucher. In addition, the adjusted R-squared of 0.003395 indicates that only about 0.34% of the variability in basket value is explained by the 'target' variable, meaning that other factors that influence basket value are not included in the model. Such limited explanatory power suggests that the model may not be the best predictor (Hair, Black, Babin, & Anderson, 2014, p. 107). Furthermore, the residual standard error of 23.62 offers insight into the extent to which observed values deviate from the regression line. A large residual standard error can also indicate an unreliable estimate, which may reflect model misspecification or the omission of necessary factors (Heiberger & Holland, 2004).
In addition, it is useful to compare the variable “basket_value” across platforms with a scatter plot of the average profit. The plot examines the difference between customers who received the Voucher-Inclusive Advertising strategy (“target=1”) and those who received Standard Advertising (“target=0”) on each platform, under the assumption that customers who engaged with a voucher generate more profit than the others on all platforms. The R code investigates the influence of voucher usage on customer expenditure across platforms in three steps. Step 1: group the data by the variables “platform” and “target” to identify which platform produces which outcome for future marketing tactics; “basket_value” stands for the profit and the error bar stands for the statistical uncertainty in the visualization. Step 2: add a new column called “mean_se$platform_readable”, which converts “platform” into a factor for the visualization. Step 3: draw the scatter plot.
The analysis is done using a hypothesis test for each platform, comparing the average basket value of the group that used vouchers (‘target’ = 1) with the group that did not (‘target’ = 0).
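A minimal sketch of the grouping step and the per-platform test is given here, under stated assumptions: the data are simulated, “Facebook” and “Instagram” are hypothetical platform labels added for illustration, and a Welch t-test stands in for the hypothesis test, whose exact form the text does not specify.

```r
# Simulated data; platform labels other than Google and X are hypothetical
set.seed(1)
platforms <- c("Google", "X", "Facebook", "Instagram")
data <- data.frame(
  platform = factor(sample(platforms, 800, replace = TRUE)),
  target   = factor(sample(c(0, 1), 800, replace = TRUE))
)
data$basket_value <- 40 + 3 * (data$target == "1") + rnorm(800, sd = 20)

# Step 1: mean and standard error of basket value per platform/target cell
mean_se <- aggregate(
  basket_value ~ platform + target, data = data,
  FUN = function(x) c(mean = mean(x), se = sd(x) / sqrt(length(x)))
)
mean_se <- do.call(data.frame, mean_se)          # flatten the matrix column
names(mean_se) <- c("platform", "target", "mean", "se")

# Error-bar limits (mean plus/minus one standard error) for the plot
mean_se$lower <- mean_se$mean - mean_se$se
mean_se$upper <- mean_se$mean + mean_se$se
print(mean_se)

# Welch t-test of voucher vs. standard customers within each platform
p_values <- sapply(split(data, data$platform), function(d) {
  t.test(basket_value ~ target, data = d)$p.value
})
print(round(p_values, 4))
```

Testing each platform separately shows where the voucher effect is reliable rather than relying on the visual gap between points alone.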
Figure 1: Mean Basket Value by Platform and Voucher Use
The red points representing the voucher-inclusive strategy lie above the blue points representing the non-voucher strategy for most platforms. In other words, the mean basket value across platforms indicates that the Voucher-Inclusive strategy has a positive impact on the amount spent by customers in comparison with the Standard Advertising tactics. The basket values in Figure 1 are higher on average for platforms where Voucher-Inclusive Advertising was used, which indicates that the strategy has high potential to increase profit. Looking at each platform, “basket_value” is highest on “Google” and “X”. This outcome also suggests that the customer base and customer engagement from “Google” and “X” are the most promising for developing further marketing strategies.
B. How should platforms be ranked for future Voucher-Inclusive Advertising approach (based on the profit from the first order of the new customers)?
Step 1, “Data segmentation: filtering for the Voucher-Inclusive tactics”, filters the subset of data pertinent to the Voucher-Inclusive approach (“target” = 1). Step 2, “Changing the characteristics of variables”, checks whether “platform” has been converted into a factor variable rather than a numeric one, in order to eliminate potential errors. Step 3, “Developing the multiple regression model using the lm function”, uses 'basket_value' as the dependent variable and 'platform' as the independent variable in order to relate the levels of the platform category to the variation in average basket value. The summary function provides an in-depth view of the model, including the coefficients, their statistical significance and the overall fit metrics, such as the R-squared value and the F-statistic. The predicted basket values by platform are then calculated as the estimated mean for customers on each platform (the x-axis represents “platform” and the y-axis represents “basket value”). The final step is a data visualization showing the predicted basket value by platform.
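The three steps above can be sketched as follows. The data are simulated, the platform labels other than Google and X are hypothetical, and the platform baselines are arbitrary assumptions chosen only so that the ranking has something to order.

```r
# Simulated data; baselines per platform are assumptions for illustration
set.seed(7)
platforms <- c("Google", "X", "Facebook", "Instagram")
data <- data.frame(
  platform = sample(platforms, 1200, replace = TRUE),
  target   = sample(c(0, 1), 1200, replace = TRUE)
)
base_by_platform <- c(Google = 50, X = 48, Facebook = 42, Instagram = 40)
data$basket_value <- unname(base_by_platform[as.character(data$platform)]) +
  rnorm(1200, sd = 20)

# Steps 1-2: keep voucher customers only and make sure platform is a factor
voucher <- subset(data, target == 1)
voucher$platform <- factor(voucher$platform)

# Step 3: regress basket value on platform
fit <- lm(basket_value ~ platform, data = voucher)

# Predicted mean basket value per platform, ranked from highest to lowest
new_platforms <- data.frame(
  platform = factor(levels(voucher$platform), levels = levels(voucher$platform))
)
new_platforms$predicted <- unname(predict(fit, newdata = new_platforms))
ranking <- new_platforms[order(-new_platforms$predicted), ]
print(ranking)
```

Because ‘platform’ is the only predictor, each predicted value equals the observed mean basket value of voucher customers on that platform, so the ranking is a ranking of group means.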
Figure 2: The Predicted Basket Value by Platform
Figure 3: Fitting the Linear Regression Outcome
The F-statistic of 2833, with a p-value < 2.2e-16, establishes a significant relationship between the dependent and independent variables. However, the low R-squared of 0.03029 means that a more complex model is needed to generate more accurate results.
To investigate other variables alongside the platform ranking, the output above shows that 'deviceiOS' has a positive coefficient of 8.909, meaning that using an iOS device is associated with a basket value (the proxy for profit) nearly $8.91 higher than for the other devices. Along with the ‘iOS’ device, ‘time_engagement’ has a positive coefficient of 0.068, which means that, holding all other variables constant, a one-unit increase in time engagement is associated with an increase of approximately $0.068 in basket value.
C. Did the Voucher-Inclusive Advertising approach generate more long-term profit than the Standard Advertising approach?
The first step is loading and exploring the dataset “tastybites_b” and summarizing the average LTP for each group. The next step is to run a t-test to establish whether LTP, and therefore long-term profitability, differs between the two target groups.
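A minimal sketch of this comparison, assuming the LTP column is named `LTP` and using simulated values rather than the real “tastybites_b” data. A Welch two-sample t-test (the default of `t.test`) is assumed here, since the text does not say which variant was run.

```r
# Simulated stand-in for tastybites_b; the column name LTP is an assumption
set.seed(123)
tastybites_b <- data.frame(
  target = factor(rep(c(0, 1), each = 1500)),    # 0 = Standard, 1 = Voucher-Inclusive
  LTP    = c(rnorm(1500, mean = 163.7, sd = 25),
             rnorm(1500, mean = 166.5, sd = 25))
)

# Average LTP for each group
group_means <- tapply(tastybites_b$LTP, tastybites_b$target, mean)
print(group_means)

# Welch t-test: the estimated difference is mean(target = 0) minus mean(target = 1)
ltp_test <- t.test(LTP ~ target, data = tastybites_b)
print(ltp_test$statistic)   # t-statistic
print(ltp_test$conf.int)    # 95% confidence interval for the difference
print(ltp_test$p.value)
```

With the formula interface, R subtracts the second factor level from the first, so a negative t-statistic and a confidence interval below zero correspond to a higher mean LTP in the voucher group.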
The result shows a statistically significant difference in LTP between the two groups: the group that received Voucher-Inclusive Advertising has a higher mean LTP than the group that did not. The Voucher-Inclusive Advertising group (target=1) has an average LTP of about 166.47, compared with about 163.71 for the Standard Advertising group (target=0), so it generates higher profit in the long term. This result is supported by Reinartz & Kumar (2003), who show that repeat purchase behavior (LTP) has a strong positive correlation with promotional tactics, including voucher-inclusive ones. Liu and Yang (2009) also report an increase in post-purchase behavior, which contributes to long-term profit margins. As for the model evaluation, the t-statistic of -3.6163 (with reported bounds of -6.3633 and -3.616254) is negative because the Standard Advertising group, which has the lower mean, is compared against the Voucher-Inclusive group; its magnitude indicates a statistically significant difference between the 2 targets.
A jitter plot is presented to visualize the relationship between the 2 advertising approaches and long-term profit:
As can be seen from the plot, the points for the Voucher-Inclusive Advertising group cluster more densely and at higher values (which reflects the higher mean LTP) than those for the Standard Advertising approach. This result indicates that customers engaged in the Voucher-Inclusive strategy generate more profit for TastyBites than those in the Standard Advertising approach.
Running a linear model to test the accuracy of the model:
The low adjusted R-squared of 0.022 (a good value should be between 0.6 and 0.7 according to Harel (2009)) indicates that the added variables deliver low explanatory power and are insufficient to predict LTP.
D. If you had the opportunity to collect more data, what would be the three most
important variables/information you would want to know from the customers?
First, “Customer Lifetime Value” (CLV) is an effective tool for TastyBites to develop CLV-based marketing strategies that target the right customers (Kumar and Rajan, 2020). It is important for TastyBites to understand how much profit it can generate from a customer throughout the entire relationship, in order to decide which customer segments are worth launching advertising campaigns for. Identifying “Customer Lifetime Value” will help TastyBites build long-term customer loyalty and profit, instead of only achieving short-term profit after customers complete the first transaction.
Additionally, alongside the data on “customer information” after the first purchase and on the “evaluation of alternatives” that TastyBites already concentrates on, “post-purchase behavior statistics” would be another significant piece of information that the company needs in order to understand its customers’ needs and wants. This could be the number of repeat purchases of the same product, or customer satisfaction, including feedback and reviews after experiencing TastyBites’ products or services. According to the “Marketing and Buyer Behaviour” decision-making models provided by Hoyer (1984), establishing the characteristics that motivate customers to keep interacting with the brand’s products will strengthen customer loyalty, leverage the product offerings and contribute to the development of the brand's identity. By gaining deeper insight into the different stages of the decision-making process of its target segment, TastyBites could not only attract new potential customers but also regain lost consumers. This is important for a more comprehensive understanding of the overall effectiveness of the Voucher-Inclusive marketing approach after the first transaction (Mugge, Schifferstein, and Schoormans, 2010).
Last but not least, TastyBites should deepen its segmentation by collecting “Segmentation and Demographics” variables, including gender, age, income and social class, in order to tailor its offer to the key expectations of specific customer groups, maximize the conversion rate and increase “basket_value”, which represents the brand's profit (Johnson & Gupta, 2021). Taking “age” as an example, Gen Z might show higher engagement with the products across social platforms, while Millennials might be more used to the Standard Advertising approach, which follows traditional marketing practices (Munsch, 2021).
Question 2:
A. Find the correlation between ‘T_speed_UK’ and ‘price’ and interpret the results.
Find and interpret the correlation between ‘T_speed_UK’ and ‘speed_US’
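As a minimal sketch of how the two correlations could be computed in R: the data below are simulated, and the strength of the relationships between the columns is an assumption made for illustration, not a property of the real EV dataset.

```r
# Simulated EV data; the relationships between columns are assumptions
set.seed(2024)
n <- 500
ev <- data.frame(T_speed_UK = runif(n, 140, 260))
ev$speed_US <- 0.62 * ev$T_speed_UK + rnorm(n, sd = 3)
ev$price    <- 10000 + 150 * ev$T_speed_UK + rnorm(n, sd = 8000)

# Pearson correlation with a significance test (cor.test defaults to Pearson)
cor_price <- cor.test(ev$T_speed_UK, ev$price)
print(cor_price$estimate)
print(cor_price$p.value)

# Correlation between the two speed variables
cor_speed <- cor(ev$T_speed_UK, ev$speed_US)
print(cor_speed)
```

A correlation close to 1 between two predictors would mean they carry nearly the same information, which matters when interpreting which of them a later model picks up.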
B. Suggest a tree-based model that allows you to understand what EV features affect
the consumers’ willingness to pay (‘price’). Apply your suggested method (using R)
and explain the results.
The “Regression Tree Model” is considered the most suitable model for predicting which EV features affect consumers’ willingness to pay, since the model can deal with complex non-linear relationships and interactions across the predictor variables without transforming them (Gomes, Amantes, and Jelihovschi, 2020).
The output represents the structure of a regression tree developed to predict the price of electric vehicles (EVs) from the characteristics that are believed to have a significant impact on customers’ willingness to pay, including the “number of seats”, “SUV categorization”, “acceleration”, “warranty” and “speeds” in both the US and UK markets.
The dataset comprises a total of 10,370 data points, and the model relates EV pricing to the characteristics that are believed to be most significant for vehicle valuation and customers’ willingness to pay. The root node, which contains all observations before any split, has an average EV price of $40,879. The first split is on the variable “seats”, which is therefore the key contributor to customers’ willingness to pay: node number 2 contains 7,570 vehicles with fewer than 6 seats, at a mean price of $26,230, while vehicles with 6 or more seats have a mean price of $80,483. Vehicles with fewer than 6 seats are therefore more affordable and are likely to attract more customer interest. This gives useful insight into the market trend and customer expectations for product development: EVs with seating capacities below this threshold have a more affordable price. Along with “seats”, “speed” is also an important variable that contributes to customers’ willingness to pay. In conclusion, the factors that affect price are the number of seats in an electric car; the vehicle’s speed, which is strongly related to “price”; whether the vehicle is categorized as an SUV; the “warranty”; and the “acceleration”.
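A regression tree of this kind can be sketched with the `rpart` package, which is assumed here because the text does not name the package that was used. The data are simulated, with a deliberately large price jump for vehicles with 6 or more seats, and the predictor column names are assumptions based on the features listed above.

```r
library(rpart)   # recommended package shipped with standard R distributions

# Simulated EV data; effect sizes and column names are assumptions
set.seed(10)
n <- 1000
ev <- data.frame(
  seats        = sample(c(2, 4, 5, 7), n, replace = TRUE),
  SUV          = sample(c(0, 1), n, replace = TRUE),
  acceleration = runif(n, 3, 12),
  warranty     = sample(3:8, n, replace = TRUE),
  T_speed_UK   = runif(n, 140, 260)
)
ev$price <- 20000 + 50000 * (ev$seats >= 6) + 80 * ev$T_speed_UK +
  3000 * ev$SUV + rnorm(n, sd = 5000)

# method = "anova" fits a regression tree for a numeric outcome
tree <- rpart(price ~ seats + SUV + acceleration + warranty + T_speed_UK,
              data = ev, method = "anova")

print(tree)                       # one row per node: split, n, deviance, mean price
print(tree$variable.importance)   # relative contribution of each feature
```

In the printed tree, the root row reports the overall mean price, and each child row reports the number of vehicles and the mean price in that branch, which is how figures such as a root mean and a “seats” split are read off.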
C. Evaluate the model’s performance
As the data were not split into training and testing sets, the accuracy measurements are calculated from the predicted values against the actual values of the same training dataset. Performance is evaluated using the Root Mean Squared Error (RMSE) and the Mean Absolute Error (MAE), common metrics of prediction accuracy in regression models, and compared with the average price to aid the evaluation:
The RMSE is approximately $27,674 and the MAE is nearly $13,311, which gives a picture of the size of the prediction error. Expressed as percentages of the mean EV price of $40,879, the RMSE and MAE are both comparatively high, at 67.7% and 32.56% respectively, which implies that there is room for the model to forecast more precisely.
Regarding the Root Mean Square Error (RMSE) of approximately $27,674: compared to the mean price of $40,879, the RMSE takes up over half the mean value. This suggests that the model's predictions vary quite widely and that individual cases may have significant errors. However, given that the data has a wide range, the RMSE can be regarded as not too large, which might suggest that the errors are still within an acceptable range.
Regarding the Mean Absolute Error (MAE) of $13,311: the MAE is approximately equal to 1/3 of the mean price. This level of inaccuracy could be classified as high; however, the errors are still within an acceptable range.
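The two metrics and their relative versions can be written out as a short sketch. The helper names `rmse` and `mae` and the toy vectors are illustrative; the last lines recompute the percentages from the reported RMSE and MAE and the root-node mean price of $40,879.

```r
# Root mean squared error and mean absolute error
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))
mae  <- function(actual, predicted) mean(abs(actual - predicted))

# Toy vectors for illustration only (not the EV data)
actual    <- c(30000, 45000, 80000, 25000)
predicted <- c(28000, 50000, 70000, 26000)
toy_rmse <- rmse(actual, predicted)
toy_mae  <- mae(actual, predicted)
print(c(rmse = toy_rmse, mae = toy_mae))

# Relative errors from the figures reported in this section
mean_price <- 40879
rel_rmse <- 27674 / mean_price * 100   # about 67.7%
rel_mae  <- 13311 / mean_price * 100   # about 32.56%
print(round(c(rel_rmse = rel_rmse, rel_mae = rel_mae), 2))
```

RMSE squares the errors before averaging, so it is pulled up by a few large misses; a gap as wide as the one between 67.7% and 32.56% suggests that a minority of vehicles are predicted very poorly.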
Question 3:
A. Identify (and target) different customer segments into meaningful clusters in which
individuals within a cluster are similar but different from those individuals in other
clusters. After clustering, discuss and provide a marketing strategy for each
resulting cluster to maximize the effectiveness of your marketing efforts
The first step is importing and exploring the data. In this step, the “male” column is converted from the categorical values “yes” and “no” to a numeric variable so that K-Means clustering can be applied. After cleaning, no missing or null values were found in the dataset. The number of clusters chosen for K-Means is 3, which means that the dataset is split into 3 clusters.
The clustered data are visualized using 2 features, Average Time Spent (seconds) and Average Page Visits. The visualization shows that the 3 clusters are separated quite clearly: the first cluster contains customers with time spent over 450 seconds, the second cluster customers with time spent between 350 and 450 seconds, and the last cluster customers with time spent below 350 seconds. The clusters do not overlap, and the points in each cluster are not in close proximity to those of other clusters.
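A minimal sketch of this clustering step, using simulated data: the column names `avg_time_spent` and `avg_page_visits` are assumptions, and the features are left unscaled, which is itself an assumption suggested by the clusters splitting along time spent.

```r
# Simulated customers; column names and group centres are assumptions
set.seed(99)
customers <- data.frame(
  male            = sample(c("yes", "no"), 300, replace = TRUE),
  avg_time_spent  = c(rnorm(100, 300, 20), rnorm(100, 400, 20), rnorm(100, 500, 20)),
  avg_page_visits = c(rnorm(100, 5, 1), rnorm(100, 8, 1), rnorm(100, 11, 1))
)

# Recode "yes"/"no" to 1/0 so that every column is numeric
customers$male <- ifelse(customers$male == "yes", 1, 0)

# Confirm there are no missing values before clustering
stopifnot(!anyNA(customers))

# K-means with 3 clusters; nstart repeats the random start to avoid poor solutions
features <- c("male", "avg_time_spent", "avg_page_visits")
km <- kmeans(customers[, features], centers = 3, nstart = 25)
customers$cluster <- km$cluster

print(km$size)      # number of customers per cluster
print(km$centers)   # average feature values per cluster
```

Because the features are unscaled, time spent (measured in hundreds of seconds) dominates the distance calculation; standardising with `scale()` first would give page visits and gender more weight.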
Cluster 1 can be described as the “active customer group”: customers who spend more than 450 seconds browsing the website. According to Jones et al. (2018), personalized e-commerce content is the best tactic to raise their awareness and thereby increase the number of product purchases. Once first transactions increase, the company could enhance the customer experience, including utilitarian needs, by offering loyalty programs or seasonal reward events to strengthen the customer base and maintain the number of purchases.
Cluster 2 describes customers who spend between 350 and 450 seconds browsing the website, which indicates interest in the range or active consideration in the purchasing cycle (Davis & Palmer, 2017). The strategy for this segment includes promotional programs or voucher-inclusive activities, which are effective in facilitating the customer decision-making process when combined with proper product information (Bilal, Ahmed, and Shehzad, 2014).
Cluster 3 describes customers who spend less than 350 seconds browsing the website. This group shows “price aggressiveness”, in which price comparison among alternatives may play a crucial role at the start of the segment’s purchase decision-making process (Lopez & Lee, 2020). According to Reibstein (2002), instant purchase offers visible on social commerce platforms would be the best way to capture this segment’s attention and thus increase the net profit margin. More specifically, tactics aimed at quick browsers, such as highlighting a limited number of recommended products or clearance items, could quickly capture this segment's attention and motivate them to purchase once (Robinson, 2021). However, this strategy can only boost sales growth in the short term, so the business should combine multiple marketing tactics and leverage its product offerings to sustain customer loyalty.
B. Performing separate clustering for males and others
The dataset is split into 2 subsets, one for “males” and one for “others”, each of which is clustered by K-means and then visualized.
In the first scatterplot, which shows the clustering results for “males”, the clusters are not clearly separated, and all clusters overlap substantially across the values of Average Time Spent and Average Page Visits. The second scatterplot shows much the same pattern, although there is a stronger concentration of customers belonging to cluster 2 around the 300 to 500 seconds range of Average Time Spent.
In theory, when comparing the clustering results of these subsets, the values on either or both axes could imply different patterns of browsing behavior, such as focused or casual/exploratory browsing. Assuming that gender may influence shopping behavior, we could expect the clustering results to exhibit such differences. However, in this case no cluster in either subset shows significantly different behavior, and the clustering results of both subsets are quite similar in that the clusters overlap substantially and are spread across the axes’ values. It can therefore be concluded that gender does not have a strong influence on the variables included in the dataset.
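The split-then-cluster procedure can be sketched as follows, again with simulated data and assumed column names. Using 3 clusters per subset is also an assumption carried over from part A.

```r
# Simulated customers with no built-in gender effect (an assumption)
set.seed(5)
customers <- data.frame(
  male            = sample(c(1, 0), 400, replace = TRUE),
  avg_time_spent  = rnorm(400, 400, 80),
  avg_page_visits = rnorm(400, 8, 3)
)
features <- c("avg_time_spent", "avg_page_visits")

# Split into the two subsets described in the text
subsets <- split(customers, ifelse(customers$male == 1, "males", "others"))

# Run K-means separately on each subset
fits <- lapply(subsets, function(d) {
  kmeans(d[, features], centers = 3, nstart = 25)
})

# Compare cluster centres and sizes across the two subsets
for (group in names(fits)) {
  cat("Subset:", group, "\n")
  print(round(fits[[group]]$centers, 1))
  print(fits[[group]]$size)
}
```

Placing the two sets of cluster centres side by side gives a numerical counterpart to the visual comparison of the scatterplots: similar centres in both subsets point to little difference by gender.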
Works cited
Bilal, G., Ahmed, M.A. and Shehzad, M.N., 2014. Role of social media and social networks in consumer decision making: A case of the garment sector. International Journal of Multidisciplinary Sciences and Engineering, 5(3), pp.1-9.
Boyer, K.K. and Pagell, M., 2000. Measurement issues in empirical research: improving measures of operations strategy and advanced manufacturing technology. Journal of Operations Management, 18(3), pp.361-374.
Chapman, C. and Feit, E.M., 2015. R for marketing research and analytics (Vol. 67). New York, NY: Springer.
Harel, O., 2009. The estimation of R² and adjusted R² in incomplete data sets using multiple imputation. Journal of Applied Statistics, 36(10), pp.1109-1118.
Hoyer, W.D., 1984. An examination of consumer decision making for a common repeat purchase product. Journal of Consumer Research, 11(3), pp.822-829.
Kumar, V. and Rajan, B., 2020. Customer lifetime value: What, how, and why. In The Routledge companion to strategic marketing (pp. 422-448). Routledge.
Kumar, V. and Shah, D., 2004. Building and sustaining profitable customer loyalty for the 21st century. Journal of Retailing, 80(4), pp.317-329.
Lantz, B., 2019. Machine learning with R: expert techniques for predictive modeling. Packt Publishing Ltd.
Liu, Y. and Yang, R., 2009. Competing loyalty programs: Impact of market saturation, market share, and category expandability. Journal of Marketing, 73(1), pp.93-108.
Mugge, R., Schifferstein, H.N. and Schoormans, J.P., 2010. Product attachment and satisfaction: understanding consumers' post‐purchase behavior. Journal of Consumer Marketing, 27(3), pp.271-282.
Munsch, A., 2021. Millennial and generation Z digital marketing communication and advertising effectiveness: A qualitative exploration. Journal of Global Scholars of Marketing Science, 31(1), pp.10-29.
Reibstein, D.J., 2002. What attracts customers to online stores, and what keeps them coming back?. Journal of the Academy of Marketing Science, 30, pp.465-473.
