Group 13 - Term Project

MSL - 719 STATISTICS FOR MANAGEMENT
Report by:
Group - 13, Section B
Names:
Kasturi Vineel Chandra (2021SMT6668)
Noonsavath Sandeep (2021SMF6542)
Monika Mardi (2021SMF6516)
Kirish Jhamtani (2021SMF6609)
Department of Management Studies, IIT Delhi

Prof. Seema Sharma
11 November 2021
On Demand Case
Introduction:
Ajanta Supermarket, our client, is a large retail brand whose offline stores operate in the
three major cities of Myanmar. It offers products in the following categories:
1. Electronic Accessories
2. Home and Lifestyle
3. Health and Beauty
4. Fashion Accessories
5. Food and Beverages
6. Sports and Travel
The client wants us to help them with the following broad business objectives:
1. Understand customer purchasing behavior
2. Ways to improve customer satisfaction
Problem-Solving Process:
Problem Understanding -> Data Collection -> Data Understanding -> Data Cleaning ->
Exploratory Data Analysis -> Hypothesis Testing -> Regression Analysis -> Insights &
Recommendations
Exploratory Data Analysis:
Consumer behavior
Sales are almost equal across all product lines, which suggests that the company is not stocking
product lines that are unlikely to sell.
Ten units is the most frequent quantity bought in a single purchase, more common than any other
quantity. This suggests that customers may tend to buy 10 products at a time. The company can use
this insight to encourage customers who buy 8 or 9 products at a time to buy 10 by offering a
small discount.
Food and Beverages has the highest average rating (7.113). The graph also shows that the
accessories and other products in the personal-usage segment are well rated, except Home and
Lifestyle, whose average rating is 6.837. Improving its Home and Lifestyle products could help
the company position itself as a brand that sells good products for personal use.
Branch-wise analysis
All three branches, located in three different cities, sold the same number of goods, which
indicates that they are performing equally well, a healthy sign for the company.
All branches sold almost equal numbers of products in each product line, except the Naypyitaw
branch, which recorded the fewest sales in Sports & Travel and Home & Lifestyle. Even though
these two product lines account for the fewest sales at Naypyitaw, the average rating given by
their customers is above par.
The company has converted almost 50% of its customers into members and can convert more of
them to build a base of loyal customers.
Pearson’s Correlation Analysis:
Correlation analysis measures the relationship between two continuous variables at a time.
Pearson's correlation coefficient r captures the strength and direction of the linear
relationship and ranges from -1 (perfectly negative) through 0 (no linear relationship) to +1
(perfectly positive).
In our case, the variables under consideration are Unit Price, Quantity, Tax, Total Price,
Price Before Tax, Gross Income and Rating.
Visualizing the resulting correlation matrix as a heatmap gives the following output:
Observations:
1. Rating shows no meaningful linear relationship with any of the other continuous variables
2. The strongest correlations are between Quantity and each of Tax, Total Price, Price Before
Tax and Gross Income
3. The second-strongest correlations are between Unit Price and each of Tax, Total Price,
Price Before Tax and Gross Income
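A minimal sketch of the correlation step in plain Python, assuming only the seven 'On Demand' sample rows printed at the end of this report (the report's heatmap was computed on the full data, so these numbers will differ). It also suggests a likely reason for observations 2 and 3: in every sample row, Price Before Tax equals Unit Price × Quantity, Tax and Gross Income both equal 5% of that amount, and Total Price equals Price Before Tax plus Tax. These four columns are therefore exact multiples of one another, so they correlate perfectly with each other and share the same correlation with Quantity and with Unit Price. The helper name `pearson` is our own.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient r between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Unit price, quantity and rating of the seven 'On Demand' sample rows.
unit_price = [74.69, 15.28, 46.33, 58.22, 86.31, 85.39, 68.84]
quantity = [7, 5, 7, 8, 7, 7, 6]
rating = [9.1, 9.6, 7.4, 8.4, 5.3, 4.1, 5.8]

# Derived columns rebuilt from unit price and quantity (this matches every sample row).
price_before_tax = [p * q for p, q in zip(unit_price, quantity)]
tax = [0.05 * v for v in price_before_tax]
total_price = [v + t for v, t in zip(price_before_tax, tax)]
gross_income = tax  # gross income equals the 5% tax in the sample rows

columns = {
    "Unit Price": unit_price,
    "Quantity": quantity,
    "Tax": tax,
    "Total Price": total_price,
    "Price Before Tax": price_before_tax,
    "Gross Income": gross_income,
    "Rating": rating,
}
names = list(columns)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        print(f"{a:>16} vs {b:<16} r = {pearson(columns[a], columns[b]):+.3f}")
```

With so few rows the printed coefficients are only illustrative; the point is the structure (the four derived columns move together exactly).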
Hypothesis Testing:
Hypothesis Statements:
Null Hypothesis (H0): Gender and Product Line are independent of each other
Alternate Hypothesis (H1): Gender and Product Line are dependent on each other
Null Hypothesis (H0): Customer Type and Product Line are independent of each other
Alternate Hypothesis (H1): Customer Type and Product Line are dependent on each other
Null Hypothesis (H0): City and Product Line are independent of each other
Alternate Hypothesis (H1): City and Product Line are dependent on each other
Null Hypothesis (H0): City and Payment Mode are independent of each other
Alternate Hypothesis (H1): City and Payment Mode are dependent on each other
Null Hypothesis (H0): Gender and Payment Mode are independent of each other
Alternate Hypothesis (H1): Gender and Payment Mode are dependent on each other
Chi-square Test:
Variables X-squared df p-value
Gender and Product Line 5.7445 5 0.3319
Customer Type and Product Line 3.325 5 0.65
City and Product Line 11.559 10 0.3156
City and Payment Mode 3.2997 4 0.509
Gender and Payment Mode 2.9497 2 0.2288
Observations:
1. The p-values in all five cases are greater than 0.05 (the significance level)
2. The calculated X-squared values in all five cases are less than the corresponding critical
values
3. Thus, we cannot reject the null hypothesis in any of the five cases
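The p-values and critical values behind these observations can be cross-checked without R. The sketch below is plain Python; the helper names `chi2_sf`, `chi2_critical` and `chi2_independence` are our own, not from any library. It uses the standard recurrence for the chi-square upper tail with integer degrees of freedom. `chi2_independence` computes the uncorrected Pearson statistic, which is what R's `chisq.test` reports for tables larger than 2 × 2 (its continuity correction applies only to 2 × 2 tables, and none of the five tables here is 2 × 2).

```python
from math import erfc, exp, gamma, sqrt


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) of a chi-square variable with integer df."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2
    # Closed forms for df = 1 and df = 2, then the recurrence
    # Q(x, k + 2) = Q(x, k) + (x/2)**(k/2) * exp(-x/2) / Gamma(k/2 + 1).
    k, q = (1, erfc(sqrt(half))) if df % 2 else (2, exp(-half))
    while k < df:
        q += half ** (k / 2) * exp(-half) / gamma(k / 2 + 1)
        k += 2
    return q


def chi2_critical(df, alpha=0.05):
    """Critical value c with chi2_sf(c, df) = alpha, found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, df) > alpha:  # widen the bracket until the tail drops below alpha
        hi *= 2
    for _ in range(100):
        mid = (lo + hi) / 2
        if chi2_sf(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return hi


def chi2_independence(table):
    """Pearson chi-square test of independence for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df, chi2_sf(stat, df)


# X-squared and df as reported in the chi-square table.
reported = {
    "Gender and Product Line": (5.7445, 5),
    "Customer Type and Product Line": (3.325, 5),
    "City and Product Line": (11.559, 10),
    "City and Payment Mode": (3.2997, 4),
    "Gender and Payment Mode": (2.9497, 2),
}
for name, (x2, df) in reported.items():
    crit = chi2_critical(df)
    print(f"{name:<31} p = {chi2_sf(x2, df):.4f}, 5% critical value = {crit:.3f}")
```

Running it should reproduce the reported p-values to about three decimal places and confirm that each X-squared value is below its 5% critical value (for example, about 11.07 for df = 5). `chi2_independence` is included for when the raw contingency counts are available.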
Linear Regression in One Variable:
Objective: to fit the least-squares regression line of Gross Income (dependent variable) on Unit
Price (independent variable) separately for:
1. Electronic accessories
2. Food and beverages
3. Sports and travel
4. Home and lifestyle
Gross income vs. Unit price | Estimated regression line | Slope | Intercept | Multiple R | R Square | Significance F
Electronic accessories | y = 0.2589x + 0.3329 | 0.26 | 0.3329 | 0.59 | 0.3557 | 6.90E-07
Food and beverages | y = 0.2222x + 1.8732 | 0.22 | 1.87 | 0.57 | 0.33 | 2.29E-06
Sports and travel | y = 0.2837x - 0.1392 | 0.28 | -0.13 | 0.68 | 0.46 | 4.38E-09
Home and lifestyle | y = 0.3278x - 1.8811 | 0.32 | -1.88 | 0.68 | 0.47 | 2.68E-10
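The columns in this table (slope, intercept, Multiple R, R Square) come from a standard least-squares fit. The sketch below is a minimal plain-Python version (the helper name `least_squares` is ours); it does not compute Significance F. Only seven sample rows are printed at the end of the report, so the usage pools them across product lines and will not reproduce the table. The sample rows also suggest one possible reading of the results: Gross Income equals 0.05 × Unit Price × Quantity, so a regression on Unit Price alone leaves Quantity in the error term. If Quantity is roughly unrelated to Unit Price, the slope should be close to 0.05 times the average quantity and the intercept close to zero, and R Square stays well below 1 because quantity varies between purchases.

```python
from math import sqrt


def least_squares(x, y):
    """Fit y = intercept + slope * x by ordinary least squares.

    Returns (slope, intercept, multiple_r, r_square). With a single predictor,
    Multiple R is the absolute value of Pearson's r.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r_square = sxy ** 2 / (sxx * syy)
    return slope, intercept, sqrt(r_square), r_square


# Unit price (x) and gross income (y) of the seven 'On Demand' sample rows,
# pooled across product lines because there are too few rows per line.
unit_price = [74.69, 15.28, 46.33, 58.22, 86.31, 85.39, 68.84]
gross_income = [26.1415, 3.82, 16.2155, 23.288, 30.2085, 29.8865, 20.652]

slope, intercept, multiple_r, r_square = least_squares(unit_price, gross_income)
print(f"slope = {slope:.4f}, intercept = {intercept:.4f}")
print(f"Multiple R = {multiple_r:.2f}, R Square = {r_square:.2f}")
```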
Generic Case
Introduction:
We collected transaction data from the retail industry in Myanmar, focusing on
brick-and-mortar stores. Four companies operate in this space and, for security reasons, we
cannot disclose their names. The data covers customer spending, annual income, ratings, time of
order, etc. Our aim is to analyze the data in depth and find meaningful insights.
Exploratory Data Analysis:
Customer type Average of Rating
Normal 4.027102804
Member 3.98172043
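Members give a slightly lower average rating than normal customers (3.98 vs 4.03). This may
indicate that members do not feel they are getting enough value for their membership fee,
although the gap is small.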
Customer type Count of Customer type
Member 93
Normal 107
Grand Total 200
Customer type Average of Spending Score (1-100)
Member 52.20430108
Normal 48.45794393
The industry has managed to bring a good share of customers (93 of 200) into its membership
programs. Moreover, a member's average spending score is higher than a normal customer's. This
suggests that converting normal customers into members could drive revenue growth.
Here, we see that most customers shop between 5 and 9 PM and the fewest shop between
7 and 9 AM. Staff shifts can therefore be scheduled around the expected number of orders.
Payment Average of Rating
Credit card 4.056451613
Cash 4.038461538
Ewallet 3.934246575
Payments through Ewallet have the lowest average rating (3.93), which may indicate problems
with the Ewallet payment experience. A better Ewallet transaction system could improve customer
satisfaction, although the one-way ANOVA on ratings by payment mode does not find a
statistically significant difference.
The graph shows that the spending score is highest between 11 AM and 3 PM and lowest
between 9 and 11 AM.
Company Member Normal Grand Total
1 23 22 45
2 32 27 59
3 24 29 53
4 14 29 43
Grand Total 93 107 200

Company 2 is the market-share leader (59 of the 200 customers) and company 3 is the follower
(53). Companies 1 and 4 need to focus on improving their market share.
Company 2 also leads in the ratio of members to normal customers, while company 4 has the
lowest ratio (14 members to 29 normal customers) and needs to put more effort into converting
customers into members. Since members have a higher average spending score, this ratio plays a
significant part in revenue composition.
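The market-share and membership comparisons are simple ratios of the counts in the company table. The sketch below computes them in plain Python; treating "market share" as each company's share of the 200 customers in the data is our assumption about how the comparison was made.

```python
# Member and Normal customer counts per company, copied from the company table.
counts = {1: (23, 22), 2: (32, 27), 3: (24, 29), 4: (14, 29)}
total = sum(members + normal for members, normal in counts.values())  # 200 customers

# Share of all customers, and members per normal customer, for each company.
share = {c: (m + n) / total for c, (m, n) in counts.items()}
ratio = {c: m / n for c, (m, n) in counts.items()}

for c in counts:
    print(f"Company {c}: customer share {share[c]:.1%}, members/normal {ratio[c]:.2f}")

print("Largest share:", max(share, key=share.get))
print("Highest members/normal ratio:", max(ratio, key=ratio.get))
print("Lowest members/normal ratio:", min(ratio, key=ratio.get))
```

The shares work out to 22.5%, 29.5%, 26.5% and 21.5% for companies 1 to 4, and the members/normal ratios to about 1.05, 1.19, 0.83 and 0.48, consistent with company 2 leading and company 4 trailing.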
Company Average of Rating
1 4.053333333
2 4.047457627
4 3.979069767
3 3.941509434
Companies 3 and 4 receive noticeably lower average ratings (3.94 and 3.98) than companies 1
and 2 (about 4.05), with company 3 rated lowest.
One Way ANOVA Test
1. Between Ratings and Payment Mode
Null Hypothesis (H0): There is no difference in mean ratings between payment modes
Alternate Hypothesis (H1): There is a difference in mean ratings between payment modes
Running a one-way ANOVA in RStudio gives the following results:
Observations:
1. The p-value is 0.411 (> 0.05), so the null hypothesis cannot be rejected
2. Thus, we do not have sufficient evidence of a statistically significant difference in mean
ratings between payment modes
2. Between Spending Score and Payment Mode
Null Hypothesis (H0): There is no difference in mean spending scores between payment
modes
Alternate Hypothesis (H1): There is a difference in mean spending scores between payment
modes
Running a one-way ANOVA in RStudio gives the following results:
Observations:
1. The p-value is 0.886 (> 0.05), so the null hypothesis cannot be rejected
2. Thus, we do not have sufficient evidence of a statistically significant difference in mean
spending scores between payment modes
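One-way ANOVA compares the variation between group means with the variation within groups. The sketch below is a plain-Python version (the name `one_way_anova_3` is ours); in R the usual equivalent is `summary(aov(Rating ~ Payment, data = df))`, with `df` standing for the data frame, although the report's own R code is not shown. The p-value uses the closed-form upper tail of the F(2, d) distribution, P(F ≥ f) = (1 + 2f/d)^(-d/2), so it is valid only for exactly three groups, which is the case for the three payment modes. The usage applies it to the 'Generic' sample rows listed at the end of the report rather than the full data, so its numbers will differ from the reported p-values of 0.411 and 0.886.

```python
def one_way_anova_3(groups):
    """One-way ANOVA for exactly three groups: returns (F statistic, p-value)."""
    if len(groups) != 3:
        raise ValueError("the closed-form p-value below assumes exactly three groups")
    n = sum(len(g) for g in groups)
    means = [sum(g) / len(g) for g in groups]
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between, df_within = 2, n - 3
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    # Upper tail of F(2, d): P(F >= f) = (1 + 2f/d) ** (-d/2).
    p_value = (1 + 2 * f_stat / df_within) ** (-df_within / 2)
    return f_stat, p_value


# Ratings and spending scores of the 'Generic' sample rows, grouped by payment mode.
ratings = {
    "Ewallet": [4.5, 3.2, 3.0, 3.8, 3.4, 4.4, 3.2, 4.4, 3.1, 4.6, 4.4, 3.0, 5.0],
    "Cash": [5.0, 3.3, 4.3, 3.4],
    "Credit card": [4.8, 3.4, 5.0, 3.9, 4.5, 3.0, 3.9],
}
spending = {
    "Ewallet": [39, 77, 40, 76, 6, 94, 14, 15, 77, 98, 35, 73, 73],
    "Cash": [81, 99, 13, 79],
    "Credit card": [6, 3, 72, 35, 66, 29, 5],
}
for label, data in (("Rating", ratings), ("Spending score", spending)):
    f_stat, p = one_way_anova_3(list(data.values()))
    print(f"{label}: F = {f_stat:.3f}, p = {p:.3f}")
```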
Tools Used:
1. Microsoft Excel
2. RStudio
Sample Data Used for ‘On Demand’ case:
Branch | City | Customer type | Gender | Product line | Unit price | Quantity | Tax 5% | Total Price | Payment | Price before tax | Gross margin percentage | Gross income | Rating
A | Yangon | Member | Female | Health and beauty | 74.69 | 7 | 26.141 | 548.97 | Ewallet | 522.83 | 4.761904762 | 26.1415 | 9.1
C | Naypyitaw | Normal | Female | Electronic accessories | 15.28 | 5 | 3.82 | 80.22 | Cash | 76.4 | 4.761904762 | 3.82 | 9.6
A | Yangon | Normal | Male | Home and lifestyle | 46.33 | 7 | 16.215 | 340.52 | Credit card | 324.31 | 4.761904762 | 16.2155 | 7.4
A | Yangon | Member | Male | Health and beauty | 58.22 | 8 | 23.288 | 489.04 | Ewallet | 465.76 | 4.761904762 | 23.288 | 8.4
A | Yangon | Normal | Male | Sports and travel | 86.31 | 7 | 30.208 | 634.37 | Ewallet | 604.17 | 4.761904762 | 30.2085 | 5.3
C | Naypyitaw | Normal | Male | Electronic accessories | 85.39 | 7 | 29.886 | 627.61 | Ewallet | 597.73 | 4.761904762 | 29.8865 | 4.1
A | Yangon | Member | Female | Electronic accessories | 68.84 | 6 | 20.652 | 433.69 | Ewallet | 413.04 | 4.761904762 | 20.652 | 5.8
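Sample of the data used for ‘Generic’ case:
Transaction ID | Gender | Age | Annual Income (k$) | Spending Score (1-100) | Company | Rating | Customer type | Payment | Time of Order
750-67-8428 | Male | 19 | 15 | 39 | 2 | 4.5 | Member | Ewallet | 11-1 PM
226-31-3081 | Male | 21 | 15 | 81 | 2 | 5 | Normal | Cash | 5-7 PM
631-41-3108 | Female | 20 | 16 | 6 | 4 | 4.8 | Normal | Credit card | 9-11 PM
123-19-1176 | Female | 23 | 16 | 77 | 2 | 3.2 | Member | Ewallet | 7-9 AM
373-73-7910 | Female | 31 | 17 | 40 | 4 | 3 | Normal | Ewallet | 9-11 PM
699-14-3026 | Female | 22 | 17 | 76 | 1 | 3.8 | Normal | Ewallet | 1-3 PM
355-53-5943 | Female | 35 | 18 | 6 | 2 | 3.4 | Member | Ewallet | 3-5 PM
315-22-5665 | Female | 23 | 18 | 94 | 4 | 4.4 | Normal | Ewallet | 11-1 PM
665-32-9167 | Male | 64 | 19 | 3 | 1 | 3.4 | Member | Credit card | 5-7 PM
692-92-5582 | Female | 30 | 19 | 72 | 1 | 5 | Member | Credit card | 7-9 PM
351-62-0822 | Male | 67 | 19 | 14 | 3 | 3.2 | Member | Ewallet | 5-7 PM
529-56-3974 | Female | 35 | 19 | 99 | 2 | 3.3 | Member | Cash | 7-9 PM
365-64-0515 | Female | 58 | 20 | 15 | 4 | 4.4 | Normal | Ewallet | 9-11 AM
252-56-2699 | Female | 24 | 20 | 77 | 3 | 3.1 | Normal | Ewallet | 11-1 PM
829-34-3910 | Male | 37 | 20 | 13 | 2 | 4.3 | Normal | Cash | 9-11 AM
299-46-1805 | Male | 22 | 20 | 79 | 3 | 3.4 | Member | Cash | 11-1 PM
656-95-9349 | Female | 35 | 21 | 35 | 2 | 3.9 | Member | Credit card | 5-7 PM
765-26-6951 | Male | 20 | 21 | 66 | 4 | 4.5 | Normal | Credit card | 7-9 PM
329-62-1586 | Male | 52 | 23 | 29 | 4 | 3 | Normal | Credit card | 9-11 PM
319-50-3348 | Female | 35 | 23 | 98 | 1 | 4.6 | Normal | Ewallet | 3-5 PM
300-71-4605 | Male | 35 | 24 | 35 | 3 | 4.4 | Member | Ewallet | 9-11 PM
371-85-5789 | Male | 25 | 24 | 73 | 3 | 3 | Normal | Ewallet | 5-7 PM
273-16-6619 | Female | 46 | 25 | 5 | 1 | 3.9 | Normal | Credit card | 5-7 PM
636-48-8204 | Male | 31 | 25 | 73 | 3 | 5 | Normal | Ewallet | 3-5 PM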
