
Data Analysis for Managers CIA– 2

Data Analysis Exercise

Under the guidance of


Dr. Joseph Durai Selvam

Submitted By:
Prajjwal Bajpai
2228241
MBA-O
Table of Contents
1.Introduction
2.Methods
3.Steps and Interpretation with Implication
3.1 Correlation
3.2 Regression
3.3 Paired T test
3.4 Anova
3.5 Chi Square
4. References
1.Introduction
The first dataset was taken from https://www.kaggle.com/datasets/colearninglounge/employee-attrition

It contains information about a company's employee attrition, together with details about each employee such as monthly income, department, performance and years at work.

The second dataset was taken from https://www.kaggle.com/datasets/parvezkhan90/cosmetic-products-sales

It contains information about cosmetic products, including net sales, quantity, price, location, MRP, pack size, pack unit and rank.

2.Methods
The methods I have followed are:

1. Correlation for cosmetic product - quantity and sales
2. Regression for cosmetic product - quantity and sales
3. Paired T test for cosmetic product - Ahmedabad net sales and total net sales of all the regions
4. Anova for employee attrition - total working years, years at company, years in current role
5. Chi Square for employee attrition - gender and departments
3.Steps and Interpretation with Implication
3.1 Correlation
                        Net Sales calculated    Qty
Net Sales calculated    1
Qty                     0.998262817             1

From this we can see that the correlation between quantity and net sales is 0.998. The value is positive and very close to 1, so the two variables are almost perfectly linearly related.

Interpretation With Implication

The correlation of 0.998 between net sales and quantity means that the two variables move together: as one variable increases, the other also increases. In this case, as the quantity sold increases, the net sales also increase.
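
The correlation coefficient reported above can be reproduced by hand. The snippet below is a minimal sketch of the Pearson formula; the function name `pearson_r` and the five quantity and net sales values are hypothetical and are not taken from the cosmetic products dataset.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical illustrative values, NOT rows from the real dataset
qty = [10, 25, 40, 60, 90]
net_sales = [1500, 2200, 2800, 3650, 4900]

r = pearson_r(qty, net_sales)
print(f"r = {r:.4f}")
```

A value of r close to 1, as in the table above, indicates a strong positive linear relationship between the two columns.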

Business Implication: This means that if we want to increase our net sales, we need to increase the quantity we sell, so we can follow this as a strategy. And if our sales ever fall, the quantity sold is the first factor we should check.
3.2 Regression
Step 1- Hypothesis

H0: β1= 0

Hα: β1≠ 0

Step 2- Level of Significance

α = 0.05

Step 3- Test statistic

Regression Statistics
Multiple R           0.998262817
R Square             0.996528652
Adjusted R Square    0.996505037
Standard Error       2784.118951
Observations         149

ANOVA
              df     SS             MS          F              Significance F
Regression    1      3.27103E+11    3.27E+11    42199.66093    1.1089E-182
Residual      147    1139443795     7751318
Total         148    3.28242E+11

             Coefficients    Standard Error    t Stat      P-value        Lower 95%      Upper 95%      Lower 95.0%    Upper 95.0%
Intercept    1119.280975     310.70998         3.602334    0.00043078     505.2455631    1733.316387    505.2455631    1733.316387
Qty          41.72332856     0.203106802       205.4256    1.1089E-182    41.32194214    42.12471499    41.32194214    42.12471499
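
The figures in the regression output are linked to each other by standard formulas, so they can be cross-checked. The snippet below is a minimal sketch that recomputes R Square, Adjusted R Square, the Standard Error, the t Stat for Qty and F from the values printed above; the variable names are my own, and small differences are expected because the printed sums of squares are rounded.

```python
import math

# Values copied from the regression output above
n = 149                      # Observations
ss_residual = 1139443795     # Residual SS
ss_total = 3.28242e11        # Total SS (rounded in the output)
coef_qty = 41.72332856       # Qty coefficient
se_qty = 0.203106802         # Standard error of the Qty coefficient

df_residual = n - 2          # one predictor plus the intercept

r_squared = 1 - ss_residual / ss_total
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / df_residual
std_error = math.sqrt(ss_residual / df_residual)
t_stat = coef_qty / se_qty
f_stat = t_stat ** 2         # with a single predictor, F equals t squared

print(f"R Square          = {r_squared:.6f}")
print(f"Adjusted R Square = {adj_r_squared:.6f}")
print(f"Standard Error    = {std_error:.3f}")
print(f"t Stat (Qty)      = {t_stat:.4f}")
print(f"F                 = {f_stat:.2f}")
```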

Step 4- Rejection Rule

We will reject the null hypothesis as P<α. The P-value for Qty is 1.1089E-182 and α=0.05; as 1.1089E-182 is far below 0.05, we will reject the null hypothesis. So β1≠ 0.

Interpretation and Implication

Our R2= 0.9965, which means that about 99.7% of the variation in net sales is explained by quantity. Through the Anova table we can also see that Significance F (1.1089E-182) is far below 0.05, so the model as a whole is significant.

y=β0+β1*x1

β0= 1119.28 β1=41.72 and let x1 be 1500

y=1119.28+41.72*1500

y=63699.28

What this means is that a 1 unit increase in quantity increases the predicted net sales by 41.72. So in this case, if we sell 1500 units, the predicted net sales are 63699.28.

Business Implication: Regression will help us predict future sales and set goals for the quantity we need to produce to reach those sales goals. In this case, if we set a goal of 1500 units, the predicted net sales are 63699.28. The same can be done in reverse: we can set a sales goal of 63699.28 and find out that we need 1500 units to achieve it.
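
The prediction and its reverse can be written as two small functions. This is a minimal sketch using the rounded coefficients from above (β0 = 1119.28, β1 = 41.72); the function names are hypothetical.

```python
B0 = 1119.28   # intercept from the regression output (rounded)
B1 = 41.72     # Qty coefficient from the regression output (rounded)

def predict_net_sales(qty):
    """Predicted net sales for a given quantity: y = B0 + B1 * x."""
    return B0 + B1 * qty

def required_qty(target_net_sales):
    """Quantity needed to reach a net sales goal: x = (y - B0) / B1."""
    return (target_net_sales - B0) / B1

print(round(predict_net_sales(1500), 2))
print(round(required_qty(63699.28), 2))
```

The predictions are only reliable for quantities within the range observed in the data.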
3.3 Paired T test
Step 1- Hypothesis

H0: µ1=µ2. This means that there is no difference between the average net sales in Ahmedabad and the average of the total net sales.

Hα: µ1≠µ2. This means that there is a difference between the average net sales in Ahmedabad and the average of the total net sales.

Step 2- Level of Significance

α=0.05

Step 3- Test Statistics

                                Ahmedabad Net Sales calculated    Total Net Sales calculated
Mean                            77199.55882                       44462.81879
Variance                        2091109922                        2217854386
Observations                    34                                149
Hypothesized Mean Difference    0
df                              50
t Stat                          3.745608656
P(T<=t) one-tail                0.000233176
t Critical one-tail             1.675905025
P(T<=t) two-tail                0.000466351
t Critical two-tail             2.008559112
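
The two columns in the table have different numbers of observations (34 and 149) and df = 50. These figures appear consistent with a two-sample t-test assuming unequal variances (Welch's test) rather than a test on matched pairs. The snippet below is a minimal sketch that recomputes the t Stat and df from the summary values in the table under that assumption; the function name is hypothetical.

```python
import math

def welch_t_from_stats(mean1, var1, n1, mean2, var2, n2):
    """Return (t statistic, approximate df) for Welch's two-sample t-test."""
    v1 = var1 / n1   # squared standard error of the first mean
    v2 = var2 / n2   # squared standard error of the second mean
    t_stat = (mean1 - mean2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t_stat, df

# Summary values copied from the table above
t_stat, df = welch_t_from_stats(
    77199.55882, 2091109922, 34,     # Ahmedabad net sales
    44462.81879, 2217854386, 149,    # total net sales
)
print(f"t = {t_stat:.4f}, df = {df:.2f}")
```

Since the t Stat is larger than the two-tail critical value of 2.008559112, the result agrees with the table.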

Step 4- Rejection Rule

We will reject the null hypothesis as P<α. Our P for the two-tail test is 0.000466 and α=0.05; as 0.000466<0.05, we will reject the null hypothesis. So µ1≠µ2.

Interpretation and Implication

As we have rejected the null hypothesis, µ1≠µ2. That means there is a difference between the average net sales in Ahmedabad and the average of total net sales. The means are 77199 for Ahmedabad and 44462 for all regions, so the average net sales in Ahmedabad outperform the average of total net sales of all the regions, showing that the performance in Ahmedabad is great.

Business Implication: The average net sales in Ahmedabad are 77199 and the average of all regions is 44462, which shows that the performance in Ahmedabad is much better. So we will supply more quantity to Ahmedabad, as we expect a better return on investment from there. We will also try to bring up the performance of the other regions so that the average total net sales can also increase.
3.4 Anova
Step 1- Hypothesis

H0: mean of total working years = mean of years at company = mean of years in current role, i.e. µ1=µ2=µ3, will be the null hypothesis. It states that there is no difference between the averages of total working years, years at company and years in current role.

Ha: At least one mean is different, i.e. the averages of total working years, years at company and years in current role are not all equal.

Step 2- Level of Significance

α=0.05

Step 3- Test Statistics

Anova: Single Factor

SUMMARY
Groups                Count    Sum     Average        Variance
TotalWorkingYears     303      3799    12.5379538     62.75931633
YearsAtCompany        303      2268    7.485148515    42.80689791
YearsInCurrentRole    303      1320    4.356435644    14.14405613

ANOVA
Source of Variation    SS             df     MS             F              P-value        F crit
Between Groups         10327.94939    2      5163.974697    129.4118211    3.65775E-50    3.01
Within Groups          36152.50165    906    39.90342345

Total                  46480.45105    908
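
The ANOVA table can be rebuilt from the SUMMARY table alone, since the between-groups and within-groups sums of squares depend only on each group's count, average and variance. The snippet below is a minimal sketch of that calculation; the function name is hypothetical and the inputs are copied from the SUMMARY table above.

```python
# (count, average, variance) for each group, copied from the SUMMARY table
groups = {
    "TotalWorkingYears": (303, 12.5379538, 62.75931633),
    "YearsAtCompany": (303, 7.485148515, 42.80689791),
    "YearsInCurrentRole": (303, 4.356435644, 14.14405613),
}

def one_way_anova_from_summary(groups):
    """Return (SS between, SS within, F) from per-group summary statistics."""
    stats = list(groups.values())
    total_n = sum(n for n, _, _ in stats)
    grand_mean = sum(n * mean for n, mean, _ in stats) / total_n
    # Between groups: spread of the group means around the grand mean
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in stats)
    # Within groups: pooled spread inside each group
    ss_within = sum((n - 1) * var for n, _, var in stats)
    df_between = len(stats) - 1
    df_within = total_n - len(stats)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, f_stat

ss_between, ss_within, f_stat = one_way_anova_from_summary(groups)
print(f"SS between = {ss_between:.3f}")
print(f"SS within  = {ss_within:.3f}")
print(f"F          = {f_stat:.4f}")
```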

Step 4- Rejection Rule

The null hypothesis will be rejected as P<α. Our P = 3.65775E-50 and α=0.05; as 3.65775E-50<0.05, we will reject the null hypothesis. Equivalently, F = 129.41 is larger than F crit = 3.01. This tells us that the means of the three variables are not all equal.

Interpretation and Implication

As we have rejected the null hypothesis, we will accept the alternative hypothesis Ha: at least one mean is different. The averages are 12.54 for total working years, 7.49 for years at company and 4.36 for years in current role. So on average employees have more total experience than years at this company, and more years at the company than years in their current role. The test compares the means only, so by itself it does not show that the three variables influence each other.

Business Implication: As the averages of total working years, years at company and years in current role differ significantly, we can take these three measures into account when we come up with promotion measures for the employees. We can also use this information to motivate our employees to work better, as it will make our promotion policy transparent.
3.5 Chi Square
Step 1- Hypothesis

H0: The row variable is independent of the column variable, which means that the department in which an employee works does not depend on gender.

Hα: The row variable is dependent on the column variable, which means that the department in which an employee works depends on gender.

Step 2- Level of significance

α = 0.05

Step 3- Test statistics

Count of gender
Row Labels                Female    Male    Grand Total
Human Resources           8         8       16
Research & Development    81        120     201
Sales                     38        48      86
Grand Total               127       176     303

Observed (fij)    Expected (eij)    fij-eij       (fij-eij)^2    (fij-eij)^2/eij
8                 6.706270627       1.29372937    1.67373569     0.249577714
8                 9.293729373       -1.2937294    1.67373569     0.180093009
81                84.24752475       -3.2475248    10.546417      0.125183702
120               116.7524752       3.24752475    10.546417      0.090331421
38                36.04620462       1.95379538    3.81731639     0.105900647
48                49.95379538       -1.9537954    3.81731639     0.076416944
Total                                                            0.827503437

χ2 = ∑∑(fij-eij)^2/eij

=0.827503437

Degrees of freedom = (r-1)(c-1) = 2*1 = 2

Critical value at α = 0.05 with 2 degrees of freedom: χα2 = 5.991
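
The expected counts and the χ2 statistic follow directly from the observed table: each expected count is (row total * column total) / grand total. The snippet below is a minimal sketch of that calculation; the function name is hypothetical. The closed forms used for the critical value and the P-value hold only for 2 degrees of freedom, which is the case here.

```python
import math

# Observed counts copied from the table above: [Female, Male] per department
observed = [
    [8, 8],      # Human Resources
    [81, 120],   # Research & Development
    [38, 48],    # Sales
]

def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, f_ij in enumerate(row):
            e_ij = row_totals[i] * col_totals[j] / grand_total  # expected count
            chi2 += (f_ij - e_ij) ** 2 / e_ij
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

chi2, df = chi_square_independence(observed)

# Valid for df = 2 only: the upper-tail probability is exp(-x / 2)
critical = -2 * math.log(0.05)
p_value = math.exp(-chi2 / 2)

print(f"chi2 = {chi2:.6f}, df = {df}")
print(f"critical value = {critical:.3f}, p-value = {p_value:.4f}")
```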

Step 4- Rejection rule

As χ2 < χα2, we will not reject the null hypothesis. Our χ2=0.827503437 and χα2 = 5.991, and 0.827503437<5.991, so we will not reject the null hypothesis.

This means that we have no evidence that the row variable and the column variable depend on each other.

Interpretation and Implication

As we will not reject the null hypothesis, we treat the row variable as independent of the column variable. That means that the department in which an employee works does not depend on gender. In other words, there is no evidence that a particular gender is concentrated in specific departments. For example, there is no relation between the HR department and the gender of the employees placed in that department.

Business Implication: This is very useful information for us, as it suggests that equality is being followed in each department and that no department is hiring employees based on their gender, but rather on their qualifications and other factors. As department and gender are independent of each other, this will help us in promoting diversity and equality in the workplace, which may in turn increase our productivity.
4. References
• https://www.kaggle.com/datasets/colearninglounge/employee-attrition

• https://www.kaggle.com/datasets/parvezkhan90/cosmetic-products-sales
