
DATA ANALYTICS FOR MANAGERS

CIA-1: DATA VISUALISATION AND PROBABILITY

SUBMITTED BY

RUTVIK SANTOSH PAKALE

REGISTER NUMBER

2227046

SUBMITTED TO

Dr. Ramanathan

CLASS

1-MBA-A

TABLE OF CONTENTS
Sr. No. Topic
1 Interval Estimation
2 T-test
3 ANOVA
4 Chi Square
5 Regression
6 Correlation

LAKME COSMETICS PRODUCTS

SOURCE: www.kaggle.com

INTERVAL ESTIMATION
The most fundamental point and interval estimation problem is the estimation of a
population mean. Consider a case where knowing the population mean of a numerical
variable would be helpful. A simple random sample can be used to gather data and compute
the sample mean, x̄, whose value is used as a point estimate.
When the sample mean is used as a point estimate of the population mean, some error is
expected because a sample, or subset of the population, was used to compute the point
estimate. The absolute value of the difference between the sample mean and the population
mean, |x̄ − μ|, is the sampling error. In the large-sample case, a 95% confidence interval
estimate for the population mean is given by x̄ ± 1.96σ/√n. When the population standard
deviation σ is unknown, the sample standard deviation s is used in its place in the
confidence interval formula.

Net Sales calculated

Mean 52316.86
Standard Error 1516.64
Median 36643.00
Mode 12283.00
Standard Deviation 47960.22
Sample Variance 2300182636.09
Kurtosis 3.66
Skewness 1.77
Range 325170.00
Minimum 295.00
Maximum 325465.00
Sum 52316859.00
Count 1000.00

Upper Limit 55289.46


Lower Limit 49344.25

The table above shows the descriptive statistics and the 95% interval estimate for the net
sales data.

INFERENCES AND CALCULATIONS

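\mu = \bar{x} \pm Z_{\alpha/2} \frac{\sigma}{\sqrt{n}}

Standard Error = 47960.22 / √1000 = 1516.64

Upper Value = 52316.86 + (1.96 * 47960.22 / √1000)

= 55289.46

Lower Value = 52316.86 - (1.96 * 47960.22 / √1000)

= 49344.25

Lower Value < Mean < Upper Value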
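
The interval limits can be reproduced from the summary statistics in the Net Sales table. The sketch below is only an illustration: it assumes the rounded z-value of 1.96 for a 95% confidence level, and the variable names are chosen here and do not come from the spreadsheet.

```python
import math

# Summary statistics copied from the Net Sales table.
mean = 52316.86
std_dev = 47960.22
n = 1000
z = 1.96  # assumed rounded z-value for a 95% confidence level

# The margin of error uses the standard error, not the raw standard deviation.
standard_error = std_dev / math.sqrt(n)
margin = z * standard_error

lower_limit = mean - margin
upper_limit = mean + margin

print(f"Standard error: {standard_error:.2f}")
print(f"95% interval: {lower_limit:.2f} to {upper_limit:.2f}")
```

Dividing the standard deviation by √n is the step that brings the margin down to roughly 2,973 on either side of the mean.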


H0: Sales are not affected by the discount provided
H1: Sales are affected by the discount provided

Since the significance value (calculated value) p is greater than the alpha value, the null
hypothesis is accepted.
Sales are therefore not affected by the discount provided.

T-Test
TWO SAMPLE T TEST
A two-sample t-test always uses the following null hypothesis:
 H0: μ1 = μ2 (the two population means are equal)

The alternative hypothesis can be either two-tailed, left-tailed, or right-tailed:

 H1 (two-tailed): μ1 ≠ μ2 (the two population means are not equal)

 H1 (left-tailed): μ1 < μ2 (population 1 mean is less than population 2 mean)
 H1 (right-tailed): μ1> μ2 (population 1 mean is greater than population 2 mean)

We use the following formula to calculate the test statistic t:

Test statistic: t = (x̄1 − x̄2) / (sp · √(1/n1 + 1/n2))

where x̄1 and x̄2 are the sample means, n1 and n2 are the sample sizes, and sp is the pooled
standard deviation, calculated as sp = √(((n1 − 1)s1² + (n2 − 1)s2²) / (n1 + n2 − 2)), with
s1² and s2² the two sample variances.

The following assumptions must hold for the results of a two-sample t-test to be reliable:

The observations in one sample should be independent of those in the other.
The data should be approximately normally distributed.
The variances of the two samples should be approximately equal. If this assumption does not hold, you
should carry out Welch's t-test instead.
The data for the two samples should be collected using a random sampling technique.

The table below shows the t test of the data taken with equal variances and its obtained values.

t-Test: Two-Sample Assuming Equal Variances

Net Sales calculated Net Sales calculated


Mean 61088.05369 60341.57813
Variance 2841768398 2714324610
Observations 149 128
Pooled Variance 2782912539
Hypothesized Mean Difference 0

df 275
t Stat 0.117415127
P(T<=t) one-tail 0.453308403
t Critical one-tail 1.650413433
P(T<=t) two-tail 0.906616806
t Critical two-tail 1.968627871
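
The pooled variance, degrees of freedom and t Stat in this table can be recomputed from the reported means, variances and sample sizes. The following is a minimal sketch; the function name `pooled_t_statistic` is hypothetical, and the critical value is simply copied from the table rather than derived.

```python
import math

def pooled_t_statistic(mean1, var1, n1, mean2, var2, n2):
    """Return (t, df, pooled_variance) for an equal-variance two-sample t-test."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / std_err, df, pooled_var

# Means, variances and observation counts copied from the table.
t_stat, df, pooled_var = pooled_t_statistic(
    61088.05369, 2841768398, 149,
    60341.57813, 2714324610, 128,
)

t_critical_two_tail = 1.968627871  # copied from the table, not computed here
reject_h0 = abs(t_stat) > t_critical_two_tail

print(f"t = {t_stat:.6f}, df = {df}, pooled variance = {pooled_var:.0f}")
print("Reject H0" if reject_h0 else "Do not reject H0")
```

Because |t| is far below the two-tail critical value, the comparison leads to the same decision as the p-value in the table.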

ONE SAMPLE T -TEST: The One Sample t Test examines whether the mean of a
population is statistically different from a known or hypothesized value. The One
Sample t Test is a parametric test.

This test is also known as:

 Single Sample t Test

The variable used in this test is known as:

 Test variable

The table below shows the two-sample t-test assuming equal variances.

t-Test: Two-Sample Assuming Equal Variances

Amount to Customer Amount to Customer


Mean 61949.71141 60341.57813
Variance 2968377426 2714324610
Observations 149 128
Pooled Variance 2851051217
Hypothesized Mean Difference 0
df 275
t Stat 0.249906623
P(T<=t) one-tail 0.401423071
t Critical one-tail 1.650413433

P(T<=t) two-tail 0.802846142
t Critical two-tail 1.968627871

INFERENCES:
Since the two-tailed p-value in both tables (0.907 and 0.803) is greater than the alpha value
of 0.05, the null hypothesis is accepted. Sales are therefore not affected by the discount
provided.

ANOVA
ANOVA (analysis of variance) is a statistical technique that splits the observed aggregate
variability within a data set into systematic factors and random factors. The systematic
factors have a statistical influence on the data set, while the random factors do not.
Analysts use the ANOVA test to assess how independent variables in a
regression study affect the dependent variable. The t- and z-test procedures developed in the
20th century were used for statistical analysis until 1918, when Ronald Fisher created the
analysis of variance method. ANOVA, also referred to as
Fisher analysis of variance, extends the t- and z-tests. The term became well known in
1925 after it appeared in Fisher's book "Statistical Methods for Research Workers." It was
first used in experimental psychology and later applied to more complex subjects.

Anova: Two-Factor
Without Replication

SUMMARY Count Sum Average Variance


Row 1 2 33583 16791.5 359146800.5
Row 2 2 25913 12956.5 224359744.5
Row 3 2 884 442 223112
Row 4 2 11762 5881 25332962
Row 5 2 40952 20476 86250978
Row 6 2 19098 9549 98757458
Row 7 2 2948 1474 282752

Column 1 7 109425 15632.14286 142457306.1


Column 2 7 25715 3673.571429 21509322.29

ANOVA
Source of Variation SS df MS F P-value F crit
Rows 689971970.7 6 114995328.5 2.348218824 0.16132 4.283866
Columns 500526007.1 1 500526007.1 10.22080295 0.018671 5.987378
Error 293827799.9 6 48971299.98

Total 1484325778 13
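
The sums of squares and F values in this table can be reproduced with a short calculation. The sketch below assumes the ANOVA was run on the zone-by-price-label values listed in the chi-square observed table later in this report (their row and column sums match the SUMMARY block above); the variable names are illustrative only.

```python
# Rows: zones; columns: price labels 80 and 100 (assumed input data).
data = [
    [30192, 3391],
    [23548, 2365],
    [776, 108],
    [9440, 2322],
    [27043, 13909],
    [16576, 2522],
    [1850, 1098],
]

r, c = len(data), len(data[0])
grand_mean = sum(sum(row) for row in data) / (r * c)
row_means = [sum(row) / c for row in data]
col_means = [sum(row[j] for row in data) / r for j in range(c)]

# Partition the total variation into rows, columns and error.
ss_rows = c * sum((m - grand_mean) ** 2 for m in row_means)
ss_cols = r * sum((m - grand_mean) ** 2 for m in col_means)
ss_total = sum((x - grand_mean) ** 2 for row in data for x in row)
ss_error = ss_total - ss_rows - ss_cols

df_rows, df_cols = r - 1, c - 1
df_error = df_rows * df_cols

ms_error = ss_error / df_error
f_rows = (ss_rows / df_rows) / ms_error
f_cols = (ss_cols / df_cols) / ms_error

print(f"F rows = {f_rows:.4f}, F columns = {f_cols:.4f}")
print(f"df: rows = {df_rows}, columns = {df_cols}, error = {df_error}")
```

Each F value is then compared with its own F crit from the table: the row effect and the column effect are tested separately.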

INFERENCES:

STEP 1: Setting the Hypothesis

H0- Null Hypothesis – zone does not affect the net sales of cosmetics

H1- Alternative Hypothesis – zone affects the net sales

Hence, the degrees of freedom (the numbers of independent values that can vary) are 6 for
rows, 1 for columns and 6 for error, as shown in the ANOVA table above.

STEP 2: Anova Two-Factor Without Replication


P-value for rows (0.16132) > LOS (0.05)

STEP 3 – RESULT
Since the obtained P value is greater than the LOS value, H0 is accepted.

INTERPRETATION:

 H0 is accepted
 There is no statistically significant evidence that zone has an impact on
net sales, and the two are independent of each other.

OVERALL INFERENCE FROM ANOVA TEST

H0 is accepted and H1 is rejected.

There is no significant relationship between zone and net sales. The differences between
the group averages are very small, so there is no statistically significant evidence
that zone affects net sales.

CHI SQUARE
A chi-squared test (symbolized as χ²) is essentially a data analysis based on observations of
a random set of variables. Typically, it involves a comparison between two sets of statistical
data. Karl Pearson developed this test in 1900 for the analysis of categorical
data, which is why it is known as Pearson's chi-squared test.

By assuming that the null hypothesis is true, the chi-square test is used to determine how
likely the observations would be.

A hypothesis is a proposition that a certain condition or statement is true, which we
can then test. Chi-squared test statistics are typically constructed from a sum of squared
errors divided by the sample variance.

The chi-square distribution has the following characteristics:

The variance is equal to two times the degrees of freedom.
The mean of the distribution is equal to the number of degrees of freedom.
As the degrees of freedom increase, the chi-square distribution curve approaches the normal
distribution.

FORMULA: The chi-squared test checks whether there is any difference between the observed
values and the expected values. The formula for chi-square can be written as:

\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}

where O_i is an observed value and E_i is the corresponding expected value.

CHI SQUARE TABLE AND CALCULATIONS

Observed values (Sum of Column, by Price Labels)

Row Labels     80        100      Grand Total
EAST           30192     3391     33583
NORTH          23548     2365     25913
North 1        776       108      884
NORTH2         9440      2322     11762
SOUTH          27043     13909    40952
WEST           16576     2522     19098
WEST2          1850      1098     2948
Grand Total    109425    25715    135140

Expected values (Sum of Column, by Price Labels)

Row Labels     80             100          Grand Total
EAST           27192.6874     6390.3126    33583
NORTH          20982.16683    4930.8332    25913
North 1        715.7888116    168.21119    884
NORTH2         9523.87783     2238.1222    11762
SOUTH          33159.4835     7792.5165    40952
WEST           15463.95331    3634.0467    19098
WEST2          2387.042326    560.95767    2948
Grand Total    109425         25715        135140

Observed (O) Expected (E) O-E (O-E)^2 ((O-E)^2)/E

30192.00 27192.69 2999.31 8995876.08 330.82


3391.00 6390.31 -2999.31 8995876.08 1407.74
23548.00 20982.17 2565.83 6583499.87 313.77
2365.00 4930.83 -2565.83 6583499.87 1335.17
776.00 715.79 60.21 3625.39 5.06
108.00 168.21 -60.21 3625.39 21.55
9440.00 9523.88 -83.88 7035.49 0.74
2322.00 2238.12 83.88 7035.49 3.14
27043.00 33159.48 -6116.48 37411370.39 1128.23
13909.00 7792.52 6116.48 37411370.39 4800.94
16576.00 15463.95 1112.05 1236647.85 79.97
2522.00 3634.05 -1112.05 1236647.85 340.29
1850.00 2387.04 -537.04 288414.46 120.83
1098.00 560.96 537.04 288414.46 514.15
SUM = 10402.39
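
The expected values and the chi-square sum can be recomputed directly from the observed table. This is a minimal sketch using only the observed counts shown above; the dictionary layout and variable names are assumptions made for illustration.

```python
# Observed values by zone for price labels 80 and 100.
observed = {
    "EAST": (30192, 3391),
    "NORTH": (23548, 2365),
    "North 1": (776, 108),
    "NORTH2": (9440, 2322),
    "SOUTH": (27043, 13909),
    "WEST": (16576, 2522),
    "WEST2": (1850, 1098),
}

col_totals = [sum(counts[j] for counts in observed.values()) for j in range(2)]
grand_total = sum(col_totals)

expected = {}
chi_square = 0.0
for zone, counts in observed.items():
    row_total = sum(counts)
    # Expected value = row total * column total / grand total.
    expected[zone] = tuple(row_total * ct / grand_total for ct in col_totals)
    chi_square += sum((o - e) ** 2 / e for o, e in zip(counts, expected[zone]))

print(f"Expected EAST: {expected['EAST'][0]:.4f}, {expected['EAST'][1]:.4f}")
print(f"Chi-square statistic: {chi_square:.2f}")
```

Each expected value depends only on the row total, the column total and the grand total, which is why the row and column sums of the expected table match those of the observed table.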

INFERENCES

STEP – 1 : SETTING THE HYPOTHESIS

H0- Null Hypothesis – The location or zone and product sales are independent

H1- Alternative Hypothesis – The location or zone and product sales are
dependent on each other

Level of Significance = 0.05; Level of Confidence = 0.95

STEP – 2: CALCULATIONS

STEP – 3: DEGREE OF FREEDOM

Degree of Freedom = (r-1) * (c-1)

= (7-1) * (2-1)
=6*1
=6
Tabulated value = 12.59
STEP – 4: RESULT

Since the Χ² value (10402.39) is greater than the table value, H0 is rejected.

The chi-square statistic is 10402.39. The result is significant at p < .05

INTERPRETATION:

 H0 is rejected; H1 is accepted
 There is statistically significant evidence that zone has an effect on net sales
 Zone affects how likely a product is to be chosen, and net sales depend on that
likelihood.

REGRESSION
Regression is a statistical technique used in finance, investing, and other
disciplines that aims to establish the nature and strength of the relationship between a single
dependent variable (often represented by Y) and a series of other variables (known
as independent variables).

The most popular variation of this method is linear regression, which is also known as simple
regression or ordinary least squares (OLS). Based on a line of best fit, linear regression
determines the linear relationship between two variables.

FORMULA OF REGRESSION:
Y_i = f(X_i, \beta)+e_i
Y_i = dependent variable

f = function
X_i = independent variable
\beta = unknown parameters
e_i = error terms
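
For the simple linear case, where f(X_i, \beta) = \beta_0 + \beta_1 X_i, the least-squares estimates have a closed form. The sketch below illustrates it on made-up numbers: the `qty` and `price` lists are hypothetical and are not taken from the Lakme data set, and `fit_simple_ols` is a name chosen for this example.

```python
def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical order quantities and prices, for illustration only.
qty = [10, 20, 30, 40, 50]
price = [225, 228, 226, 231, 230]

intercept, slope = fit_simple_ols(qty, price)

# The error terms e_i are the gaps between observed and fitted values.
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(qty, price)]

print(f"intercept = {intercept:.2f}, slope = {slope:.4f}")
```

With an intercept in the model, the residuals sum to zero, which is a quick check that the fitted line passes through the point of means.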

The table below summarises the regression statistics computed on the data, which is
available in the regression spreadsheet.
For clarification and further detail, kindly refer to the Excel data.

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.175569475
R Square             0.030824641
Adjusted R Square    0.029853523
Standard Error       33.4934409
Observations         1000

ANOVA
              df     SS             MS             F              Significance F
Regression    1      35607.84847    35607.84847    31.74140894    2.29021E-08
Residual      998    1119566.962    1121.810583
Total         999    1155174.81

             Coefficients    Standard Error    t Stat         P-value        Lower 95%      Upper 95%
Intercept    226.7042746     1.603471604       141.3834046    0              223.5577119    229.8508372
Qty          0.01892208      0.003358581       5.63395145     2.29021E-08    0.01233139     0.025512771
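
The figures in the summary output are linked by a few standard identities, which can be checked with the numbers reported above. The sketch below only re-derives values already in the table; the variable names are illustrative.

```python
# Values copied from the regression summary output.
ss_regression = 35607.84847
ss_residual = 1119566.962
ss_total = 1155174.81
df_regression, df_residual = 1, 998
n_obs = 1000

coef_qty = 0.01892208
se_qty = 0.003358581

# R Square is the share of total variation explained by the regression.
r_square = ss_regression / ss_total

# Adjusted R Square penalises for the number of predictors (one here).
adj_r_square = 1 - (1 - r_square) * (n_obs - 1) / df_residual

# F is the ratio of the regression and residual mean squares.
f_stat = (ss_regression / df_regression) / (ss_residual / df_residual)

# t Stat for a coefficient is the estimate divided by its standard error.
t_qty = coef_qty / se_qty

print(f"R Square = {r_square:.6f}, Adjusted R Square = {adj_r_square:.6f}")
print(f"F = {f_stat:.4f}, t (Qty) = {t_qty:.4f}, t squared = {t_qty ** 2:.4f}")
```

With a single predictor, the squared t Stat of the slope equals F, which is why Qty and the overall regression share the same p-value.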

INFERENCES
The significance value (Significance F = 2.29021E-08) is less than alpha (0.05), so according
to the thumb rule the null hypothesis is rejected. The null hypothesis (Ho: There is no
impact of unit Order quantity) is
REJECTED.
Hypothesis:
H0: There is no significant relationship between price and Order quantity.
H1: There is a significant relationship between price and order quantity.
Involvement
Regression: regression analysis helps to determine whether there is a significant relationship
between the independent and dependent variables. The p-value is used to test the hypothesis
about that relationship.
The p-value for each term is used to test the null hypothesis. A low p-value (< 0.05) suggests
that the null hypothesis can be rejected. In other words, because changes in the predictor
value are related to changes in the response variable, a predictor with a low p-value is likely
to be a useful addition to your model. A larger (insignificant) p-value, on the other hand,
indicates that changes in the predictor are unrelated to changes in the response.
In the regression output above, the significance value is 2.29021E-08, which is below
alpha (0.05). According to the thumb rule, a p-value below alpha indicates that there is a
significant relationship between price and order quantity. The relationship is weak, however:
R Square is 0.0308, so the model explains only about 3% of the variation in the dependent
variable.

CORRELATION
For describing straightforward links between data, correlations are helpful. Consider a dataset
of campgrounds in a park in the mountains as an illustration. You're interested in finding out
if the height of the campsite—how high up the mountain it is—and the summer's typical high
temperature are related.

You can measure height and temperature for each individual campground. When you compare
these two variables across your sample using a correlation, you find a linear relationship:
the temperature decreases as elevation rises. The two variables are negatively correlated.
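
The campground example can be made concrete with a small Pearson correlation calculation. The elevation and temperature figures below are hypothetical, invented purely for illustration, and `pearson_r` is a helper defined here and not a library function.

```python
import math

def pearson_r(x, y):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical campsites: elevation in metres, typical summer high in degrees Celsius.
elevation_m = [500, 1000, 1500, 2000, 2500]
high_temp_c = [29, 27, 22, 20, 16]

r = pearson_r(elevation_m, high_temp_c)
print(f"Pearson r = {r:.3f}")
```

A value close to -1 reflects the pattern described above: the higher the campsite, the lower the temperature.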

       Qty         MRP
Qty    1
MRP    0.175569    1

INFERENCES:
H1: There is a significant relationship between Unit Price and Order Quantity
H0: There is no relationship between Unit Price and Order Quantity
R value = 0.175569
The correlation indicates whether there is a positive, negative, or no correlation between the
independent and dependent variables. The Pearson correlation value measures the correlation
between the variables and lies between -1 and 1.

If the correlation value is negative and between -0.6 and -0.9, it indicates a moderately negative
correlation. If it is between -0.3 and -0.1, it indicates a weakly negative correlation, and if it is
beyond -0.9, the variables are strongly negatively correlated.

A Pearson correlation value of 0 indicates that there is no correlation between the two
variables and one does not impact the other. If the Pearson correlation value is between
0.5 and 0.7, the variables are moderately correlated; if it is less than 0.5, it indicates a weakly
positive correlation; and if it is above 0.7 and up to 1, it indicates a strong positive correlation
between the independent and dependent variables.

Since the R value (0.175569) is less than 0.5, we can conclude that the correlation is weakly positive.
