SUBMITTED BY
REGISTER NUMBER
2227046
SUBMITTED TO
Dr. Ramanathan
CLASS
1-MBA-A
TABLE OF CONTENTS
Sr. No. Topic Page No.
1 Interval Estimation 3
2 T-test 5
3 Anova 8
4 Chi Square 10
5 Regression 13
6 Correlation 16
SOURCE: www.kaggle.com
INTERVAL ESTIMATION
The most basic point and interval estimation problem is the estimation of a population mean.
Suppose we want to know the population mean of a numerical variable. Data from a simple
random sample can be used to compute the sample mean, x̄, and its value serves as the
point estimate.
Because a sample is only a subset of the population, some error can be expected when the
sample mean is used as a point estimate of the population mean. The sampling error is the
absolute value of the difference between the sample mean and the population mean,
|x̄ − μ|. In the large-sample case, a 95% confidence interval estimate for the population
mean is given by x̄ ± 1.96σ/√n. When the population standard deviation σ is unknown, the
sample standard deviation s is used in its place in the confidence interval formula.
Mean 52316.86
Standard Error 1516.64
Median 36643.00
Mode 12283.00
Standard Deviation 47960.22
Sample Variance 2300182636.09
Kurtosis 3.66
Skewness 1.77
Range 325170.00
Minimum 295.00
Maximum 325465.00
Sum 52316859.00
Count 1000.00
The table above shows the descriptive statistics of the net sales data used for the interval
estimation.
\mu = \bar{x} \pm Z_{\alpha/2} \, \sigma / \sqrt{n}
Upper Value = 52316.86 + (1.96 * 47960.22 / √1000)
= 52316.86 + 2972.60
= 55289.46
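As a rough check of the arithmetic, the interval can be recomputed from the summary statistics in the table. This is only a sketch: it assumes the large-sample z value of 1.96 and uses the sample standard deviation in place of σ; the variable names are illustrative.

```python
import math

# Summary statistics taken from the descriptive table above.
mean = 52316859.00 / 1000   # Sum / Count = 52316.859
std_dev = 47960.22          # sample standard deviation, used in place of sigma
n = 1000
z = 1.96                    # z value for a 95% confidence level

standard_error = std_dev / math.sqrt(n)
margin = z * standard_error
lower, upper = mean - margin, mean + margin

print(f"Standard error: {standard_error:.2f}")
print(f"95% confidence interval: ({lower:.2f}, {upper:.2f})")
```

The standard error computed here matches the Standard Error row of the table, which is the quantity that is multiplied by 1.96.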
Because the calculated p-value is greater than the alpha value, the null hypothesis is
accepted.
The data therefore give no evidence that the discount provided affects sales, or that sales
affect the discount provided.
T-Test
TWO SAMPLE T TEST
A two-sample t-test always uses the following null hypothesis:
H0: μ1 = μ2 (the two population means are equal)
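The alternative hypothesis can be two-tailed, left-tailed, or right-tailed:
H1 (two-tailed): μ1 ≠ μ2 (the two population means are not equal)
H1 (left-tailed): μ1 < μ2 (population 1 mean is less than population 2 mean)
H1 (right-tailed): μ1 > μ2 (population 1 mean is greater than population 2 mean)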
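The test statistic is t = (x̄1 − x̄2) / (sp · √(1/n1 + 1/n2)),
where x̄1 and x̄2 are the sample means, n1 and n2 are the sample sizes, and sp is the pooled
standard deviation, sp = √(((n1 − 1)s1² + (n2 − 1)s2²) / (n1 + n2 − 2)), with s1² and s2² the sample variances.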
The following assumptions must hold for the results of a two-sample t-test to be reliable:
The observations in one sample should be independent of the observations in the other.
The data should be approximately normally distributed.
The variances of the two samples should be approximately equal. If this assumption does not hold, a
Welch's t-test should be carried out instead.
The data for both samples should be collected using a random sampling method.
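The equal-variance two-sample t statistic can be sketched in a few lines. The function name `pooled_t` and the two small samples are hypothetical and are not taken from the sales data; the sketch only illustrates how the pooled standard deviation enters the calculation.

```python
import math
from statistics import mean, variance

def pooled_t(sample1, sample2):
    """Return (t statistic, degrees of freedom) for an equal-variance two-sample t-test."""
    n1, n2 = len(sample1), len(sample2)
    s1_sq, s2_sq = variance(sample1), variance(sample2)  # sample variances (n - 1 divisor)
    df = n1 + n2 - 2
    # Pooled standard deviation: weighted average of the two sample variances.
    sp = math.sqrt(((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / df)
    t = (mean(sample1) - mean(sample2)) / (sp * math.sqrt(1 / n1 + 1 / n2))
    return t, df

# Hypothetical illustration data, not from the report's dataset.
group_a = [12.0, 15.0, 11.0, 14.0, 13.0]
group_b = [10.0, 13.0, 12.0, 11.0, 9.0]

t_stat, df = pooled_t(group_a, group_b)
print(f"t = {t_stat:.4f}, df = {df}")
```

The resulting t value is then compared with the critical value for the same degrees of freedom, in the same way as the t Stat and t Critical rows of a spreadsheet output.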
The table below shows the results of the two-sample t-test, assuming equal variances, on the data taken.
df 275
t Stat 0.117415127
P(T<=t) one-tail 0.453308403
t Critical one-tail 1.650413433
P(T<=t) two-tail 0.906616806
t Critical two-tail 1.968627871
ONE SAMPLE T-TEST: The one-sample t-test examines whether the mean of a
population is statistically different from a known or hypothesized value. The
one-sample t-test is a parametric test.
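A minimal sketch of the one-sample statistic, t = (x̄ − μ0) / (s / √n), is given here. The sample values and the hypothesized mean `mu0` are made up for illustration and do not come from the report's data.

```python
import math
from statistics import mean, stdev

def one_sample_t(sample, mu0):
    """Return (t statistic, degrees of freedom) for a one-sample t-test against mu0."""
    n = len(sample)
    standard_error = stdev(sample) / math.sqrt(n)  # s / sqrt(n)
    return (mean(sample) - mu0) / standard_error, n - 1

# Hypothetical sample and hypothesized population mean.
sample = [52.0, 48.0, 50.0, 51.0, 49.0]
mu0 = 48.0

t_one, df_one = one_sample_t(sample, mu0)
print(f"t = {t_one:.4f}, df = {df_one}")
```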
Single Sample t Test
Test variable
The table below shows the two-sample t-test assuming equal variances.
P(T<=t) two-tail 0.802846142
t Critical two-tail 1.968627871
INFERENCES:
Because the calculated p-value is greater than the alpha value, the null hypothesis is
accepted. The data therefore give no evidence that the discount provided affects sales, or
that sales affect the discount provided.
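The decision rule used in the inference can be written as a small helper. This is a sketch that assumes the usual alpha of 0.05; the function name `decide` and the dictionary labels are illustrative, while the p-values are the two-tail values reported in the tables above.

```python
ALPHA = 0.05  # significance level assumed for this sketch

def decide(p_value, alpha=ALPHA):
    """Return the textbook decision for a p-value at the given alpha."""
    return "reject H0" if p_value < alpha else "fail to reject H0"

# Two-tail p-values as reported in the t-test tables above.
reported = {"first table": 0.906616806, "second table": 0.802846142}

for name, p in reported.items():
    print(f"{name}: p = {p:.4f} -> {decide(p)}")
```

Both reported p-values are far above 0.05, which is why the null hypothesis is retained in each case.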
ANOVA
ANOVA (analysis of variance) is a statistical method that splits the observed aggregate
variability within a data set into two parts: systematic factors and random factors.
Systematic factors have a statistical influence on the data set, while random factors do
not. Analysts use the ANOVA test to assess how independent variables affect the dependent
variable in a regression study. The t- and z-test methods developed in the 20th century
were used for statistical analysis until 1918, when Ronald Fisher created the analysis of
variance method. ANOVA, also called the Fisher analysis of variance, extends the t- and
z-tests. The term became well known in 1925 after it appeared in Fisher's book
"Statistical Methods for Research Workers." It was first used in experimental psychology
and was later applied to more complex subjects.
Anova: Two-Factor Without Replication
ANOVA
Source of Variation   SS            df   MS            F             P-value    F crit
Rows                  689971970.7   6    114995328.5   2.348218824   0.16132    4.283866
Columns               500526007.1   1    500526007.1   10.22080295   0.018671   5.987378
Error                 293827799.9   6    48971299.98
Total                 1484325778    13
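The MS and F columns of the ANOVA table follow directly from the SS and df columns, and this can be checked with a short sketch. The table does not say which factor is zone, so the labels are kept as Rows and Columns; the dictionary names are illustrative.

```python
# Sums of squares and degrees of freedom copied from the ANOVA table above.
ss = {"Rows": 689971970.7, "Columns": 500526007.1, "Error": 293827799.9}
df = {"Rows": 6, "Columns": 1, "Error": 6}
f_crit = {"Rows": 4.283866, "Columns": 5.987378}

# Mean square = sum of squares / degrees of freedom.
ms = {source: ss[source] / df[source] for source in ss}

# F = mean square of the factor / mean square of the error term.
f_stat = {source: ms[source] / ms["Error"] for source in ("Rows", "Columns")}

for source, f in f_stat.items():
    verdict = "exceeds" if f > f_crit[source] else "is below"
    print(f"{source}: F = {f:.4f} {verdict} F crit = {f_crit[source]}")
```

Comparing F with F crit is equivalent to comparing the P-value column with alpha = 0.05.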
INFERENCES:
H0- Null Hypothesis – zone does not affect the net sales of cosmetics
Hence, the degrees of freedom (the number of independent values that can vary in the
analysis) among groups and within groups are 4 and 9 respectively.
STEP 3 – RESULT
Since the obtained p-value is greater than the level of significance (LOS), H0 is accepted.
INTERPRETATION:
H0 is accepted.
There is no statistically significant evidence that zone has an impact on net sales; the two
are independent of each other.
The difference between the averages of the groups is very small, so there is no statistical
significance to show that zone affects net sales.
CHI SQUARE
A chi-squared test (symbolized as χ²) is essentially a data analysis based on observations
of a random set of variables. Typically, it compares two sets of statistical data. Karl
Pearson developed this test in 1900 for the analysis of categorical data and their
distribution, which is why it is also known as Pearson's chi-squared test.
The chi-square test is used to estimate how likely the observations would be if the null
hypothesis were true.
A hypothesis is an assumption that a given condition or statement might be true, which we
can then test. Chi-squared tests are typically constructed from a sum of squared errors
divided by the sample variance.
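FORMULA: The chi-squared test checks whether there is any difference between the observed
values and the expected values. The formula for chi-square can be written as:
\chi^2 = \sum_i (O_i - E_i)^2 / E_i
where O_i is an observed value and E_i is the corresponding expected value.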
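As a rough sketch of that comparison, the statistic sums (observed − expected)² / expected over every cell of a contingency table, with each expected count computed as row total × column total / grand total. The 3 × 2 table of counts below is hypothetical and is not the report's zone data; the function name `chi_square` is illustrative.

```python
def chi_square(observed):
    """Return (chi-square statistic, degrees of freedom) for a contingency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (obs - expected) ** 2 / expected

    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, dof

# Hypothetical counts: 3 zones (rows) by 2 product categories (columns).
observed = [[30, 20], [25, 25], [10, 40]]

chi2, dof = chi_square(observed)
print(f"chi-square = {chi2:.4f}, df = {dof}")
```

The statistic is then compared with the tabulated critical value for the same degrees of freedom.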
INFERENCES
H0- Null Hypothesis – The location or zone and the product sales are independent
H1- Alternative Hypothesis – The location or zone and the product sales are
dependent on each other
STEP – 2: CALCULATIONS
Degrees of freedom = (rows - 1) * (columns - 1)
= (3-1) * (2-1)
= 2 * 1
= 2
Tabulated value = 7.35
STEP – 4: RESULT
The chi-square statistic is 84.8437. The result is significant at p < .05.
INTERPRETATION:
H0 is rejected; H1 is accepted.
There is statistically significant evidence that zone has an effect on net sales.
Zone affects how often a product is chosen, so net sales depend on the chance of
accessories being selected.
REGRESSION
Regression is a statistical method used in finance, investing, and other disciplines that
attempts to determine the nature and strength of the relationship between one dependent
variable (usually denoted by Y) and a series of other variables (known as independent
variables).
The most common form of this method is linear regression, which is also known as simple
regression or ordinary least squares (OLS). Linear regression establishes the linear
relationship between two variables based on a line of best fit.
FORMULA OF REGRESSION:
Y_i = f(X_i, \beta) + e_i
Y_i = dependent variable
f = function
X_i = independent variable
\beta = unknown parameters
e_i = error terms
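For the simple linear case, f(X_i, \beta) reduces to \beta_0 + \beta_1 X_i, and the least-squares estimates have a closed form. The sketch below assumes that linear form; the function name `fit_line` and the five data points are hypothetical, not the report's price and quantity data.

```python
from statistics import mean

def fit_line(x, y):
    """Return (intercept, slope) of the ordinary least squares line y = b0 + b1 * x."""
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical illustration data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

b0, b1 = fit_line(x, y)
# Residuals are the estimated error terms e_i = Y_i - (b0 + b1 * X_i).
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
print(f"intercept = {b0:.2f}, slope = {b1:.2f}")
```

With an intercept in the model, the residuals of a least-squares fit sum to zero, which is a quick sanity check on the estimates.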
The table below shows the summary output of the regression performed on the data taken,
which is available in the regression spreadsheet.
For any clarification or further detail, kindly refer to the Excel data.
SUMMARY OUTPUT
Regression Statistics
Multiple R          0.175569475
R Square            0.030824641
Adjusted R Square   0.029853523
Standard Error      33.4934409
Observations        1000
ANOVA
             df    SS            MS            F             Significance F
Regression   1     35607.84847   35607.84847   31.74140894   2.29021E-08
Residual     998   1119566.962   1121.810583
Total        999   1155174.81
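The regression statistics in the summary output are all functions of the sums of squares in the ANOVA block. A short sketch, assuming a single predictor (so the residual degrees of freedom are n − 2), recomputes them from the table values; the variable names are illustrative.

```python
import math

# Values copied from the ANOVA block of the summary output above.
n = 1000
ss_regression = 35607.84847
ss_residual = 1119566.962
ss_total = ss_regression + ss_residual
df_regression, df_residual = 1, n - 2

r_squared = ss_regression / ss_total            # share of variation explained
multiple_r = math.sqrt(r_squared)               # equals |r| with one predictor
adj_r_squared = 1 - (1 - r_squared) * (n - 1) / df_residual
ms_residual = ss_residual / df_residual
f_stat = (ss_regression / df_regression) / ms_residual
std_error = math.sqrt(ms_residual)              # standard error of the regression

print(f"R Square = {r_squared:.6f}, Multiple R = {multiple_r:.6f}")
print(f"Adjusted R Square = {adj_r_squared:.6f}")
print(f"F = {f_stat:.4f}, Standard Error = {std_error:.4f}")
```

The R Square value shows the share of the variation in the dependent variable that the predictor accounts for.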
INFERENCES
The Significance F value (2.29021E-08) is smaller than alpha (0.05), so according to the
rule of thumb the null hypothesis (H0: there is no impact of unit price on order quantity)
is REJECTED.
Hypothesis:
H0: There is no significant relationship between price and Order quantity.
H1: There is a significant relationship between price and order quantity.
Interpretation
Regression: regression analysis helps to determine whether there is a significant
relationship between the independent and dependent variables. The p-value is used to test
the hypothesis about this relationship.
The p-value for each term tests the null hypothesis. A low p-value (< 0.05) suggests that
the null hypothesis can be rejected. In other words, a predictor with a low p-value is
likely to be a useful addition to the model, because changes in the predictor's value are
related to changes in the response variable. A larger (insignificant) p-value, on the other
hand, suggests that changes in the predictor are not associated with changes in the response.
In the regression output above, the Significance F value is 2.29021E-08, which is below
alpha (0.05). According to the rule of thumb, a significance value smaller than alpha
indicates that there is a significant relationship between price and order quantity. The
R Square of about 0.03 shows, however, that price explains only about 3% of the variation
in order quantity.
CORRELATION
Correlations are useful for describing simple relationships among data. As an illustration,
consider a dataset of campsites in a mountain park. You want to find out whether the
elevation of a campsite (how high up the mountain it is) is related to the average high
temperature in summer.
For each individual campsite you have a measure of elevation and of temperature. When you
compare these two variables across your sample with a correlation, you find a linear
relationship: as elevation increases, the temperature drops. They are negatively correlated.
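The campsite illustration can be sketched with the Pearson formula, r = S_xy / √(S_xx · S_yy). The elevation and temperature figures below are invented for the example, and the function name `pearson_r` is illustrative.

```python
import math
from statistics import mean

def pearson_r(x, y):
    """Return the Pearson correlation coefficient between two equal-length lists."""
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical campsites: elevation in metres, average summer high in degrees Celsius.
elevation = [500, 1000, 1500, 2000, 2500]
temperature = [30, 27, 25, 21, 18]

r = pearson_r(elevation, temperature)
print(f"r = {r:.4f}")
```

A value close to -1 reflects the pattern in the text: temperature falls almost linearly as elevation rises.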
      Qty        MRP
Qty   1
MRP   0.175569   1
INFERENCES:
H0: There is no relationship between Unit Price and Order Quantity
H1: There is a significant relationship between Unit Price and Order Quantity
R value = 0.175569
The correlation coefficient indicates whether the correlation between the independent and
dependent variables is positive, negative, or absent. The Pearson correlation value lies between -1 and 1.
If the correlation value is between -0.3 and -0.1, it indicates a weak negative correlation. If it is
between -0.9 and -0.6, it indicates a moderate negative correlation, and if it is below -0.9 the
variables are strongly negatively correlated.
A Pearson correlation value of 0 indicates that there is no correlation between the two
variables and that one does not impact the other. If the Pearson correlation value is less than 0.5,
it indicates a weak positive correlation. If it is between 0.5 and 0.7, the variables are moderately
correlated, and if it is above 0.7 and up to 1, it indicates a strong positive correlation between the
independent and dependent variables.
Since the R value is less than 0.5, we can conclude that the correlation between Unit Price and Order
Quantity is weakly positive.