Basic Statistics - All Calculations
Mean
Median
Mode
Measure of Dispersion/Spread
Range
Min
Max
Variance
Std Deviation
Measure of Shape
Skewness
Kurtosis
Mean
Column A (19 values): 62, 4, 83, 24, 40, 73, 87, 2, 54, 5, 74, 83, 87, 54, 94, 26, 14, 9, 56
Column B (20 values): 58, 52, 43, 29, 69, 7, 26, 14, 65, 69, 93, 65, 33, 61, 50, 6, 24, 96, 0, 58
Average of column A = (SUM(A3:A21))/(COUNT(A3:A21)) = 49
Average is the same as mean.
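As a cross-check outside the spreadsheet, the average can be reproduced in a few lines of Python. This is only a minimal sketch: it assumes the 19 values of column A are the cells A3:A21 used in the formula, and the variable names are illustrative.

```python
# Column A of the sheet (assumed to be cells A3:A21, 19 values).
column_a = [62, 4, 83, 24, 40, 73, 87, 2, 54, 5,
            74, 83, 87, 54, 94, 26, 14, 9, 56]

# Mean = sum of the values divided by the number of values,
# mirroring (SUM(A3:A21))/(COUNT(A3:A21)).
mean_a = sum(column_a) / len(column_a)
print(mean_a)
```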
Median
Always sort the data in ascending order.
Find the count and divide it by 2 to get the index value of the median data point.
Check the value at that index in the data.
Odd count (19 values)
Data: 62, 4, 83, 24, 40, 73, 87, 2, 54, 5, 74, 83, 87, 54, 94, 26, 14, 9, 56
Sorted: 2, 4, 5, 9, 14, 24, 26, 40, 54, 54, 56, 62, 73, 74, 83, 83, 87, 87, 94
(COUNT(E32:E50))/2 = 9.5, which rounds up to 10
The value at the 10th index is the median: 54
Even count (20 values)
Data: 58, 52, 43, 29, 69, 7, 26, 14, 65, 69, 93, 65, 33, 61, 50, 6, 24, 96, 0, 58
Sorted: 0, 6, 7, 14, 24, 26, 29, 33, 43, 50, 52, 58, 58, 61, 65, 65, 69, 69, 93, 96
For an even number of observations, we have 2 centre values:
COUNT(E54:E73)/2 = 10
(COUNT(E54:E73)/2)+1 = 11
Find the 10th & 11th index values in the data. The average of these 2 values will be the median.
10th index: 50
11th index: 52
(50+52)/2 = 51
So the median here is 51
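The sort-then-index procedure for the median can be sketched in Python as follows. The function name `median_by_index` is hypothetical, and the standard-library `statistics.median` is used only as a cross-check.

```python
import statistics

odd_data = [62, 4, 83, 24, 40, 73, 87, 2, 54, 5,
            74, 83, 87, 54, 94, 26, 14, 9, 56]        # 19 values
even_data = [58, 52, 43, 29, 69, 7, 26, 14, 65, 69,
             93, 65, 33, 61, 50, 6, 24, 96, 0, 58]    # 20 values


def median_by_index(values):
    """Sort, then take the middle value (odd count) or the
    average of the two centre values (even count)."""
    ordered = sorted(values)
    n = len(ordered)
    if n % 2 == 1:
        # 0-based index n // 2 is the (n // 2 + 1)-th value, e.g. the 10th of 19.
        return ordered[n // 2]
    # The two centre values are the (n/2)-th and (n/2 + 1)-th values.
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


median_odd = median_by_index(odd_data)
median_even = median_by_index(even_data)
print(median_odd, median_even)
# Cross-check against the standard library.
print(statistics.median(odd_data), statistics.median(even_data))
```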
Mode
Data: 0, 1, 2, 4, 9, 20, 27, 28, 30, 32, 33, 40, 41, 45, 54, 59, 61, 66, 75, 84, 84, 87, 96
The most frequent value in the data is the mode. Here the mode is 84.
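Finding the most frequent value can be sketched with a frequency count. This example uses `collections.Counter` from the Python standard library on the data listed for the mode; it is an illustration, not the spreadsheet's own method.

```python
from collections import Counter

data = [0, 1, 2, 4, 9, 20, 27, 28, 30, 32, 33, 40,
        41, 45, 54, 59, 61, 66, 75, 84, 84, 87, 96]

# Count how often each value occurs and take the most frequent one.
mode_value, mode_count = Counter(data).most_common(1)[0]
print(mode_value, mode_count)
```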
50% of the data will be either equal to or less than the median value.
Second even-count example (10 values): 10, 11, 11, 12, 12, 13, 13, 14, 14, 16
Count/2 = 5 and (Count/2)+1 = 6
5th index: 12
6th index: 13
(12+13)/2 = 12.5
So the median here is 12.5
Range
Data: 58, 52, 43, 29, 69, 7, 26, 14, 65, 69, 93, 65, 33, 61, 50, 6, 24, 96, 0, 58
Range is the difference between the maximum value & the minimum value.
MAX(A3:A22)-MIN(A3:A22) = 96
Therefore our data could be any of the 96 values between the min & the max.
Mean = 45.9
Sum of squared deviations from the mean = SUM(D27:D46) = 14666
Degrees of freedom = COUNT(D27:D46)-1 = 19
Variance = D49/D50 = 771.88
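The variance steps above can be retraced in Python. This sketch assumes that column D of the sheet holds the squared deviations of the 20 range values from their mean; that assumption reproduces the 14666 and 771.88 figures. The standard deviation is the square root of the variance.

```python
import statistics

data = [58, 52, 43, 29, 69, 7, 26, 14, 65, 69,
        93, 65, 33, 61, 50, 6, 24, 96, 0, 58]

mean = sum(data) / len(data)

# Assumed content of column D: squared deviation of each value from the mean.
squared_deviations = [(x - mean) ** 2 for x in data]
sum_sq = sum(squared_deviations)      # the SUM(D27:D46) step
dof = len(data) - 1                   # the COUNT(D27:D46)-1 step
variance = sum_sq / dof               # sample variance
std_dev = variance ** 0.5             # sample standard deviation

print(round(mean, 1), round(sum_sq, 1), round(variance, 2), round(std_dev, 2))
```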
Delivery boy 1 (Time in minutes): 12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 60
Delivery boy 2 (Time in minutes): 34, 14, 31, 59, 11, 50, 27, 33, 53, 34, 13, 13, 42, 29, 33, 42, 34, 33, 44, 21
Delivery No. Boy 1 Boy 2
1 12 34
2 13 14
3 17 31
4 21 59
5 24 11
6 24 50
7 26 27
8 27 33
9 27 53
10 30 34
11 32 13
12 35 13
13 37 42
14 38 29
15 41 33
16 43 42
17 44 34
18 46 33
19 53 44
20 60 21
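To compare the two delivery boys on consistency, the mean and standard deviation of each column can be computed. The sketch below uses the sample standard deviation (`statistics.stdev`); whether the sheet used the sample or the population formula is not visible in the source, so treat that choice as an assumption.

```python
import statistics

boy_1 = [12, 13, 17, 21, 24, 24, 26, 27, 27, 30,
         32, 35, 37, 38, 41, 43, 44, 46, 53, 60]
boy_2 = [34, 14, 31, 59, 11, 50, 27, 33, 53, 34,
         13, 13, 42, 29, 33, 42, 34, 33, 44, 21]

mean_1, mean_2 = statistics.mean(boy_1), statistics.mean(boy_2)
sd_1, sd_2 = statistics.stdev(boy_1), statistics.stdev(boy_2)

print("Boy 1:", mean_1, round(sd_1, 2))
print("Boy 2:", mean_2, round(sd_2, 2))

# The smaller spread marks the more consistent delivery boy.
more_consistent = "Boy 1" if sd_1 < sd_2 else "Boy 2"
print(more_consistent)
```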
Data (150 values, sorted, read down each column):
0 2 10 22 33 57 82 115
0 2 10 22 34 58 82 117
0 2 11 22 35 59 86 118
0 2 11 23 35 61 86 123
0 3 12 23 37 62 87 127
0 3 12 23 37 63 91 128
0 3 13 24 37 64 94 133
0 4 14 25 38 66 99 136
0 5 15 26 38 66 100 139
0 6 15 27 40 66 100 183
1 6 16 28 43 68 102
1 7 16 28 44 68 102
1 7 18 30 46 68 105
1 8 18 31 48 71 106
1 8 18 31 49 77 107
2 8 19 31 53 77 107
2 9 20 31 54 78 107
2 9 21 31 54 79 108
2 9 22 31 54 80 112
2 9 22 33 55 81 115
Mean 43.41333
Std Dev 40.68
Empirical Rule
Approx. 67-68% of the data is expected to be within 1 std deviation away from the mean
Mean - 1 std deviation = 2.7286
Mean - 2 std deviations = -37.96
Mean - 3 std deviations = -78.64
Histogram bins: 0 to 190 in steps of 10
Sorted data: 5, 6, 7, 14, 24, 26, 29, 33, 43, 50, 52, 58, 58, 61, 65, 65, 69, 69, 93, 96
Range 91
Both delivery boys have the same mean delivery time, but boy 1 has the smaller standard deviation, so he is more consistent as compared to delivery boy 2, hence we should hire boy 1.
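Mean + 1 std deviation = 84.098; 98 of the 150 observations (0.6533 of the data) lie within 1 std deviation of the mean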
Approx. 95% of the data is expected to be within 2 std deviations away from the mean
Approx. 99.7% of the data is expected to be within 3 std deviations away from the mean
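The empirical-rule ranges can be checked against the 150 sorted values of the data set. The sketch below assumes the sample standard deviation was used and counts how many observations fall within 1, 2 and 3 standard deviations of the mean. Because this data is right skewed, the observed shares only roughly follow the 68 / 95 / 99.7 pattern.

```python
import statistics

# The 150 observations, listed column by column from the data grid.
data = [
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2,
    2, 2, 2, 2, 3, 3, 3, 4, 5, 6, 6, 7, 7, 8, 8, 8, 9, 9, 9, 9,
    10, 10, 11, 11, 12, 12, 13, 14, 15, 15,
    16, 16, 18, 18, 18, 19, 20, 21, 22, 22,
    22, 22, 22, 23, 23, 23, 24, 25, 26, 27,
    28, 28, 30, 31, 31, 31, 31, 31, 31, 33,
    33, 34, 35, 35, 37, 37, 37, 38, 38, 40,
    43, 44, 46, 48, 49, 53, 54, 54, 54, 55,
    57, 58, 59, 61, 62, 63, 64, 66, 66, 66,
    68, 68, 68, 71, 77, 77, 78, 79, 80, 81,
    82, 82, 86, 86, 87, 91, 94, 99, 100, 100,
    102, 102, 105, 106, 107, 107, 107, 108, 112, 115,
    115, 117, 118, 123, 127, 128, 133, 136, 139, 183,
]

mean = statistics.mean(data)
sd = statistics.stdev(data)  # sample standard deviation (assumed)

# Count the observations inside mean +/- k standard deviations.
counts = {}
for k in (1, 2, 3):
    lower, upper = mean - k * sd, mean + k * sd
    counts[k] = sum(lower <= x <= upper for x in data)
    print(k, round(lower, 2), round(upper, 2),
          counts[k], round(counts[k] / len(data), 4))
```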
Histogram (bin width 10)
Bin Frequency
0 10
10 32
20 15
30 16
40 17
50 5
60 8
70 10
80 6
90 6
100 5
110 8
120 5
130 3
140 3
150 0
160 0
170 0
180 0
190 1
More 0
[Figure: Histogram; x-axis: Bin, y-axis: Frequency]
[Figure: Histogram of the same data with a bin width of 20; x-axis: Bin, y-axis: Frequency]
Right Skewed Data
Mean 43.41333
Median 31
Mode 0
Properties of Skewed Data
Right skewed: Mode<Median<Mean
Left skewed: Mode>Median>Mean
Normal: Mode=Median=Mean
Bin Frequency Relative Frequency Cumulative Frequency Cumulative Relative Frequency
0 10 0.066667 10 0.066667
20 47 0.313333 57 0.38
40 33 0.22 90 0.6
60 13 0.086667 103 0.686667
80 16 0.106667 119 0.793333
100 11 0.073333 130 0.866667
120 13 0.086667 143 0.953333
140 6 0.04 149 0.993333
160 0 0 149 0.993333
180 0 0 149 0.993333
200 1 0.006667 150 1
Total 150 1
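The relative and cumulative columns follow directly from the frequency column. A minimal sketch, assuming the bin frequencies for the width-20 histogram, with `itertools.accumulate` producing the running total:

```python
from itertools import accumulate

bins = [0, 20, 40, 60, 80, 100, 120, 140, 160, 180, 200]
frequency = [10, 47, 33, 13, 16, 11, 13, 6, 0, 0, 1]

total = sum(frequency)

# Relative frequency: share of all observations that fall in each bin.
relative = [f / total for f in frequency]

# Cumulative frequency: running total of the frequency column.
cumulative = list(accumulate(frequency))
cumulative_relative = [c / total for c in cumulative]

for row in zip(bins, frequency, relative, cumulative, cumulative_relative):
    print(*(round(v, 6) for v in row))
```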
Probability is the chance of an event.
There are 2 types of probability distributions: Discrete Probability Distribution & Continuous Probability Distribution
Probability of an event = (Number of ways/outcomes where the event occurs) / (Total number of possible outcomes)
Probability will ALWAYS sum up to 1, and will lie in the range 0 to 1, where a probability closer to 0 means a less likely event (0 = unlikely/impossible event).
Dice
Outcome Probability
1 0.166667
2 0.166667
3 0.166667
4 0.166667
5 0.166667
6 0.166667
Total 1
P(Even number) 0.5
P(Prime number) 0.5
P(x<6) 0.833333
Tossing a coin 10 times (Head = 0.5, Tail = 0.5)
Number of heads Probability
0 0.000977
1 0.009766
2 0.043945
3 0.117188
4 0.205078
5 0.246094
6 0.205078
7 0.117188
8 0.043945
9 0.009766
10 0.000977
P(Less than 4 heads) 0.171875
P(7 or less) 0.945313
P(Greater than 6 heads) 0.171875
P(Greater than 2 heads) 0.945313
Second table (the values match a binomial distribution with n = 5 and p = 0.7)
Number of successes Probability
0 0.00243
1 0.02835
2 0.1323
3 0.3087
4 0.36015
5 0.16807
Total 1
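The coin-toss probabilities are binomial probabilities with n = 10 and p = 0.5. The sketch below rebuilds them with `math.comb` (Python 3.8+). The helper `binomial_pmf` is a hypothetical name introduced for illustration.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


n, p = 10, 0.5
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

p_less_than_4 = sum(pmf[:4])     # 0, 1, 2 or 3 heads
p_7_or_less = sum(pmf[:8])       # 0 to 7 heads
p_greater_than_6 = sum(pmf[7:])  # 7 to 10 heads
p_greater_than_2 = sum(pmf[3:])  # 3 to 10 heads

print(p_less_than_4, p_7_or_less, p_greater_than_6, p_greater_than_2)
```

Calling `binomial_pmf(k, 5, 0.7)` for k = 0 to 5 gives the values of the second table in the same way.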
Histogram
Bin Frequency
4 0
4.5 5
5 27
5.5 27
6 30
6.5 31
7 18
7.5 6
8 6
More 0
[Figure: Histogram; x-axis: Bin, y-axis: Frequency]
Probability of a randomly selected flower having a sepal length of less than 4.6?
100000
100 1000
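z-score = (X-Mean)/Stdev
t-Stat = (X-Mean)/(stdev/sqrt(N))
Sample mean = 1990
Sample SD = 2833
Population SD = 2500
Sample size N = 140, df = 139
Critical value = 1.984
Significance level (alpha) = 1 - Confidence Level; Margin of Error = critical value * standard error
Using the population SD: standard error = 2500/sqrt(140) = 211.2886
95% of the time, the average balance maintained will lie between $1571 & $2409
Using the sample SD: standard error = 2833/sqrt(140) = 239.4322
95% of the time, the average balance maintained will lie between $1514.97 & $2465.03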
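Both confidence intervals can be reproduced from the summary figures. In this sketch the critical value 1.984 is taken from the sheet as given rather than derived, and `confidence_interval` is a hypothetical helper name.

```python
from math import sqrt

sample_mean = 1990
sample_sd = 2833
population_sd = 2500
n = 140
critical_value = 1.984  # value used in the sheet, taken as given


def confidence_interval(mean, sd, n, critical):
    """Return (lower, upper) as mean -/+ critical * sd / sqrt(n)."""
    standard_error = sd / sqrt(n)
    margin_of_error = critical * standard_error
    return mean - margin_of_error, mean + margin_of_error


ci_population = confidence_interval(sample_mean, population_sd, n, critical_value)
ci_sample = confidence_interval(sample_mean, sample_sd, n, critical_value)

print([round(v, 2) for v in ci_population])
print([round(v, 2) for v in ci_sample])
```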
500
Situation 1
Null Hypothesis: The person is innocent. (The null hypothesis is true.)
There is no sufficient evidence to prove the person guilty, hence we fail to reject the null hypothesis & the person will be set free.
Hence the decision made is correct.
Situation 2
Null Hypothesis: The person is innocent. (The null hypothesis is false.)
There is sufficient evidence to prove the person guilty, hence we reject the null hypothesis in favour of the alternate hypothesis & the person will be punished.
Hence the decision made is correct.
Situation 3
Null Hypothesis: The person is innocent. (The null hypothesis is true.)
There is evidence to prove the person guilty, hence we reject the null hypothesis in favour of the alternate hypothesis & the person will be punished.
Hence we are making a Type 1 Error.
Situation 4
Null Hypothesis: The person is innocent. (The null hypothesis is false.)
There is no sufficient evidence to prove the person guilty, hence we fail to reject the null hypothesis & the person will be set free.
Hence we are making a Type 2 Error.
One must not say 'Accept Null Hypothesis', because the fact mentioned in the Null Hypothesis is the default belief.
When there is no sufficient evidence from the sample to prove the Null wrong, we would rather say 'Fail to reject the Null'.
One tailed tests
Variables like:
Sales
Profitability
Scores in an exam etc
Which are beneficial when on the upside, and detrimental when on the downside.
Variables like:
Loss
Number of defects
Which are detrimental when on the upside, and beneficial when on the downside.
Two tailed tests apply to a variable which is good enough only when it is in a specified safe range, but detrimental at either of the extremes, higher or lower.
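Sample Mean 1990
Sample SD 2833
Population SD 2500
Sample size 140
Standard error (Sample SD / sqrt(Sample size)) 239
Mean 1990
Mean +/- 1 standard error: 1751 to 2229
Mean +/- 2 standard errors: 1511.1355993 to 2468.8644007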
z-score / t-test
If I pick a random employee, what is the probability that his salary is 35000?
t-stat = (35000-40000)/(2000/sqrt(50)) = -17.67766953
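The t-stat arithmetic can be checked directly. The variable names below are illustrative: 40000 and 2000 are read as the mean and standard deviation, and 50 as the sample size, as in the formula.

```python
from math import sqrt

x = 35000      # value being tested
mean = 40000   # mean used in the formula
sd = 2000      # standard deviation used in the formula
n = 50         # sample size

# t-stat = (X - Mean) / (stdev / sqrt(N))
t_stat = (x - mean) / (sd / sqrt(n))
print(round(t_stat, 8))
```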
a 6
b 11
c 17
d 32
e -16
50 50
t 2.2360679775
df 79
ANOVA
Null Hypothesis: The 3 samples do not differ significantly, and the different diets have equal impact on weight loss.
Alternate Hypothesis: At least 1 diet plan reduces weight significantly more than the others.
Average 4.5 77
SS Between Groups
3.5714285714 F-Stat
In Excel
Anova: Single Factor
SUMMARY
Groups Count Sum Average Variance
Atkins 12 54 4.5 7
GM 12 64 5.3333333333 5.1515151515
South Beach 12 84 7 4.1818181818
ANOVA
Source of Variation SS df MS F
Between Groups 38.888888889 2 19.444444444 3.5714285714
Within Groups 179.66666667 33 5.4444444444
Total 218.55555556 35
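The ANOVA table can be rebuilt from the SUMMARY block alone (count, average and variance per group), which shows where SS, df, MS and F come from. This is a sketch based on those summary figures, not on the raw weight-loss observations, which are not listed here.

```python
# (count, average, variance) per diet, taken from the SUMMARY block.
groups = {
    "Atkins": (12, 4.5, 7.0),
    "GM": (12, 64 / 12, 5.1515151515),
    "South Beach": (12, 7.0, 4.1818181818),
}

total_n = sum(n for n, _, _ in groups.values())
grand_mean = sum(n * mean for n, mean, _ in groups.values()) / total_n

# Between groups: spread of the group means around the grand mean.
ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups.values())
# Within groups: each sample variance times its degrees of freedom.
ss_within = sum((n - 1) * var for n, _, var in groups.values())

df_between = len(groups) - 1
df_within = total_n - len(groups)

ms_between = ss_between / df_between
ms_within = ss_within / df_within
f_stat = ms_between / ms_within

print(round(ss_between, 4), df_between, round(ms_between, 4))
print(round(ss_within, 4), df_within, round(ms_within, 4))
print(round(f_stat, 4))
```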
40000
2000
P-value <5%: reject the null hypothesis in favour of the alternate hypothesis
P-value >5%: fail to reject the null hypothesis
Shelf 1
Shelf 2
Shelf 3
Shelf 4 Highest Sales
Shelf 5
5.3333333333 56.666666667 7 46
SS Between Groups
2 kms
P-value F crit
0.0394405888 3.2849176511
Observed Frequencies
Non-smoker Smoker Total
Athlete 14 4 18
Non-athlete 0 10 10
Total 14 14 28
Expected Frequencies
Non-smoker Smoker Total
Athlete 9 9 18
Non-athlete 5 5 10
Total 14 14 28
Chi-Square-stat 15.56
Critical 3.84
Null Hypothesis: The 2 variables are independent & there is no significant association between smoking and being a professional athlete.
Alternate Hypothesis: The 2 variables are dependent & there is a statistically significant association between smoking and being a professional athlete.
In Excel
P-Value 0.0000801
P-value is less than 0.05, so reject the null hypothesis.
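The expected frequencies, the chi-square statistic and the p-value can be recomputed from the observed table. This sketch relies on the fact that a 2x2 table has 1 degree of freedom, for which the chi-square tail probability equals `erfc(sqrt(x / 2))`, so no external library is needed.

```python
from math import erfc, sqrt

# Rows: Athlete, Non-athlete. Columns: Non-smoker, Smoker.
observed = [[14, 4],
            [0, 10]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected frequency = row total * column total / grand total.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Chi-square statistic: sum of (observed - expected)^2 / expected.
chi_square = sum((o - e) ** 2 / e
                 for obs_row, exp_row in zip(observed, expected)
                 for o, e in zip(obs_row, exp_row))

# Valid for 1 degree of freedom only (2x2 table).
p_value = erfc(sqrt(chi_square / 2))

print(expected, round(chi_square, 2), p_value)
```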