• Descriptive Statistics
• Inferential Statistics
Vishal Mishra (IBS, Hyderabad)
BASICS
Descriptive Statistics
- Qualitative
- Quantitative
Graphical, Tabular, Numeric representation of Data
Frequency Distribution
Relative Frequency Distribution
Cross-tabulation
Graphical, Tabular, Numeric representation of Data
Quantitative Data
Frequency Distribution
Relative Frequency Distribution
Cumulative Frequency Distribution
Stem and Leaf Display
Cross-tabulation
Quantitative Data: Numeric Representation
Measures of Variability:
Range
Inter-quartile Range
Standard Deviation
Variance
Coefficient of Variation
Qualitative Data: Bar Graph
[Figure: Colour Preference (bar graph)]
Qualitative Data: Pie Chart
[Figure: Colour Preference (pie chart)]
Qualitative Data: Frequency Distribution
Colour Frequency
Blue 10
Red 14
Green 6
Qualitative Data: Relative Frequency Distribution
Colour   Frequency   Relative Frequency
Blue     10          0.3333
Red      14          0.4667
Green    6           0.2
Total    30          1
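The relative frequency of a class is its frequency divided by the total number of observations, so the relative frequencies always add up to 1. A minimal Python sketch using the colour counts from the table above (the variable names are illustrative):

```python
# Frequencies from the colour preference table
colour_counts = {"Blue": 10, "Red": 14, "Green": 6}

total = sum(colour_counts.values())  # 30 respondents

# Relative frequency = class frequency / total frequency
relative_freq = {colour: count / total for colour, count in colour_counts.items()}

for colour, rf in relative_freq.items():
    print(f"{colour:<6} {colour_counts[colour]:>3} {rf:.4f}")
print(f"Total  {total:>3} {sum(relative_freq.values()):.0f}")
```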
Qualitative Data: Cross-Tabulation
                   SATISFACTION
JOB                NO     YES
S/W Engineer       6      4
Bank Cashier       3      3
Carpenter          4      10
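Because the three jobs have different numbers of respondents, row percentages make the cross-tabulation easier to compare. A minimal Python sketch using the counts above (the dictionary layout and names are illustrative, not part of the notes):

```python
# Counts from the JOB x SATISFACTION cross-tabulation: (NO, YES)
crosstab = {
    "S/W Engineer": (6, 4),
    "Bank Cashier": (3, 3),
    "Carpenter": (4, 10),
}

# Row percentages: share of each job answering NO / YES
row_pct = {}
for job, (no, yes) in crosstab.items():
    row_total = no + yes
    row_pct[job] = (100 * no / row_total, 100 * yes / row_total)
    print(f"{job:<13} total={row_total:>2}  NO={row_pct[job][0]:5.1f}%  YES={row_pct[job][1]:5.1f}%")

# Column totals across all jobs
col_no = sum(no for no, _ in crosstab.values())
col_yes = sum(yes for _, yes in crosstab.values())
print(f"Column totals: NO={col_no}, YES={col_yes}, grand total={col_no + col_yes}")
```

Read this way, 40% of the software engineers, 50% of the bank cashiers and about 71% of the carpenters report being satisfied.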
Number of classes?
Quantitative Data: Frequency Distribution
Quantitative Data: Relative Frequency Distribution
42 to 50 3 0.1 (<50): 1
Total = 30 Total = 1
Quantitative Data: Histogram
[Figure: histogram; x-axis: classes Less than 10, 10 to 18, 18 to 26, 26 to 34, 34 to 42, 42 to 50; y-axis: Frequency]
Quantitative Data: Cumulative Frequency Distribution
Class       Frequency   Cumulative Frequency
10 to 18    5           5
18 to 26    5           10
26 to 34    10          20
34 to 42    7           27
42 to 50    3           30
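The cumulative frequency of a class is the sum of the frequencies of that class and all classes before it, so the last entry equals the total number of observations. A minimal Python sketch using the class frequencies above; it also computes the relative frequencies (for example 3/30 = 0.1 for the class 42 to 50):

```python
from itertools import accumulate

# Class frequencies from the cumulative frequency table (30 observations)
classes = ["10 to 18", "18 to 26", "26 to 34", "34 to 42", "42 to 50"]
freq = [5, 5, 10, 7, 3]

n = sum(freq)                       # total number of observations
rel_freq = [f / n for f in freq]    # relative frequency of each class
cum_freq = list(accumulate(freq))   # running total of the frequencies

for c, f, rf, cf in zip(classes, freq, rel_freq, cum_freq):
    print(f"{c:<9} {f:>3} {rf:>7.4f} {cf:>3}")
```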
Quantitative Data: Ogive
[Figure: ogive; x-axis: Less than 10, Less than 18, Less than 26, Less than 34, Less than 42, Less than 50; y-axis: Cumulative Frequency]
Quantitative Data: Frequency Distribution
10-15 3
15-20 4
20-25
25-30
30-35
35-40
40-45
45-50
Question: What happens if the number of classes is too large OR too small?
Quantitative Data: Numeric Summary
Measures of Variability:
Range, IQR, Standard Deviation (S.D), Variance, Coefficient of variation
Quantitative Data: Measures of Location OR Central Tendency
Quartiles/Percentiles:
Quartiles are the three values (Q1, Q2, Q3) that divide the ordered data into four parts.
Percentiles are the values that divide the ordered data into 100 parts.
The pth percentile is a value such that at least p% of the observations are less than or equal to it and at least (100 - p)% of the observations are greater than or equal to it.
Quantitative Data: Measures of Location OR Central Tendency
Quartiles/Percentiles:
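i = (p/100) × n, where p is the required percentile and n is the number of observations (data arranged in ascending order)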
For Q2 (the 50th percentile) of the 30 observations: i = (50/100) × 30 = 15
Since i is an integer, Q2 is the mean of the observations in the ith and (i+1)th positions (when the data are arranged in ascending order).
i.e., Q2 = (27 + 31)/2 = 29
Thus Q2, the median, is 29.
For the 70th percentile: i = (70/100) × 30 = 21
Since i is an integer, the 70th percentile is the mean of the observations in the 21st and 22nd positions (when the data are arranged in ascending order):
70th percentile = (34 + 34)/2 = 34
Quantitative Data: Measures of Variability
Here Q1 = ? and Q3 = ?
Thus IQR = ?
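The percentile rule above can be sketched in Python. The integer case (average the ith and (i+1)th values) is the one used in the worked examples; the non-integer case (round i up and take that position) is an assumption taken from the common textbook convention, and it agrees with Q1 = 20 and Q3 = 35 reported for the 30 students in the outlier example. The function names `percentile` and `iqr` are hypothetical, and library routines such as NumPy's percentile use interpolation by default, so they can return slightly different values.

```python
import math
from fractions import Fraction

def percentile(data, p):
    """pth percentile (integer p, 0 < p < 100) using the rule in these notes.

    Sort ascending and compute i = (p/100) * n. If i is an integer, return
    the mean of the values in positions i and i + 1; otherwise round i up
    and return the value in that position (positions are 1-based).
    """
    if not 0 < p < 100:
        raise ValueError("p must lie strictly between 0 and 100")
    x = sorted(data)
    n = len(x)
    i = Fraction(p * n, 100)          # exact arithmetic, no floating-point error
    if i.denominator == 1:            # i is an integer
        k = int(i)
        return (x[k - 1] + x[k]) / 2  # mean of the ith and (i+1)th values
    return x[math.ceil(i) - 1]        # round i up, take that position

def iqr(data):
    """Interquartile range Q3 - Q1, using the percentile rule above."""
    return percentile(data, 75) - percentile(data, 25)

# Marks Scored (Xi) from the z-score example in these notes
marks = [4, 9, 8, 6, 10, 6, 20, 5, 18, 3]
q1, q2, q3 = percentile(marks, 25), percentile(marks, 50), percentile(marks, 75)
print(q1, q2, q3, iqr(marks))
```

For these 10 marks, i = 2.5 for Q1 (rounded up to position 3) and i = 5 for Q2 (average of positions 5 and 6).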
Quantitative Data: Measures of Variability
Population Variance: $\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$ = ?
Population S.D = $\sigma$ = ?
Sample Variance: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$ = ?
Example: Calculate the variance of these 5 observations
Population: $\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$
Sample: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$
S. No.   Marks   Mean   (Marks - Mean)^2
1        10      13.6   12.96
2        12      13.6   2.56
3        14      13.6   0.16
4        15      13.6   1.96
5        17      13.6   11.56
Numerator = 29.2
Population variance = 29.2/5 = 5.84; Sample variance = 29.2/4 = 7.3
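The same example can be checked in Python, together with the other measures of variability listed earlier (range, standard deviation, coefficient of variation). This is a minimal sketch; the coefficient of variation uses the usual definition CV = (s / x̄) × 100%, which is an assumption since the notes list the measure without a formula.

```python
import math
import statistics

marks = [10, 12, 14, 15, 17]   # the 5 observations in the example

n = len(marks)
mean = sum(marks) / n                     # 13.6
ss = sum((x - mean) ** 2 for x in marks)  # numerator: sum of squared deviations, 29.2

data_range = max(marks) - min(marks)      # 17 - 10
pop_var = ss / n                          # divide by N
sample_var = ss / (n - 1)                 # divide by n - 1
sample_sd = math.sqrt(sample_var)
cv = sample_sd / mean * 100               # coefficient of variation (assumed definition), in %

# Cross-check against the standard library
assert math.isclose(pop_var, statistics.pvariance(marks))
assert math.isclose(sample_var, statistics.variance(marks))

print(data_range, round(pop_var, 2), round(sample_var, 2), round(sample_sd, 4), round(cv, 2))
```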
Quantitative Data: Measures of Variability: Data of marks of 30 students
Sum of squared differences from the mean (numerator): $\sum (x_i - \mu)^2 = 2815.5$
Population Variance: $\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$ = 2815.5/30 = 93.85
Sample Variance: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$ = 2815.5/29 = 97.09
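Once the sum of squared deviations is known, both variances and the corresponding standard deviations follow directly, since the S.D is the square root of the variance. A minimal Python sketch for the 30 students:

```python
import math

ss = 2815.5   # sum of squared deviations from the mean (30 students)
n = 30

pop_var = ss / n              # 2815.5 / 30
sample_var = ss / (n - 1)     # 2815.5 / 29
pop_sd = math.sqrt(pop_var)
sample_sd = math.sqrt(sample_var)

print(f"Population variance = {pop_var:.2f}, S.D = {pop_sd:.2f}")
print(f"Sample variance     = {sample_var:.2f}, S.D = {sample_sd:.2f}")
```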
Relationship Between Variables
Quantitative Data:
Cross-tabulation (Tabular)
Relationship Between Variables
Quantitative Data:
[Figure: Marks Scored (y-axis, 0 to 25) plotted against a second quantitative variable (x-axis, 0 to 16)]
Relationship Between Variables
Quantitative Data:
Sample: $s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$
Population: $\sigma_{xy} = \frac{\sum (x_i - \mu_x)(y_i - \mu_y)}{N}$
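A minimal Python sketch of both covariance formulas, applied to the 20 pairs of Marks Scored and hours spent on social media tabulated later in this section. The helper name `covariance` and its `sample` flag are illustrative.

```python
marks = [9, 10, 11, 11, 17, 19, 19, 20, 22, 25, 26, 26, 27, 28, 29, 31, 32, 32, 33, 33]
hours = [8, 8, 7, 6, 6, 7, 2, 5, 5, 4, 5, 6, 4, 2.5, 4, 2.5, 2.5, 2, 2, 2]

def covariance(x, y, sample=True):
    """Covariance of paired data: divide by n - 1 for a sample, by N for a population."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of products of deviations from the respective means
    cross = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return cross / (n - 1) if sample else cross / n

s_xy = covariance(marks, hours)                    # sample covariance
sigma_xy = covariance(marks, hours, sample=False)  # population covariance
print(round(s_xy, 4), round(sigma_xy, 4))
```

The sum of cross-products here is -270, so the sample covariance is -270/19 and the population covariance is -270/20. The negative sign indicates that, in these data, more hours on social media go with lower marks.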
Relationship Between Variables
Quantitative Data:
Drawback of Covariance
Correlation - Degree of linear association
Sample: $r_{xy} = \frac{s_{xy}}{s_x s_y}$
Population: $\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$
Relationship Between Variables
Quantitative Data:
Correlation - Degree of linear association
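Population Variance of X = 18.01; Sample Variance of X = 20.01
Population Variance of Y = 29.89; Sample Variance of Y = 33.21
Sample Corr.: $r_{xy} = \frac{s_{xy}}{s_x s_y}$ = 24.19 / √(20.01 × 33.21) = 0.9384, where 24.19 is the sample covariance $s_{xy}$
Population Corr.: $\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$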
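The sample correlation can be reproduced from the summary values above, remembering that the standard deviations are square roots of the variances. The population version is also sketched, under a loudly stated assumption: the number of observations is not given, so n = 10 is inferred from 18.01 / 20.01 ≈ (n - 1)/n. Because the same scaling appears in the numerator and denominator, both versions give (up to rounding) the same correlation.

```python
import math

# Summary values from the example above
s_xy = 24.19                  # sample covariance of X and Y
s2_x, s2_y = 20.01, 33.21     # sample variances of X and Y
var_x, var_y = 18.01, 29.89   # population variances of X and Y

# Sample correlation: r_xy = s_xy / (s_x * s_y), with s = sqrt(variance)
r_xy = s_xy / (math.sqrt(s2_x) * math.sqrt(s2_y))

# ASSUMPTION: n = 10 is inferred from 18.01 / 20.01 ≈ (n - 1) / n; it is not stated in the notes
n = 10
sigma_xy = s_xy * (n - 1) / n   # population covariance divides by N instead of n - 1
rho_xy = sigma_xy / (math.sqrt(var_x) * math.sqrt(var_y))

print(round(r_xy, 4), round(rho_xy, 4))
```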
Relationship Between Variables
Quantitative Data: Cross-tabulation
S. No.   Marks Scored   Hours Spent on Social Media
1        9              8
2        10             8
3        11             7
4        11             6
5        17             6
6        19             7
7        19             2
8        20             5
9        22             5
10       25             4
11       26             5
12       26             6
13       27             4
14       28             2.5
15       29             4
16       31             2.5
17       32             2.5
18       32             2
19       33             2
20       33             2
Relationship Between Variables
Quantitative Data:
Cross-tabulation
                   Hours on Social Media
Marks Scored       0 to 3   3 to 6   6 to 9
0 to 12            0        0        4
12 to 24           1        2        2
24 to 36           6        4        1
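The cross-tabulation can be built from the 20 raw (Marks Scored, hours) pairs. This sketch assumes each class includes its lower limit and excludes its upper limit (so 6 hours falls in "6 to 9"); the notes do not state the convention, but with it the sketch reproduces the table above. The helper `class_index` is hypothetical.

```python
marks = [9, 10, 11, 11, 17, 19, 19, 20, 22, 25, 26, 26, 27, 28, 29, 31, 32, 32, 33, 33]
hours = [8, 8, 7, 6, 6, 7, 2, 5, 5, 4, 5, 6, 4, 2.5, 4, 2.5, 2.5, 2, 2, 2]

mark_bins = [(0, 12), (12, 24), (24, 36)]   # Marks Scored classes (rows)
hour_bins = [(0, 3), (3, 6), (6, 9)]        # Hours on Social Media classes (columns)

def class_index(value, bins):
    """Index of the class containing value, treating each class as [lower, upper)."""
    for k, (lo, hi) in enumerate(bins):
        if lo <= value < hi:
            return k
    raise ValueError(f"{value} falls outside all classes")

# Count how many students fall in each (marks class, hours class) cell
table = [[0] * len(hour_bins) for _ in mark_bins]
for m, h in zip(marks, hours):
    table[class_index(m, mark_bins)][class_index(h, hour_bins)] += 1

for (lo, hi), row in zip(mark_bins, table):
    print(f"{lo} to {hi}:", row)
```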
Detecting Outliers
(The extreme values that are either very low OR very high)
1. Maximum value
2. Minimum value
3. Q1
4. Q2
5. Q3
1. Maximum value = 75
2. Minimum value = 10
3. Q1 = 20
4. Q2 = 29
5. Q3 = 35
IQR = Q3 - Q1 = 35 - 20 = 15
S. No.   Marks   S. No.   Marks
7        19      22       34
8        20      23       35
9        22      24       35
10       25      25       36
11       26      26       38
12       26      27       40
13       27      28       44
14       27      29       45
15       27      30       75
Detecting Outliers
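Outliers are the values that are below Q1 - 1.5(IQR) or above Q3 + 1.5(IQR).
Here, Q1 - 1.5(IQR) = 20 - 1.5(15) = -2.5 and Q3 + 1.5(IQR) = 35 + 1.5(15) = 57.5.
No observation lies below -2.5, and only 75 lies above 57.5, so there is exactly one outlier in this data set: the observation with a value of 75.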
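The outlier limits follow directly from Q1, Q3 and the IQR. A minimal Python sketch using the quartiles above; the function name `iqr_outlier_limits` is illustrative, and only the nine largest marks from the sorted table are checked, since values below the upper limit cannot be upper outliers.

```python
def iqr_outlier_limits(q1, q3):
    """Return the (lower, upper) limits Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    spread = q3 - q1                      # IQR
    return q1 - 1.5 * spread, q3 + 1.5 * spread

lower, upper = iqr_outlier_limits(q1=20, q3=35)   # quartiles of the 30 marks
print(lower, upper)

# The nine largest marks from the sorted table (S. No. 22 to 30)
top_values = [34, 35, 35, 36, 38, 40, 44, 45, 75]
outliers = [v for v in top_values if v > upper]
print(outliers)

# The minimum (10) is above the lower limit, so there are no low outliers
print(10 < lower)
```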
Detecting Outliers
Box Plot
Detecting Outliers (when data is symmetrically distributed)
For a sample, the z-score for the ith observation ($x_i$) is calculated as
$z_i = \frac{x_i - \bar{x}}{s}$
where $\bar{x}$ is the sample mean and $s$ is the sample standard deviation.
Detecting Outliers
Numeric Method: Z-Score
S.No.   Marks Scored (Xi)   Z-score (Zi)
1       4                   -0.850694
2       9                   0.0173611
3       8                   -0.15625
4       6                   -0.503472
5       10                  0.1909722
6       6                   -0.503472
7       20                  1.9270833
8       5                   -0.677083
9       18                  1.5798611
10      3                   -1.024306
Mean 8.9
Standard Deviation 5.76
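Values with a z-score below -3 or above +3 are treated as outliers (extreme values). Here every z-score lies between -3 and +3 (the largest is 1.93, for a mark of 20), so none of the 10 observations is an outlier.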
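The z-score method can be sketched in Python with the standard library's `statistics` module, using the 10 marks from the table above. Because this sketch divides by the unrounded sample standard deviation (about 5.763) rather than 5.76, its z-scores differ from the table in the third decimal place.

```python
import statistics

marks = [4, 9, 8, 6, 10, 6, 20, 5, 18, 3]   # Marks Scored (Xi)

x_bar = statistics.mean(marks)   # sample mean, 8.9
s = statistics.stdev(marks)      # sample standard deviation (n - 1 divisor)

# z-score of each observation, then flag values outside [-3, +3]
z = [(x - x_bar) / s for x in marks]
outliers = [x for x, zi in zip(marks, z) if zi < -3 or zi > 3]

for x, zi in zip(marks, z):
    print(f"{x:>3} {zi:+.4f}")
print("Outliers:", outliers)
```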