Class 1 - Descriptive Statistics
Basic Concept of Statistics
Types of statistics
• Descriptive statistics
– Describes the data in some manner
– Summarization of data, e.g. frequency distribution and graphical presentation
– Measures of central tendency and dispersion
• Inferential statistics (sampling statistics)
Example:
Age, sex, occupation, weight, height, blood
pressure, no. of admitted patients in ICU
Types of Variables
Variables
• Qualitative or categorical
– Binary
– Multinomial
• Quantitative or numerical
– Discrete
– Continuous
Categorical or Qualitative variables
Cannot be measured in numbers but can be divided into categories
Gender: Male/Female
Test: Positive/Negative
Response: Yes/No
Statistical analysis
Descriptive statistics: frequency table, cross tabulation, diagrams and graphs
Inferential statistics: non-parametric tests, e.g. Chi-square test
Quantitative variable
Measured in numbers, values or quantities
Example: number of students in the class, height, weight
Discrete variable: takes whole-number values
Example: Household size (1, 2, 3, 4, 5, 6, …)
Continuous variable: can take fractional values
Example: Marks, WBC count, height, weight, blood pressure, cholesterol level etc.
Variables
Statistical analysis
Descriptive statistics:
Frequency tables, graphs, mean, median, mode, range, standard deviation etc.
Inferential statistics:
Parametric tests: z test, t test, ANOVA etc.; correlation and regression analysis
Scaling of Data
Categorical Data
• Nominal Scale
• Ordinal Scale
Quantitative Data
• Interval Scale
• Ratio Scale
Nominal Scale: Possible Measures
Central tendency: Mode
Test of significance: Chi-square test
Ordinal Scale: Possible Measures
Central tendency: Mode and median
Variability: Quartile deviation
Correlation: Spearman rank correlation
Interval Scale: Possible Measures
Correlation and regression
t test, z test and ANOVA
Ratio Scale: Possible Measures
All statistical analyses
Parametric tests
Example
SN      Marks     Rank       Result
1       20        Tenth      Fail
2       60        Fifth      Pass
3       40        Eighth     Fail
4       50        Seventh    Pass
5       55        Sixth      Pass
6       80        Second     Pass
7       70        Fourth     Pass
8       75        Third      Pass
9       35        Ninth      Fail
10      85        First      Pass
Scale of each column: SN = nominal scale, Marks = ratio scale, Rank = ordinal scale, Result = nominal scale
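As a small illustration of how the scale of each column limits the summary that makes sense, the sketch below uses Python's standard `statistics` module on the values from the example table. The variable names are chosen here for illustration only, and the rank column is coded as 1 = First to 10 = Tenth.

```python
from statistics import mean, median, mode

# Values copied from the example table; rank coded as 1 = First ... 10 = Tenth.
marks = [20, 60, 40, 50, 55, 80, 70, 75, 35, 85]
ranks = [10, 5, 8, 7, 6, 2, 4, 3, 9, 1]
results = ["Fail", "Pass", "Fail", "Pass", "Pass",
           "Pass", "Pass", "Pass", "Fail", "Pass"]

mean_marks = mean(marks)      # ratio scale: the mean is meaningful
median_rank = median(ranks)   # ordinal scale: the median is meaningful
modal_result = mode(results)  # nominal scale: only the mode is meaningful

print(mean_marks, median_rank, modal_result)
```

Taking a mean of the Result column would not make sense, which is why only its mode is computed.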
Summary level
What to do?                                   Nominal   Ordinal   Interval   Ratio
The sequence of variables is established      –         Yes       Yes        Yes
Mode                                          Yes       Yes       Yes        Yes
Median                                        –         Yes       Yes        Yes
Mean                                          –         –         Yes        Yes
Difference between variables can be evaluated –         –         Yes        Yes
Addition and Subtraction of variables         –         –         Yes        Yes
Multiplication and Division of variables      –         –         –          Yes
Absolute zero                                 –         –         –          Yes
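The Mode, Median and Mean rows of the summary table can be written as a simple lookup. This is only a sketch: the dictionary and the helper `can_use` are hypothetical names introduced here, not part of any library.

```python
# Averages permitted at each scale, following the Mode/Median/Mean rows
# of the summary table. Names are illustrative only.
ALLOWED_AVERAGES = {
    "nominal": ["mode"],
    "ordinal": ["mode", "median"],
    "interval": ["mode", "median", "mean"],
    "ratio": ["mode", "median", "mean"],
}

def can_use(measure, scale):
    """Return True if the given average is permitted on the given scale."""
    return measure.lower() in ALLOWED_AVERAGES[scale.lower()]

print(can_use("median", "Ordinal"))  # the median is allowed on ordinal data
print(can_use("mean", "Nominal"))    # the mean is not allowed on nominal data
```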
How to identify variable types from a proforma?
• What is your occupation? - Nominal variable
(Agriculture/Business/Teacher)
• What is your blood group? - Nominal variable
(A/B/AB/O)
• Economic status of the respondent? - Ordinal variable
(Lower/Middle/Higher)
• Likert scale - Ordinal scale
(Strongly agree, agree, neutral, disagree and strongly disagree)
Likert Scale
Ratio Scale
• Age at first pregnancy(years) : …………yrs
• Weight of child at birth : ……….Kg.
• Income: NRS………/$............
• Systolic blood pressure: ………….
Exercise
1. If the grading of diabetes is classified as mild, moderate and
severe, the scale of measurement used is:
a. Interval
b. Nominal
c. Ordinal
d. Ratio
2. The nominal level of measurement is represented in which
variable below?
a. level of satisfaction
b. temperature
c. income
d. gender
Example
For relation: Scatter diagram
Presentation of Data (categorical)
1. Frequency distribution table
- Univariate frequency table (single variable), e.g. simple table or one-way table
Smoking status    Frequency    Percentage
Smoking           10           25
Not smoking       30           75
Total             40           100
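A one-way frequency table like the one above can be built from raw responses with `collections.Counter`. The sketch below assumes the raw data are simply 10 "Smoking" and 30 "Not smoking" responses, matching the counts in the table; the variable names are illustrative.

```python
from collections import Counter

# 40 respondents: 10 smokers and 30 non-smokers, matching the table counts.
responses = ["Smoking"] * 10 + ["Not smoking"] * 30

counts = Counter(responses)
total = sum(counts.values())

# Map each category to (frequency, percentage of total).
table = {status: (n, 100 * n / total) for status, n in counts.items()}

print(f"{'Smoking status':<16}{'Frequency':>10}{'Percentage':>12}")
for status, (n, pct) in table.items():
    print(f"{status:<16}{n:>10}{pct:>12.0f}")
print(f"{'Total':<16}{total:>10}{100:>12}")
```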
Presentation of data (categorical)
              Cancer
Smoking     Yes        No         Total
Yes         A (20)     B (10)     30
No          C (20)     D (40)     60
Total                             N =
Use of 2x2 Contingency table
• Odds ratio
• Relative risk ratio
• Attributable risk
• Sensitivity
• Specificity
• Predictive values (positive and negative)
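The measures listed above are simple ratios of the four cells A, B, C and D. The sketch below applies the usual textbook formulas to the smoking and cancer counts from the contingency table (A = 20, B = 10, C = 20, D = 40). The function names are hypothetical. The second function reads the same cells as a diagnostic table (rows = test result, columns = disease status) purely to show the formulas, which is an assumption made here and not a claim about the smoking data.

```python
def cohort_measures(a, b, c, d):
    """a, b = exposed with/without disease; c, d = unexposed with/without disease."""
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return {
        "odds_ratio": (a * d) / (b * c),
        "relative_risk": risk_exposed / risk_unexposed,
        "attributable_risk": risk_exposed - risk_unexposed,
    }

def diagnostic_measures(a, b, c, d):
    """Rows read as test positive/negative, columns as disease present/absent."""
    return {
        "sensitivity": a / (a + c),
        "specificity": d / (b + d),
        "positive_predictive_value": a / (a + b),
        "negative_predictive_value": d / (c + d),
    }

smoking = cohort_measures(20, 10, 20, 40)
formulas_demo = diagnostic_measures(20, 10, 20, 40)

for name, value in {**smoking, **formulas_demo}.items():
    print(f"{name}: {value:.2f}")
```

With these counts the odds ratio comes to 4 and the relative risk to 2, so the two measures need not agree even on the same table.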
Descriptive Statistics
Selection of average
• Mean:
– Interval and ratio scale
– Normally distributed data
– Most popular method
– Highly affected by outliers or extreme observations, e.g. (2, 4, 6, 8, 100) or (2, 20, 21, 23, 25)
• Median:
– Ordinal scale
– Not affected by skewed data
• Mode:
– Nominal scale
– Empirical formula: Mode = 3 median - 2 mean
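The effect of an extreme observation on the mean, compared with the median, can be checked on the two data sets quoted in the slide. This is a minimal sketch using the standard `statistics` module; `empirical_mode` is a hypothetical helper that simply applies the empirical formula Mode = 3 median - 2 mean.

```python
from statistics import mean, median

high_outlier = [2, 4, 6, 8, 100]
low_outlier = [2, 20, 21, 23, 25]

def empirical_mode(values):
    """Apply the empirical relation Mode = 3 * median - 2 * mean."""
    return 3 * median(values) - 2 * mean(values)

for data in (high_outlier, low_outlier):
    print(data, "mean =", mean(data), "median =", median(data))

# For a symmetric data set the formula returns the common centre.
print(empirical_mode([2, 4, 6, 8, 10]))
```

In the first data set the single value 100 pulls the mean to 24 while the median stays at 6; in the second the value 2 pulls the mean below the median of 21.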
Selection of dispersion
• Standard deviation:
– Ratio scale and normal data, e.g. (2, 4, 6, 8, 10)
– Highly affected by outliers, e.g. (1, 10, 11, 12, 13)
– Most popular method
• Quartile deviation:
– Ordinal scale and skewed data, not affected by outliers
• Range:
– Highly affected by outliers
– 2, 4, 6, 8, 10: range = 8
– 2, 4, 6, 8, 100: range = 98
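The sketch below compares the three measures of dispersion on the slide's two data sets. It assumes the sample standard deviation (divisor n - 1) and the "inclusive" quartile method of `statistics.quantiles`; other quartile conventions exist and can give different quartile deviations on such small samples. The helper names are illustrative.

```python
from statistics import quantiles, stdev

clean = [2, 4, 6, 8, 10]
with_outlier = [2, 4, 6, 8, 100]

def value_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)

def quartile_deviation(values):
    """Half the interquartile range, using the 'inclusive' quartile method."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return (q3 - q1) / 2

for data in (clean, with_outlier):
    print(data,
          "range =", value_range(data),
          "SD =", round(stdev(data), 2),
          "QD =", quartile_deviation(data))
```

Under these assumptions the range and standard deviation grow sharply when 10 is replaced by 100, while the quartile deviation is unchanged.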
Skewness
Positively skewed: 2 4 6 8 20
Symmetric: 2 4 6 8 10
Negatively skewed: 2 10 12 14 16
x f
10 3
15 8
20 16
25 10
30 3
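One way to quantify skewness for the frequency table above is the moment coefficient, the third central moment divided by the 1.5th power of the second. The slides do not say which coefficient is intended, so this choice is an assumption, and the helper `central_moment` is a name introduced here.

```python
from statistics import mean, median, mode

# Frequency table from the slide: value x occurs f times.
freq = {10: 3, 15: 8, 20: 16, 25: 10, 30: 3}
values = [x for x, f in freq.items() for _ in range(f)]

def central_moment(data, k):
    """k-th central moment using the population divisor n."""
    m = mean(data)
    return sum((v - m) ** k for v in data) / len(data)

skewness = central_moment(values, 3) / central_moment(values, 2) ** 1.5

print("mean =", mean(values), "median =", median(values), "mode =", mode(values))
print("moment skewness =", round(skewness, 3))
```

The coefficient comes out close to zero, so this distribution is nearly symmetric.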
Kurtosis
• Measures the flatness or peakedness of a distribution with reference to the normal curve
• It describes how heavy the tails are, that is, how much of the distribution lies in extreme values
• In that sense it indicates the presence of outliers in the distribution
Leptokurtic: kurtosis > 3 (positive excess kurtosis)
Mesokurtic: kurtosis = 3 (normal curve)
Platykurtic: kurtosis < 3 (negative excess kurtosis)
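The classification against 3 can be sketched with the moment coefficient of kurtosis, the fourth central moment divided by the square of the second. The data are the x/f frequency table from the skewness slide; `classify_kurtosis` is a hypothetical helper, and an exact comparison with 3 is used only for illustration since real data rarely hit 3 exactly.

```python
from statistics import mean

# Frequency table from the skewness slide: value x occurs f times.
freq = {10: 3, 15: 8, 20: 16, 25: 10, 30: 3}
values = [x for x, f in freq.items() for _ in range(f)]

def central_moment(data, k):
    """k-th central moment using the population divisor n."""
    m = mean(data)
    return sum((v - m) ** k for v in data) / len(data)

def classify_kurtosis(k):
    """Label a kurtosis value relative to the normal-curve value of 3."""
    if k > 3:
        return "leptokurtic"
    if k < 3:
        return "platykurtic"
    return "mesokurtic"

kurtosis = central_moment(values, 4) / central_moment(values, 2) ** 2
label = classify_kurtosis(kurtosis)

print(round(kurtosis, 2), label)
```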
MCQ
1. The symmetric nature of the distribution is measured by:
a. Central tendency
b. Skewness
c. Dispersion
d. Kurtosis
2. A distribution is said to be negatively skewed if:
a. Mean < median < mode
b. Mean = median = mode
c. Mean > median > mode
d. None of the above
3. The flatness or peakedness of the distribution is measured by:
a. Mean
b. Skewness
c. Standard deviation
d. Kurtosis
Univariate Statistics
• Include proportions, percentages, ratios, frequency distributions and graphical presentations
• Mean, median, mode, range, standard deviation
Bivariate and Multivariate Statistics
• Used to describe the associations between variables, e.g. Pearson's correlation coefficient, relative risk, odds ratio and others
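As an example of a bivariate statistic, the sketch below computes Pearson's correlation coefficient by hand and, by applying it to ranks, the Spearman rank correlation. The data are the marks and rank positions from the marks example earlier in these slides (rank 1 = First, so higher marks go with smaller rank numbers). The helper names are illustrative, and `to_ranks` assumes there are no tied values.

```python
from math import sqrt

marks = [20, 60, 40, 50, 55, 80, 70, 75, 35, 85]
rank_position = [10, 5, 8, 7, 6, 2, 4, 3, 9, 1]  # 1 = First ... 10 = Tenth

def pearson(x, y):
    """Pearson's r from sums of squares and cross-products."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def to_ranks(values):
    """Rank from 1 (smallest) upward; assumes no tied values."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

def spearman(x, y):
    """Spearman's rho = Pearson's r computed on the ranks."""
    return pearson(to_ranks(x), to_ranks(y))

r = pearson(marks, rank_position)
rho = spearman(marks, rank_position)
print(round(r, 3), round(rho, 3))
```

Both coefficients are negative only because of the coding: the best mark carries the smallest rank number.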
Normal Distribution
Test of normality
• Before selecting a parametric test such as the t test, ANOVA, correlation or regression, we must check whether the data are normally distributed.
Methods:
• Histogram
• Box plot
• Kolmogorov-Smirnov test and Shapiro-Wilk test
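To give a feel for what the Kolmogorov-Smirnov approach measures, the sketch below computes only the K-S distance: the largest gap between the empirical cumulative distribution of the data and a normal curve fitted with the sample mean and standard deviation. It uses the standard library only and is not a full test, since it returns no p-value, and the five-value data sets are far too small for a real normality check. The function name and the data are illustrative assumptions.

```python
from statistics import NormalDist, mean, stdev

def ks_distance_to_normal(data):
    """Largest gap between the empirical CDF and a fitted normal CDF."""
    xs = sorted(data)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    distance = 0.0
    for i, x in enumerate(xs, start=1):
        cdf = fitted.cdf(x)
        # The empirical CDF jumps from (i - 1) / n to i / n at x,
        # so the gap is checked on both sides of the jump.
        distance = max(distance, i / n - cdf, cdf - (i - 1) / n)
    return distance

symmetric = [2, 4, 6, 8, 10]
skewed = [2, 4, 6, 8, 100]

d_symmetric = ks_distance_to_normal(symmetric)
d_skewed = ks_distance_to_normal(skewed)
print(round(d_symmetric, 3), round(d_skewed, 3))
```

A larger distance means a poorer fit to the normal curve. In practice a statistics package would normally be used, for example `scipy.stats.shapiro` or `scipy.stats.kstest` in SciPy, which also report p-values.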
Thank you