
Statistical Analysis

Prem Prasad Panta


Asso. Prof. in Biostatistics
Karnali Academy of Health Sciences
Email: pantaprem7@gmail.com

Basic Concept of Statistics

Types of statistics
• Descriptive statistics
– describes the data in some manner
– Summarization of data
E.g. Frequency distribution and Graphical
presentation
– Measure of central tendency and dispersion
• Inferential statistics (sampling statistics)
– Inferences about a population are drawn from a sample, using probability
– Estimation
– Testing of hypothesis
– Prediction
– Correlation and regression
Variables

A concept that can be measured is known as a variable.
Or
A phenomenon that can take different values.

Example:
Age, sex, occupation, weight, height, blood
pressure, no. of admitted patients in ICU
Types of Variables

Variables
• Qualitative or categorical
– Nominal
– Ordinal
– Binary
– Multinomial
• Quantitative or numerical
– Discrete
– Continuous
Categorical or Qualitative variables
Cannot be measured in numbers but can be divided into categories
Gender: Male/Female
Test: Positive/Negative
Response: Yes/No

Types of Categorical variables


Nominal variable: no natural order or rank; no numerical meaning
Example: Sex: male/female
Ordinal variable: categories have a natural order
Example: Pain: mild/moderate/severe
Binary variable: only two mutually exclusive categories
Example:
Response: Yes/No
Test: Positive/Negative
Multinomial variable: more than two categories
Example:
Religion: Hindu/Buddhist/Muslim/Christian

Statistical analysis
Descriptive statistics: frequency table, cross tabulation, diagrams and graphs
Inferential statistics: non-parametric tests, e.g. Chi square test
Quantitative variable
Measured in numbers, values or quantities
Example: no. of students in the class, height, weight
Discrete variable: takes whole-number values only
Example: Household size (1, 2, 3, 4, 5, 6, …)
Continuous variable: can take fractional values
Example: Marks, WBC count, height, weight, BP, cholesterol level etc.
Statistical analysis
Descriptive statistics:
Frequency tables, graphs, mean, median,
mode , range, standard deviations etc.
Inferential Statistics:
Parametric test: z test, t test, ANOVA test
etc.; correlation and regression analysis

Scaling of Data
Categorical Data
• Nominal Scale
• Ordinal Scale

Quantitative Data
• Interval Scale
• Ratio Scale

Possible Measures: Nominal Scale
Central tendency: Mode
Test of significance: Chi square test

Possible Measures: Ordinal Scale
Central tendency: Mode and median
Variability: Quartile deviation
Correlation: Spearman rank correlation

Possible Measures: Interval Scale
Correlation and regression
t test, z test and ANOVA test

Possible Measures: Ratio Scale
All statistical analysis
Parametric test
Example
SN    Marks    Rank       Result
1     20       Tenth      Fail
2     60       Fifth      Pass
3     40       Eighth     Fail
4     50       Seventh    Pass
5     55       Sixth      Pass
6     80       Second     Pass
7     70       Fourth     Pass
8     75       Third      Pass
9     35       Ninth      Fail
10    85       First      Pass
Scale of each column: SN = nominal scale, Marks = ratio scale, Rank = ordinal scale, Result = nominal scale
Summary level
What to do?                                    Nominal   Ordinal   Interval   Ratio
The sequence of variables is established       –         Yes       Yes        Yes
Mode                                           Yes       Yes       Yes        Yes
Median                                         –         Yes       Yes        Yes
Mean                                           –         –         Yes        Yes
Difference between variables can be evaluated  –         –         Yes        Yes
Addition and subtraction of variables          –         –         Yes        Yes
Multiplication and division of variables       –         –         –          Yes
Absolute zero                                  –         –         –          Yes
How to identify variable types from a proforma?
• What is your occupation? – Nominal variable
(Agriculture/Business/Teacher)
• What is your blood group? – Nominal variable
(A/B/AB/O)
• Economic status of the respondent? – Ordinal variable
(Lower/Middle/Higher)
• Likert scale – Ordinal scale
(Strongly agree, agree, neutral, disagree and strongly disagree)
Likert Scale

Ratio Scale
• Age at first pregnancy(years) : …………yrs
• Weight of child at birth : ……….Kg.
• Income: NRS………/$............
• Systolic blood pressure: ………….

Exercise
1. If the grading of diabetes is classified as mild, moderate and
severe the scale of measurement used is :
a. Interval
b. Nominal
c. Ordinal
d. Ratio
2. The nominal level of measurement is represented in which
variable below?
a. level of satisfaction
b. temperature
c. income
d. gender
3. Calendar year is an example of what scale of measurement?
a. Nominal b. Ordinal c. Interval d. Ratio
4. A student's score is an example of a ……….. scale.
a. Nominal b. Ordinal c. Interval d. Ratio
5. Patient ID number is an example of a …….. scale.
a. Nominal b. Ordinal c. Interval d. Ratio
SN    Variable                  Scale
1     Blood type                Nominal
2     Blood pressure            Ratio
3     Pain scale                Ordinal
4     Education level           Ordinal
5     Literacy status           Nominal
6     Temperature in Celsius    Interval
7     Eye color                 Nominal
8     Jersey numbers            Nominal
9     Tagging the animals       Nominal
10    Deaths in emergency       Ratio
11    Types of disease          Nominal
Presentation of Categorical Data (Qualitative data)
• Bar diagrams
- Simple (one variable)
- Multiple and subdivided (two variables)
• Pie charts
• Frequency distribution table
- Simple table (one way table)
- Cross tabulation (two way table): chi square test, odds ratio and relative risk, sensitivity and specificity, PPV and NPV
Presentation of Numerical Data (Quantitative data)
• Histogram: most common graph; also used to check normality
• Frequency curve
• Scatter diagram or dot plot: shows the relationship between two quantitative variables
• Line chart: shows the trend over a time period
Tables
• Frequency distribution table
• Relative frequency
For relation: Scatter diagram

Presentation of Data (categorical)
1. Frequency distribution table
- Univariate frequency table (single variable), e.g. simple table or one way table

Smoking status    Frequency    Percentage
Smoking           10           25
Not smoking       30           75
Total             40           100
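A one way frequency table like the smoking example can be built from raw responses with a few lines of Python. This is a minimal sketch: the function name `frequency_table` and the list `responses` are illustrative assumptions, and the raw data are made up to match the counts on the slide.

```python
from collections import Counter

def frequency_table(values):
    """Return rows of (category, frequency, percentage) plus a total row."""
    counts = Counter(values)          # keeps categories in first-seen order
    total = sum(counts.values())
    rows = [(cat, n, 100 * n / total) for cat, n in counts.items()]
    rows.append(("Total", total, 100.0))
    return rows

# Hypothetical raw responses matching the slide: 10 smokers, 30 non-smokers
responses = ["Smoking"] * 10 + ["Not smoking"] * 30
table = frequency_table(responses)

for category, freq, pct in table:
    print(f"{category:<12} {freq:>4} {pct:>6.1f}")
```

The percentage column is simply each frequency divided by the total and multiplied by 100.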

Presentation of Data (categorical)

Bivariate frequency table or two way table: using two variables
E.g. cross tabulation

               Cancer
Smoking     Yes       No        Total
Yes         A (20)    B (10)    30
No          C (20)    D (40)    60
Total       40        50        N = 90

Analysis
Association: by Chi square test, odds ratio etc.
Odds ratio = AD/BC = (20 × 40)/(10 × 20) = 800/200 = 4
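The odds ratio for the smoking and cancer table can be checked in code, together with a chi square statistic. This is a sketch under stated assumptions: the function names are illustrative, and the chi square value uses the standard shortcut formula for a 2x2 table without continuity correction, which the slide does not spell out.

```python
def odds_ratio(a, b, c, d):
    """Cross-product ratio AD/BC for a 2x2 table."""
    return (a * d) / (b * c)

def chi_square_2x2(a, b, c, d):
    """Pearson chi square for a 2x2 table (no continuity correction)."""
    n = a + b + c + d
    numerator = n * (a * d - b * c) ** 2
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return numerator / denominator

# Cell counts from the slide: A=20, B=10, C=20, D=40
a, b, c, d = 20, 10, 20, 40
or_value = odds_ratio(a, b, c, d)
chi2 = chi_square_2x2(a, b, c, d)

print("Odds ratio:", or_value)   # 4.0, as on the slide
print("Chi square:", chi2)       # 9.0 with 1 degree of freedom
```

With 1 degree of freedom the 5% critical value is about 3.84, so a statistic of 9.0 would indicate an association between smoking and cancer in this illustrative table.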
Use of 2x2 Contingency table
• Odds ratio
• Relative risk (risk ratio)
• Attributable risk
• Sensitivity
• Specificity
• Predictive values (positive and negative)
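The remaining measures in this list can be sketched in the same way. The layout is an assumption: for the risk measures, rows are exposed/unexposed and columns are disease yes/no (cells a, b, c, d); for the diagnostic measures, the cells are true positives, false positives, false negatives and true negatives. All function names and counts are hypothetical.

```python
def risk_measures(a, b, c, d):
    """Relative risk and attributable risk from an exposure-by-disease table."""
    risk_exposed = a / (a + b)        # incidence among the exposed
    risk_unexposed = c / (c + d)      # incidence among the unexposed
    return {
        "relative_risk": risk_exposed / risk_unexposed,
        "attributable_risk": risk_exposed - risk_unexposed,
    }

def diagnostic_measures(tp, fp, fn, tn):
    """Sensitivity, specificity and predictive values of a diagnostic test."""
    return {
        "sensitivity": tp / (tp + fn),   # diseased who test positive
        "specificity": tn / (tn + fp),   # non-diseased who test negative
        "ppv": tp / (tp + fp),           # test positives who are diseased
        "npv": tn / (tn + fn),           # test negatives who are non-diseased
    }

# Hypothetical counts, reusing the 20/10/20/40 pattern for illustration
risks = risk_measures(20, 10, 20, 40)
diag = diagnostic_measures(tp=20, fp=10, fn=20, tn=40)
print(risks)
print(diag)
```

Relative risk and attributable risk are meaningful for cohort-type data, while sensitivity, specificity and predictive values describe a test compared against a reference standard.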

Descriptive Statistics

Measure of Central Tendency
• Mean
• Median
• Mode
Partition values
• Quartiles
• Deciles
• Percentiles

Measure of Variation (Dispersion)
• Absolute measure
– Range = L - S (largest minus smallest value)
– Inter quartile range = (Q3 - Q1)
– Quartile deviation = (Q3 - Q1)/2
– Mean deviation
– Standard deviation
– Variance (square of SD)
• Relative measure
– Coefficient of range = (L - S)/(L + S)
– Coefficient of quartile deviation = (Q3 - Q1)/(Q3 + Q1)
– Coefficient of variation = SD/Mean
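The dispersion formulas above translate directly into code. This is a sketch with two assumptions the slides do not state: quartiles follow the default "exclusive" method of Python's `statistics.quantiles` (other textbooks and packages use slightly different conventions), and the standard deviation is the population form (`pstdev`). The function name `dispersion_summary` is illustrative.

```python
from statistics import mean, pstdev, quantiles

def dispersion_summary(data):
    """Absolute and relative measures of dispersion for a list of numbers."""
    largest, smallest = max(data), min(data)
    q1, _, q3 = quantiles(data, n=4)      # quartiles, "exclusive" method
    m = mean(data)
    sd = pstdev(data)                     # population SD (an assumption)
    return {
        "range": largest - smallest,
        "iqr": q3 - q1,
        "quartile_deviation": (q3 - q1) / 2,
        "mean_deviation": mean([abs(x - m) for x in data]),
        "sd": sd,
        "variance": sd ** 2,
        "coef_range": (largest - smallest) / (largest + smallest),
        "coef_qd": (q3 - q1) / (q3 + q1),
        "cv": sd / m,
    }

summary = dispersion_summary([2, 4, 6, 8, 10])
for name, value in summary.items():
    print(f"{name:<20} {value:.3f}")
```

Relative measures such as the coefficient of variation are unit-free, which is why they are preferred when comparing variables measured in different units.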

Selection of average
• Mean:
– Interval and ratio scale
– Normally distributed data
– Most popular method
– Highly affected by outliers or extreme observations, e.g. (2, 4, 6, 8, 100) or (2, 20, 21, 23, 25)
• Median:
– Ordinal scale
– Not affected by the skewed nature of data
• Mode:
– Nominal scale
– Empirical formula: Mode = 3 Median - 2 Mean
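The effect of an extreme observation on the mean, and its lack of effect on the median, can be seen with the two example data sets from the slide. The helper `empirical_mode` is an illustrative name for the empirical formula; the formula is only a rough approximation for moderately skewed distributions, and the values passed to it below are hypothetical.

```python
from statistics import mean, median

# Example data sets from the slide, each containing one extreme value
high_outlier = [2, 4, 6, 8, 100]
low_outlier = [2, 20, 21, 23, 25]

for sample in (high_outlier, low_outlier):
    print(sample, "mean =", mean(sample), "median =", median(sample))

def empirical_mode(mean_value, median_value):
    """Empirical relationship: Mode = 3 Median - 2 Mean."""
    return 3 * median_value - 2 * mean_value

# Hypothetical summary values for a moderately skewed distribution
approx_mode = empirical_mode(mean_value=50, median_value=48)
print("Approximate mode:", approx_mode)
```

In the first sample the single value 100 pulls the mean up to 24 while the median stays at 6, which is why the median is preferred for skewed data.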

Selection of dispersion
• Standard deviation:
– Ratio scale and normal data, e.g. (2, 4, 6, 8, 10)
– Highly affected by outliers, e.g. (1, 10, 11, 12, 13)
– Most popular method
• Quartile deviation:
– Ordinal scale and skewed data; not affected by outliers
• Range:
– Highly affected by outliers
– 2, 4, 6, 8, 10: range = 8
– 2, 4, 6, 8, 100: range = 98
Skewness

• It measures the lack of symmetry in a data distribution.
• A symmetric distribution looks the same to the left and right of the central line.
Interpretation
• Sk = 0, symmetric (mean = median = mode)
• Sk > 0, positively skewed (mean > median > mode)
• Sk < 0, negatively skewed (mean < median < mode)

Example data: 2 4 6 8 20 (positively skewed); 2 4 6 8 10 (symmetric); 2 10 12 14 16 (negatively skewed)
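The sign of the skewness coefficient for the three small data sets can be verified numerically. This sketch assumes the moment coefficient of skewness (third central moment divided by the 1.5 power of the second central moment); the slides do not say which skewness formula is intended, and the function name is illustrative.

```python
def skewness(data):
    """Moment coefficient of skewness: m3 / m2**1.5 (population moments)."""
    n = len(data)
    m = sum(data) / n
    m2 = sum((x - m) ** 2 for x in data) / n   # second central moment
    m3 = sum((x - m) ** 3 for x in data) / n   # third central moment
    return m3 / m2 ** 1.5

samples = {
    "2 4 6 8 20": [2, 4, 6, 8, 20],
    "2 4 6 8 10": [2, 4, 6, 8, 10],
    "2 10 12 14 16": [2, 10, 12, 14, 16],
}

for label, values in samples.items():
    print(f"{label:<15} Sk = {skewness(values):+.3f}")
```

A long tail to the right gives a positive coefficient, a long tail to the left gives a negative one, and the evenly spaced sample gives zero.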
x f
10 3
15 8
20 16
25 10
30 3
Kurtosis
• Measures the flatness or peakedness of a distribution with reference to the normal curve
• It describes how heavy the tails are, i.e. how many extreme values (outliers) the distribution has compared with a normal distribution.
Leptokurtic: kurtosis > 3 (positive excess kurtosis)
Mesokurtic: kurtosis = 3 (normal)
Platykurtic: kurtosis < 3 (negative excess kurtosis)
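The three categories can be illustrated with a small calculation. This sketch assumes the moment coefficient of kurtosis (fourth central moment divided by the square of the second central moment), for which the normal distribution gives 3. The function names and both data sets are hypothetical.

```python
def kurtosis(data):
    """Moment coefficient of kurtosis: m4 / m2**2 (normal distribution = 3)."""
    n = len(data)
    m = sum(data) / n
    m2 = sum((x - m) ** 2 for x in data) / n   # second central moment
    m4 = sum((x - m) ** 4 for x in data) / n   # fourth central moment
    return m4 / m2 ** 2

def classify_kurtosis(k):
    """Label a kurtosis value relative to the normal curve."""
    if k > 3:
        return "leptokurtic"
    if k < 3:
        return "platykurtic"
    return "mesokurtic"

flat = [2, 4, 6, 8, 10]              # evenly spread values, light tails
heavy = [0] * 8 + [-10, 10]          # most values central, two extreme values

for name, sample in (("flat", flat), ("heavy", heavy)):
    k = kurtosis(sample)
    print(name, round(k, 2), classify_kurtosis(k))
```

With real data the value is rarely exactly 3, so in practice kurtosis is judged as close to, above or below 3 rather than tested for equality.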

MCQ
1. The symmetric nature of the distribution is measured by:
a. Central tendency b. Skewness c. Dispersion d. Kurtosis
2. A distribution is said to be negatively skewed if:
a. Mean < median < mode b. Mean = median = mode c. Mean > median > mode d. None of the above
3. The flatness or peakedness of the distribution is measured by:
a. Mean b. Skewness c. Standard deviation d. Kurtosis
Univariate Statistics
• Include proportions, percentages, ratios, frequency distributions and graphical presentations
• Mean, median, mode, range, standard deviation
Bivariate and Multivariate Statistics
• Used to describe the associations between
variables. Pearson’s correlation coefficients,
Relative Risk, Odds Ratio and others

Normal Distribution

P(Mean ± 1SD) = 68% (approx.) of the total observations
P(Mean ± 2SD) = 95% (approx.) of the total observations
P(Mean ± 3SD) = 99.7% (approx.) of the total observations
Standard normal curve: mean = 0 and standard deviation = 1
P(-1 ≤ Z ≤ 1) = 68% (approx.) of the total observations
P(-2 ≤ Z ≤ 2) = 95% (approx.) of the total observations
P(-3 ≤ Z ≤ 3) = 99.7% (approx.) of the total observations
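These areas under the standard normal curve can be reproduced with Python's standard library. The sketch below uses `statistics.NormalDist` (available from Python 3.8); the helper name `central_probability` is illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist()   # mean 0, standard deviation 1

def central_probability(k):
    """P(-k <= Z <= k) for the standard normal distribution."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"P(-{k} <= Z <= {k}) = {central_probability(k):.4f}")
```

The same proportions hold for any normal variable once it is standardized as Z = (X - Mean)/SD.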
Skewed Distribution (not normal)
[Figure: two curves, one negatively skewed and one positively skewed]
Test of normality
• Before selecting a parametric test such as the t test, ANOVA, correlation or regression, we must check whether the data are normally distributed.
Methods:
• Histogram
• Box plot
• Kolmogorov-Smirnov test and Shapiro-Wilk test
Test of normality

Kolmogorov-Smirnov test or Shapiro-Wilk test
H0: distribution is normal
H1: distribution is not normal
Interpretation
P value > 0.05: the data are consistent with a normal distribution (fail to reject H0)
P value ≤ 0.05: the data are not normally distributed (reject H0)
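The decision rule can be written as a small function, alongside a sketch of the Kolmogorov-Smirnov distance itself. Several things here are assumptions: the function names and the sample are hypothetical, the normal curve is fitted using the sample mean and SD (in which case the usual Kolmogorov-Smirnov p value needs a correction), and the p value is expected to come from statistical software rather than from this code.

```python
from statistics import NormalDist, mean, stdev

def ks_statistic(data):
    """Largest gap between the empirical CDF and a fitted normal CDF."""
    xs = sorted(data)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))   # normal curve fitted to the sample
    d = 0.0
    for i, x in enumerate(xs, start=1):
        cdf = fitted.cdf(x)
        # compare the fitted CDF with the empirical step just after and just before x
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return d

def normality_decision(p_value, alpha=0.05):
    """Apply the rule: p > alpha keeps H0, p <= alpha rejects H0."""
    if p_value > alpha:
        return "fail to reject H0: no evidence against normality"
    return "reject H0: distribution is not normal"

# Hypothetical sample of systolic blood pressure readings
sample = [112, 118, 120, 121, 124, 126, 128, 130, 133, 140]
print("KS distance D =", round(ks_statistic(sample), 3))
print(normality_decision(0.20))
print(normality_decision(0.03))
```

A p value above 0.05 does not prove normality; it only means the test found no evidence against it, so a histogram or box plot is still worth checking.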
