Bio-Stat Class 2 and 3

Biostatistics
Prem Prasad Panta

Assoc. Prof. in Biostatistics
Karnali Academy of Health Sciences
Email: pantaprem7@gmail.com
Probability
• Proportion of times an outcome happens out of the total number of outcomes
• Relative frequency of an outcome in the total
• Chance that an outcome happens in an experiment
• Denoted by p
• So, probability p = number of favorable outcomes / total number of outcomes = m/n
Example
• In tossing a coin, the outcomes are {head, tail}, so there are 2 outcomes
Probability of head = P(H) = 1/2
Probability of not head = P(T) = q = 1 − 1/2 = 1/2
• In tossing a die, the outcomes are {1, 2, 3, 4, 5, 6}, so there are 6 outcomes
– Probability of 5 = P(5) = 1/6
– Probability of 1 = P(1) = 1/6
• Probability that the event does not happen = q
• So p + q = 1
• Probability of a sure event = 1
• Probability of an impossible event = 0
• Therefore, probability lies between 0 and 1 (0 ≤ p ≤ 1)
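The classical definition p = m/n and the rule q = 1 − p can be checked with exact fractions. A minimal Python sketch, assuming equally likely outcomes; the helper name `probability` is hypothetical and used only for illustration:

```python
from fractions import Fraction

def probability(m, n):
    """Classical probability p = m/n for n equally likely outcomes (hypothetical helper)."""
    if n <= 0 or not 0 <= m <= n:
        raise ValueError("need n > 0 and 0 <= m <= n")
    return Fraction(m, n)

coin = ["head", "tail"]
die = [1, 2, 3, 4, 5, 6]

p_head = probability(coin.count("head"), len(coin))    # P(H) = 1/2
q_head = 1 - p_head                                    # q = 1 - p = 1/2
p_five = probability(die.count(5), len(die))           # P(5) = 1/6
p_sure = probability(len(die), len(die))               # some face shows: sure event = 1
p_impossible = probability(die.count(7), len(die))     # a 7 on one die: impossible event = 0

print(p_head, q_head, p_five, p_sure, p_impossible)    # 1/2 1/2 1/6 1 0
```

Using `Fraction` keeps results such as 1/6 exact instead of rounding them to decimals.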
Terms used in probability
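• Independent events: the outcomes of the first and second coin tosses (one does not affect the other)
• Dependent events: drawing a first card and then a second card from a pack of cards without replacement
• Mutually exclusive events: getting a head and getting a tail in a single coin toss (both cannot occur together)
• Equally likely events: occurrence of a head and a tail in a toss of a fair coin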
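For independent events the probabilities multiply, P(A and B) = P(A) × P(B); for dependent events the second factor must be the conditional probability P(B | A); for mutually exclusive events the probabilities add. A small Python sketch, assuming fair coins and a standard 52-card pack containing 4 aces (details not stated on the slide):

```python
from fractions import Fraction

# Independent events: a head on the first toss and a head on the second toss.
p_head = Fraction(1, 2)
p_two_heads = p_head * p_head                          # P(A) x P(B) = 1/4

# Mutually exclusive events: a head and a tail cannot occur in one toss,
# so P(H or T) = P(H) + P(T).
p_head_or_tail = Fraction(1, 2) + Fraction(1, 2)       # = 1

# Dependent events: two aces drawn one after another without replacement.
p_first_ace = Fraction(4, 52)
p_second_ace_given_first = Fraction(3, 51)             # one ace and one card already removed
p_two_aces = p_first_ace * p_second_ace_given_first    # P(A) x P(B | A) = 1/221

# If the first card were put back, the draws would be independent again.
p_two_aces_replaced = Fraction(4, 52) * Fraction(4, 52)  # = 1/169

print(p_two_heads, p_head_or_tail, p_two_aces, p_two_aces_replaced)
```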
Screening test
• Sensitivity: probability of a positive test result among people who have the disease (true positives / all diseased)
• Specificity: probability of a negative test result among people who are healthy (true negatives / all healthy)
• Positive predictive value: probability of having the disease among people with a positive screening test result
• Negative predictive value: probability of not having the disease among people with a negative screening test result
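All four measures are ratios taken from the usual 2x2 table of screening result against true disease status. A minimal Python sketch; the function name and the counts (90, 40, 10, 860) are hypothetical and chosen only to illustrate the arithmetic:

```python
def screening_measures(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV and NPV from a 2x2 screening table.

    tp: diseased, test positive     fp: healthy, test positive
    fn: diseased, test negative     tn: healthy, test negative
    """
    sensitivity = tp / (tp + fn)   # positives among all diseased
    specificity = tn / (tn + fp)   # negatives among all healthy
    ppv = tp / (tp + fp)           # diseased among all positive results
    npv = tn / (tn + fn)           # healthy among all negative results
    return sensitivity, specificity, ppv, npv

# Hypothetical counts for illustration only.
sens, spec, ppv, npv = screening_measures(tp=90, fp=40, fn=10, tn=860)
print(f"{sens:.2f} {spec:.3f} {ppv:.3f} {npv:.3f}")
```

With these counts the PPV (about 0.69) is well below the sensitivity (0.90) because most people screened are healthy: predictive values depend on how common the disease is, while sensitivity and specificity do not.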
MCQ
1. A bag contains 3 red and 5 green marbles. A
marble is drawn at random. The probability of
drawing a blue marble is
a. 5/8 b. 3/8 c. 0/8 d. 1/8
2. The sum of the probabilities of an event and its non-occurrence (non-event) is:
a. 2 b. 1 c. 0 d. 0.5
3. If three coins are tossed simultaneously, then the total number of outcomes will be:
a. 6 b. 3 c. 8 d. 1
• Each coin has 2 outcomes {H, T}
• Total outcomes = 2 × 2 × 2 = 8
• {HHH, HHT, HTH, THH, ………, TTT} = 8 outcomes
• P(all heads) = P(HHH) = 1/8
• P(TTT) = 1/8
• P(two heads and one tail) = 3/8
• When 2 dice are tossed at once, how many outcomes are there?
• 6 × 6 = 36
• Each outcome is an ordered pair (first die, second die): (1,1), (1,2), (1,3), …
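Listing the sample space by computer confirms the counts above. A Python sketch that uses `itertools.product` to enumerate every ordered outcome:

```python
from fractions import Fraction
from itertools import product

# Three coins: every ordered combination of H and T (2 x 2 x 2 outcomes).
coins = list(product("HT", repeat=3))
n_coins = len(coins)

p_hhh = Fraction(sum(1 for o in coins if o == ("H", "H", "H")), n_coins)
p_ttt = Fraction(sum(1 for o in coins if o == ("T", "T", "T")), n_coins)
p_two_heads_one_tail = Fraction(sum(1 for o in coins if o.count("H") == 2), n_coins)

# Two dice: ordered pairs (first die, second die), 6 x 6 outcomes.
dice = list(product(range(1, 7), repeat=2))
n_dice = len(dice)

print(n_coins, p_hhh, p_ttt, p_two_heads_one_tail, n_dice)  # 8 1/8 1/8 3/8 36
print(dice[:3])  # [(1, 1), (1, 2), (1, 3)]
```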
Frequency distribution
X (value)   f (frequency)
20          3
30          5
40          10
50          5
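Dividing each frequency by the total N turns a frequency distribution into relative frequencies, which estimate the probability of each value (probability as relative frequency). A Python sketch using the numbers from the table:

```python
from fractions import Fraction

freq = {20: 3, 30: 5, 40: 10, 50: 5}    # X -> f, from the frequency table
total = sum(freq.values())               # N = 23

# Relative frequency f/N for each value of X; these add up to 1.
rel_freq = {x: Fraction(f, total) for x, f in freq.items()}

for x, p in rel_freq.items():
    print(x, p, round(float(p), 3))
```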
Probability distribution
• Discrete probability distribution: for discrete random variables, e.g. number of deaths, number of births, household size
– Binomial probability distribution
– Poisson probability distribution
• Continuous probability distribution
– Normal probability distribution
– E.g. height, weight, marks, income, expenditure etc.
Binomial probability distribution
• Known as Bernoulli trials or a Bernoulli process
• Deals with experiments having:
– two mutually exclusive outcomes (binary events)
– independent trials
– a constant probability of success (p); the non-occurring outcome is called failure (q) [p + q = 1]
– a finite number of trials (n < 20)
• E.g. coin tossing (head/tail), birth (male/female), test (positive/negative), result (pass/fail) etc.
Binomial probability
• The outcome of interest (happening) is called success and its probability is denoted by p; the probability of the outcome not happening (failure) is q
• So p + q = 1
• Parameters of the binomial distribution: n and p
• Mean = np
• Variance = npq
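The probability of exactly r successes in n trials follows the standard binomial formula P(X = r) = nCr p^r q^(n−r), which is not written on the slide. A Python sketch using three fair coins (success = head), so the result can be compared with the earlier coin example where P(two heads and one tail) = 3/8:

```python
from math import comb

def binomial_pmf(r, n, p):
    """P(X = r) = nCr * p**r * q**(n - r), with q = 1 - p."""
    q = 1 - p
    return comb(n, r) * p**r * q**(n - r)

n, p = 3, 0.5                 # three fair coins, success = head
q = 1 - p
probs = [binomial_pmf(r, n, p) for r in range(n + 1)]

mean = n * p                  # np
variance = n * p * q          # npq
# The same mean, computed directly as the sum of r * P(X = r):
mean_from_pmf = sum(r * pr for r, pr in enumerate(probs))

print(probs)                          # [0.125, 0.375, 0.375, 0.125]
print(mean, variance, mean_from_pmf)  # 1.5 0.75 1.5
```

`math.comb` requires Python 3.8 or later.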
Poisson probability distribution
• Discrete probability distribution

• It deals with events occurring within a certain time interval
• It is also used for rare diseases
• Limiting case of the binomial distribution
• When the probability of success is very small and the number of trials is large, the Poisson probability distribution is used
• The parameter of the Poisson distribution is the mean (λ)
• Note: mean = variance = λ
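The Poisson probability of r events is P(X = r) = e^(−λ) λ^r / r!. The Python sketch below sets λ = np and shows that, with a small p and a large n, the Poisson probabilities are almost the same as the binomial ones, and that the mean and variance both come out equal to λ. The values n = 1000 and p = 0.002 are hypothetical:

```python
from math import comb, exp, factorial

def poisson_pmf(r, lam):
    """P(X = r) = e**(-lam) * lam**r / r!"""
    return exp(-lam) * lam**r / factorial(r)

def binomial_pmf(r, n, p):
    """P(X = r) = nCr * p**r * (1 - p)**(n - r)"""
    return comb(n, r) * p**r * (1 - p)**(n - r)

n, p = 1000, 0.002     # many trials, rare event (hypothetical values)
lam = n * p            # lambda = np = 2

for r in range(5):
    print(r, round(binomial_pmf(r, n, p), 4), round(poisson_pmf(r, lam), 4))

# Mean and variance of the Poisson distribution, summed over r = 0..59
# (the probabilities beyond that are negligible for lambda = 2).
mean = sum(r * poisson_pmf(r, lam) for r in range(60))
variance = sum((r - mean) ** 2 * poisson_pmf(r, lam) for r in range(60))
print(round(mean, 6), round(variance, 6))   # 2.0 2.0
```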
Variance and standard deviation
• Variance = Σ(X − X̄)²/n
• SD = √[Σ(X − X̄)²/n]
• Example: X = 2, 4, 6, 8, 10
• Mean X̄ = 30/5 = 6
• Deviations from the mean (X − 6): −4, −2, 0, 2, 4
• Squared deviations: 16, 4, 0, 4, 16; sum = 40
• n = 5
• Variance = 40/5 = 8
• SD = √8 = √(4 × 2) = 2√2 ≈ 2.83
• Square of SD = variance
• Square root of variance = SD
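The worked example can be repeated step by step in Python, then cross-checked against the standard library's population variance and SD:

```python
import math
import statistics

x = [2, 4, 6, 8, 10]
n = len(x)
mean = sum(x) / n                        # 30 / 5 = 6.0
deviations = [xi - mean for xi in x]     # -4, -2, 0, 2, 4
squares = [d ** 2 for d in deviations]   # 16, 4, 0, 4, 16 (sum = 40)
variance = sum(squares) / n              # 40 / 5 = 8.0 (divisor n, as on the slide)
sd = math.sqrt(variance)                 # sqrt(8) = 2 * sqrt(2), about 2.83

print(mean, variance, round(sd, 2))      # 6.0 8.0 2.83
# The standard library's population versions also divide by n:
print(statistics.pvariance(x), round(statistics.pstdev(x), 2))
```

Note that `statistics.variance` and `statistics.stdev` divide by n − 1 instead (sample variance), which gives 10 rather than 8 for the same data.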
Normal probability distribution
• Continuous probability distribution
• Known as Gaussian distribution
• E.g. height, weight, marks, systolic blood pressure (SBP) etc.
• Bell-shaped and symmetrical curve
• Mean = median = mode
• Total area under the curve = 1
• Parameters: mean and standard deviation
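Python's `statistics.NormalDist` (Python 3.8+) can illustrate the properties listed for the normal curve: symmetry, mean = median = mode, and a total area of 1. The mean of 120 mmHg and SD of 10 mmHg are hypothetical values for systolic blood pressure:

```python
from statistics import NormalDist

sbp = NormalDist(mu=120, sigma=10)   # hypothetical SBP distribution

below_mean = sbp.cdf(120)            # area to the left of the mean = 0.5 (symmetry)
# Area within mean +/- 1.96 SD, about 0.95 (the basis of the 95% CI).
within_196 = sbp.cdf(120 + 1.96 * 10) - sbp.cdf(120 - 1.96 * 10)
# Area between mean - 10 SD and mean + 10 SD is practically the whole area of 1.
almost_all = sbp.cdf(220) - sbp.cdf(20)

print(sbp.mean, sbp.median, sbp.mode)
print(below_mean, round(within_196, 3), round(almost_all, 6))
```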
MCQ
• A variable that can assume any value between two given points is called
___________
a) Continuous random variable
b) Discrete random variable
c) Irregular random variable
d) Uncertain random variable
• Which of the following probability distributions is a continuous probability distribution?
a) Gaussian probability Distribution
b) Poisson probability Distribution
c) Binomial probability Distribution
d) none of them
• A variable which can assume finite or countably infinite number of values is known
as:
a. Continuous b. Discrete c. Qualitative d. None of them
• The total area under the curve of a continuous probability density function is always equal to:
a. Zero b. One c. -1 d. None of them
MCQ
• Which of the following is not a characteristic of the
normal distribution?
a. the mean is always zero
b. the area under the curve equals one
c. the mean, median and mode are equal
d. it is a symmetrical distribution
• The parameters of the normal distribution are:
a. Mean and median
b. Mean and standard deviation
c. Mean and mode
d. Mean, median and mode
Estimation
• Point estimation: a single value
Example: estimate of the mean, estimate of prevalence
• Interval estimation: estimation of a population parameter within a certain range
• Confidence interval: an interval estimate with a stated level of confidence, i.e. a certain level of probability
• 90% CI (Z = 1.64), 95% CI (Z = 1.96) and 99% CI (Z = 2.58)
Confidence Interval (CI)
95% CI = Mean ± 1.96 SE
where SE of the sample mean = SD/√n
Standard error: variability of sample means, calculated as SD/√n
Standard deviation: variability of individual observations
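The formula Mean ± Z × SD/√n can be sketched in a few lines of Python. The sample values (mean 120 mmHg, SD 20 mmHg, n = 100) and the helper name `confidence_interval` are hypothetical:

```python
import math

def confidence_interval(mean, sd, n, z=1.96):
    """Mean +/- z * SE, where SE = SD / sqrt(n); z = 1.96 gives a 95% CI."""
    se = sd / math.sqrt(n)
    return mean - z * se, mean + z * se

# Hypothetical sample: mean SBP 120 mmHg, SD 20 mmHg, n = 100, so SE = 2.
low95, high95 = confidence_interval(120, 20, 100)           # 120 +/- 3.92
low99, high99 = confidence_interval(120, 20, 100, z=2.58)   # 120 +/- 5.16, wider

print(round(low95, 2), round(high95, 2))   # 116.08 123.92
print(round(low99, 2), round(high99, 2))   # 114.84 125.16
```

Raising the confidence level from 95% to 99% widens the interval; a larger n shrinks the SE and narrows it.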
Z value for α and β
Confidence level   Two-tailed test (Zα/2)   One-tailed test (Zα)
90%                1.64                     1.28
95%                1.96                     1.64
99%                2.58                     2.33
Zβ = 0.84 at 80% power of test
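The tabulated Z values are quantiles of the standard normal distribution, so they can be regenerated with `statistics.NormalDist.inv_cdf` (Python 3.8+). A short sketch:

```python
from statistics import NormalDist

std_normal = NormalDist()   # mean 0, SD 1

for confidence in (0.90, 0.95, 0.99):
    alpha = 1 - confidence
    z_two = std_normal.inv_cdf(1 - alpha / 2)   # two-tailed: Z(alpha/2)
    z_one = std_normal.inv_cdf(1 - alpha)       # one-tailed: Z(alpha)
    print(f"{confidence:.0%}  two-tailed {z_two:.2f}  one-tailed {z_one:.2f}")

z_beta = std_normal.inv_cdf(0.80)               # 80% power
print(f"Z beta at 80% power = {z_beta:.2f}")
```

The unrounded values (for example 1.6449 and 2.5758) round to the 1.64 and 2.58 used in the table.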
Which one is more reliable?
• The mean systolic blood pressure (120) lies between 110 and 130 with 95% certainty
• The mean systolic blood pressure (120) surely lies between 100 and 140
Note: The narrower the interval, the better the precision.
Hypothesis testing
• Null hypothesis (H0):
– Two means are equal (not significantly different)
– Two proportions are equal
– There is no correlation between two variables
– There is no association between two variables
• Alternative hypothesis (H1):
– Two means are not equal (significantly different)
– Two proportions are significantly different
– There is significant correlation between two variables
– There is significant association between two variables
Types of alternative hypothesis
• Two-tailed test (non-directional): not equal
• One-tailed test (directional)
– Right-tailed (first mean > second mean)
– Left-tailed (first mean < second mean)
Two tailed test
• Acceptance region: −1.96 < Z < 1.96, area 0.475 + 0.475 = 0.95; accept H0 if the sample mean falls in this region
• Rejection regions: Z < −1.96 and Z > 1.96, area 0.025 in each tail; reject H0 if the sample mean falls in either of these two regions
One tailed test
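• Left-tailed test at 5% level of significance: critical value Z = −1.645
• Rejection region: Z < −1.645, area 0.05 in the left tail; reject H0 if the sample mean falls in this region
• Acceptance region: Z > −1.645, area 0.45 + 0.5 = 0.95; accept H0 if the sample mean falls in this region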
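The acceptance and rejection regions for two-tailed and one-tailed tests translate into a simple decision rule. A Python sketch; the function name `z_test_decision` is hypothetical, and the critical values are taken from `statistics.NormalDist` rather than a printed table:

```python
from statistics import NormalDist

def z_test_decision(z_calc, alpha=0.05, tail="two"):
    """Return 'reject H0' or 'accept H0' for a calculated Z value.

    tail: 'two' (H1: not equal), 'left' (H1: less than), 'right' (H1: greater than).
    """
    std = NormalDist()
    if tail == "two":
        critical = std.inv_cdf(1 - alpha / 2)   # 1.96 at alpha = 0.05
        reject = abs(z_calc) > critical
    elif tail == "left":
        critical = std.inv_cdf(alpha)           # -1.645 at alpha = 0.05
        reject = z_calc < critical
    elif tail == "right":
        critical = std.inv_cdf(1 - alpha)       # +1.645 at alpha = 0.05
        reject = z_calc > critical
    else:
        raise ValueError("tail must be 'two', 'left' or 'right'")
    return "reject H0" if reject else "accept H0"

print(z_test_decision(2.1))                 # reject H0
print(z_test_decision(-1.8, tail="two"))    # accept H0
print(z_test_decision(-1.8, tail="left"))   # reject H0
```

The same Z = −1.8 is not significant in a two-tailed test but is significant in a left-tailed test, because the whole 5% rejection area sits in one tail.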
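Errors in hypothesis testing
• Type I error (α): rejecting H0 when it is true
• Type II error (β): accepting H0 when it is false
• Power of test = 1 − β
False positive and false negative
• Type I error: false positive
• Type II error: false negative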
P value
• In technical terms, a P value is the probability of obtaining an effect at least as extreme as the one in our sample data, assuming that the null hypothesis is true.
• High P values: our data are likely if the null hypothesis is true
• Low P values: our data are unlikely if the null hypothesis is true
Interpretation of p value
• The P value is compared with the 5% and 1% levels of significance
• P < 0.05 (α = 5% level of significance): significant (the means of the two groups are significantly different, or the two variables are significantly associated)
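For a Z statistic, the two-tailed P value is the area beyond ±|Z| under the standard normal curve, P = 2 × (1 − Φ(|Z|)). A Python sketch with hypothetical Z values; the helper name `two_tailed_p_value` is illustrative:

```python
from statistics import NormalDist

def two_tailed_p_value(z_calc):
    """P(|Z| >= |z_calc|) under H0, using the standard normal distribution."""
    return 2 * (1 - NormalDist().cdf(abs(z_calc)))

for z_calc in (1.5, 2.1, 2.9):   # hypothetical calculated Z values
    p = two_tailed_p_value(z_calc)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"Z = {z_calc}: p = {p:.4f} -> {verdict} at 5% level")
```

Z = 1.96 gives P = 0.05, which is why 1.96 is the critical value at the 5% level.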
Quantitative variable
• Test of normality (Kolmogorov-Smirnov test), H0: the distribution of the sample is normal
• Fail to reject H0 (accepted) → normal distribution → parametric test
– Test for mean
– Test for relationship (Pearson's correlation)
• H0 rejected → non-normal distribution → non-parametric test
– Test for mean
– Test for relationship (Spearman rank correlation)
Selection of test (comparison of two averages)
• Independent samples (different groups or samples)
– Parametric test (follows normality): t test (independent samples t test)
– Non-parametric test (does not follow normality): Mann-Whitney U test
• Dependent samples (same group of samples)
– Parametric test (follows normality): paired t test (dependent t test)
– Non-parametric test (does not follow normality): Wilcoxon matched-pairs signed-rank test
ANOVA test
More than two groups or samples (comparison of more than two averages)
• Independent samples (different groups or samples)
– Parametric test (follows normality): one-way / two-way ANOVA
– Non-parametric test (does not follow normality): Kruskal-Wallis one-way ANOVA by ranks
• Dependent samples (same group of samples)
– Parametric test (follows normality): repeated measures ANOVA
– Non-parametric test (does not follow normality): Friedman test (ANOVA by ranks)
Measures of Association
• Independent samples (two different groups)
– Variables: Nominal × Nominal, Ordinal × Nominal, or Nominal × Ordinal
– Statistics: Chi-square test; Fisher exact test*; Yates correction*
• Dependent samples (same group or sample)
– Variables: Nominal × Nominal
– Statistics: McNemar test*
* Applied only for a 2x2 table
Measures of Relationship
• Non-parametric: Spearman rank correlation
– Variables: Ordinal × Ordinal, Ordinal × Ratio, or Ratio × Ordinal
– Range: −1 to +1
• Parametric: Pearson coefficient of correlation
– Variables: Interval or ratio × Interval or ratio
– Range: −1 to +1
Z test
Assumptions:
• Randomness
• Known variance
• Sample size > 30
• Normality
Interpretation
• If the calculated |Z| > tabulated Z value (Z = 1.96 at 5% level of significance, two-tailed), reject the null hypothesis; otherwise accept the null hypothesis
Types of Z test
• Comparison between
– Sample mean and population mean
– Two sample means
– Sample proportion and population proportion
– Two sample proportions
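Two of the listed comparisons can be sketched in Python with the standard formulas, which are not written on the slide: Z = (x̄ − μ)/(σ/√n) for a sample mean against a population mean, and Z = (p1 − p2)/√[PQ(1/n1 + 1/n2)] with pooled proportion P for two sample proportions. All numbers and function names are hypothetical:

```python
import math

def z_one_mean(sample_mean, pop_mean, pop_sd, n):
    """Z = (sample mean - population mean) / (sigma / sqrt(n)); sigma known."""
    return (sample_mean - pop_mean) / (pop_sd / math.sqrt(n))

def z_two_proportions(x1, n1, x2, n2):
    """Z for two sample proportions using the pooled proportion P = (x1 + x2) / (n1 + n2)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Hypothetical data for illustration only.
z1 = z_one_mean(sample_mean=123, pop_mean=120, pop_sd=15, n=64)   # 3 / 1.875 = 1.6
z2 = z_two_proportions(x1=30, n1=100, x2=15, n2=100)              # about 2.54
for z in (z1, z2):
    print(round(z, 2), "reject H0" if abs(z) > 1.96 else "accept H0")
```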
T test and types
Assumptions:
• Randomness
• Normality
• Sample size less than 30
• Unknown variance
Degrees of freedom
= n − 1 (for one-sample mean test and paired data)
= (n1 − 1) + (n2 − 1) = n1 + n2 − 2 (for two-sample mean test)
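The t statistics themselves are not shown on the slide. The Python sketch below uses the common pooled-variance formula for two independent samples and the mean difference over its standard error for paired data, so the degrees of freedom above can be seen in use. The data and function names are hypothetical; the calculated t would then be compared with the tabulated t value for that DF, which the standard library does not provide:

```python
import math
import statistics

def independent_t(sample1, sample2):
    """Two-sample t with pooled variance (equal variances assumed); df = n1 + n2 - 2."""
    n1, n2 = len(sample1), len(sample2)
    v1, v2 = statistics.variance(sample1), statistics.variance(sample2)  # divisor n - 1
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    t = (statistics.mean(sample1) - statistics.mean(sample2)) / se
    return t, n1 + n2 - 2

def paired_t(before, after):
    """Paired t on the differences (after - before); df = n - 1."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    t = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))
    return t, n - 1

# Hypothetical data for illustration only.
group_a = [12, 14, 16, 18, 20]
group_b = [10, 11, 13, 14, 17]
t_ind, df_ind = independent_t(group_a, group_b)

sbp_before = [140, 150, 135, 160, 145]
sbp_after = [132, 144, 130, 150, 141]
t_pair, df_pair = paired_t(sbp_before, sbp_after)

print(round(t_ind, 3), df_ind)    # 1.604 8
print(round(t_pair, 3), df_pair)  # -6.128 4
```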
Chi square test
Assumptions / Use:
i. The row and column variables of the contingency table are categorical (qualitative) data
ii. None of the cells has an expected frequency of zero
iii. Expected cell frequency should be at least five
iv. Adequate sample size (n ≥ 50)
Types
i) Test of association between two categorical variables
ii) Test of goodness of fit
Contingency table (2x2 table)
                         Variable 2 (column)
                         Yes       No        Total
Variable 1 (row)   Yes   a         b         a+b
                   No    c         d         c+d
Total                    a+c       b+d       N = a+b+c+d
Degrees of freedom = (rows − 1) × (columns − 1) = (2 − 1)(2 − 1) = 1
For 2x2 table, DF = 1
For 2x3 table, DF = 2
For 3x2 table, DF = 2
For 3x3 table, DF = 4
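The chi-square statistic compares each observed cell count O with its expected count E = (row total × column total)/N, summing (O − E)²/E over all cells. A Python sketch for an r x c table, with the 2x2 shortcut formula as a cross-check. The counts a = 30, b = 70, c = 15, d = 85 and the function name are hypothetical; 3.84 is the tabulated chi-square value for DF = 1 at the 5% level:

```python
def chi_square(table):
    """Pearson chi-square statistic and DF for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n   # E = row total x column total / N
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)             # (rows - 1) x (columns - 1)
    return chi2, df

# Hypothetical 2x2 table in the layout [[a, b], [c, d]].
a, b, c, d = 30, 70, 15, 85
chi2, df = chi_square([[a, b], [c, d]])

# Shortcut for a 2x2 table (without Yates correction): N(ad - bc)^2 / [(a+b)(c+d)(a+c)(b+d)]
n = a + b + c + d
shortcut = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

print(round(chi2, 3), df, round(shortcut, 3))   # 6.452 1 6.452
print("significant at 5%" if chi2 > 3.84 else "not significant at 5%")
```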
MCQ
1. A statement made about a population for testing purposes is called:
a) Statistic b) Hypothesis c) Level of Significance d) Test-Statistic
2. If the null hypothesis is false, then which of the following is accepted?
a) Null Hypothesis b) Positive Hypothesis
c) Negative Hypothesis d) Alternative Hypothesis
3. The point where the Null Hypothesis gets rejected is called:
a) Significant Value b) Rejection Value
c) Acceptance Value d) Critical Value
4. The alternative hypothesis is also called:
a) Statistical hypothesis b) research hypothesis
c) Simple hypothesis d) null hypothesis
MCQ
5. The probability of rejecting the null hypothesis when it is true is called:
a. Power of test b. Level of confidence
c. Level of significance (type I error) d. Type II error
6. If a healthy client is admitted to the hospital, the error is said to be:
a. Type I error b. Type II error
c. Sampling error d. Unbiased error
7. Type II error is committed when:
a. We reject the null hypothesis when it is not true
b. We reject a null hypothesis when it is true
c. We accept a null hypothesis when it is not true
d. We accept a null hypothesis when it is true
8. Testing H0: µ = 25 against H1: µ ≠ 25 leads to:
a. Two-tailed test b. Right-tailed test
c. Left-tailed test d. Neither (a), (b) nor (c)
Correlation and Regression
• Relationship between two quantitative variables (X = independent variable, Y = dependent variable)
• The correlation coefficient (r) lies between −1 and +1
• Near +1 or −1: strong correlation
• Near 0: weak (poor) correlation
• r = 0: no correlation exists
• Types of correlation
• Positive correlation and negative correlation:
– Positive correlation: e.g. age and blood pressure, income and expenditure etc.
– Negative correlation: e.g. age and immunity, income and fertility etc.
Cont.
• Simple, multiple and partial correlation
– Simple: relationship between only two variables, e.g. age and height
– Multiple: relationship among more than two variables, e.g. age, height and weight
– Partial: relationship between two variables while controlling for another, e.g. age and weight controlling for height
• Linear and non-linear
– Linear: straight-line relationship
– Non-linear: e.g. exponential or hyperbolic relationship
Cont.
• Parametric method: continuous data
– Karl Pearson's coefficient of correlation
• Non-parametric method: ordinal data
– Spearman rank correlation
• Coefficient of determination (r²)
– Measures the proportion of the variation in the Y variable that is explained by the X variable
– Lies between 0 and 1
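Karl Pearson's r, the coefficient of determination r² and Spearman's rank correlation can be computed with plain Python. Spearman's coefficient is obtained here as Pearson's r applied to the ranks, with tied values given their average rank. The age and SBP data and the function names are hypothetical:

```python
import math

def pearson_r(x, y):
    """r = sum((x - mx)(y - my)) / sqrt(sum((x - mx)^2) * sum((y - my)^2))."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Ranks 1..n; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # average of 0-based positions i..j, as a 1-based rank
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def spearman_rho(x, y):
    """Spearman rank correlation = Pearson's r computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))

# Hypothetical data: age (years) and systolic blood pressure (mmHg).
age = [25, 35, 45, 55, 65]
sbp = [118, 121, 130, 128, 142]
r = pearson_r(age, sbp)
print(round(r, 3), round(r ** 2, 3), round(spearman_rho(age, sbp), 3))
```

Here r is close to +1 (strong positive correlation), and r² is the share of the variation in SBP explained by age.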
Regression
• Cause-and-effect relationship between two variables
• Cause / predictor / explanatory / independent variable / factors
• Effect / outcome / response / dependent variable
• The amount of change in the Y variable per unit change in the X variable is known as the regression coefficient (beta)
• Y = a + bX
– Y = dependent variable
– X = independent variable
– b = regression coefficient or slope of the line
– a = Y-intercept
Regression line (figure panels: positive regression; negative regression)
Regression
• If the correlation coefficient is negative, the regression coefficient is also negative
• If the correlation coefficient is positive, the regression coefficient is also positive
• Regression is used to predict the dependent variable from given values of the independent variable
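The constants a and b in Y = a + bX are usually estimated by least squares: b = Σ(X − X̄)(Y − Ȳ)/Σ(X − X̄)² and a = Ȳ − bX̄. A Python sketch with hypothetical age (X) and systolic blood pressure (Y) data; the helper name `fit_line` is illustrative only:

```python
def fit_line(x, y):
    """Least-squares estimates for Y = a + bX."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx          # regression coefficient (slope): change in Y per unit change in X
    a = my - b * mx        # Y-intercept: the line passes through (mean X, mean Y)
    return a, b

# Hypothetical data for illustration only.
age = [25, 35, 45, 55, 65]
sbp = [118, 121, 130, 128, 142]
a, b = fit_line(age, sbp)
predicted = a + b * 50     # predicted SBP at age 50
print(round(a, 2), round(b, 2), round(predicted, 2))   # 103.05 0.55 130.55
```

Because b has the same numerator Σ(X − X̄)(Y − Ȳ) as the correlation coefficient, the two always share the same sign.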
MCQ
1. A process by which we estimate the value of dependent variable
on the basis of one or more independent variables is called:
(a) Correlation (b) Regression (c) Residual (d) Slope
2. All data points falling along a straight line is called:
(a) Linear relationship (b) Non linear relationship
(c) Residual (d) Scatter diagram
4. The slope of the regression line of Y on X is also called the:
(a) Correlation coefficient of X on Y
(b) Correlation coefficient of Y on X
(c) Regression coefficient of X on Y
(d) Regression coefficient of Y on X
5. In simple linear regression, the number of unknown constants is:
(a) One (b) Two (c) Three (d) Four
MCQ
6. If the value of any regression coefficient is zero, then the two variables are:
(a) Qualitative (b) Correlation
(c) Dependent (d) Independent
7. In the straight-line graph of the linear equation Y = a + bX, the slope will be upward if:
(a) b = 0 (b) b < 0 (c) b > 0 (d) b ≠ 0
8. The independent variable is also called:
(a) predictor (b) response
(c) outcome (d) Estimated
9. A measure of the strength of the linear
relationship that exists between two variables is
called:
(a) Slope
(b) Intercept
(c) Correlation coefficient
(d) Regression equation
10. If one item is fixed and unchangeable and the
other item varies, the correlation coefficient will be:
(a) Positive (b) Negative (c) Zero (d) Undecided
Sampling technique
• Sample: statistic
• Population: parameter
• Population and census
• Sample and sampling
• Sampling error: difference between a sample value (statistic) and the population parameter
• Non-sampling error: errors not caused by sampling, e.g. human error
Types of sampling
• Probability sampling
– Simple random sampling
– Systematic sampling
– Stratified random sampling
– Cluster sampling
– Multistage sampling
• Non probability sampling
– Convenience sampling
– Purposive sampling
– Quota sampling
– Snowball sampling
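Three of the probability sampling methods can be sketched with Python's `random` module. The sampling frame of 100 numbered households, the urban/rural strata, the equal allocation of 5 units per stratum and the function names are all hypothetical; a fixed seed is used only so that a draw can be repeated:

```python
import random

def simple_random_sample(population, n, seed=None):
    """Every unit has an equal chance; draw n units without replacement."""
    return random.Random(seed).sample(population, n)

def systematic_sample(population, n, seed=None):
    """Random start within the first interval k = N // n, then every k-th unit."""
    k = len(population) // n
    start = random.Random(seed).randrange(k)
    return population[start::k][:n]

def stratified_sample(strata, n_per_stratum, seed=None):
    """Simple random sample of a fixed size from each stratum (equal allocation)."""
    rng = random.Random(seed)
    return {name: rng.sample(units, n_per_stratum) for name, units in strata.items()}

# Hypothetical sampling frame: 100 numbered households.
households = list(range(1, 101))
srs = simple_random_sample(households, 10, seed=1)
sys_sample = systematic_sample(households, 10, seed=1)
strat = stratified_sample({"urban": households[:60], "rural": households[60:]}, 5, seed=1)

print(srs)
print(sys_sample)
print(strat)
```

Proportional allocation (sample size in each stratum proportional to its size) is a common alternative to the equal allocation used here.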
Thank you
