Submitted To,
SM Shamiul Hoque Chowdhury
Assistant professor and Head
Dept. Biostatistics
BUHS
Submitted By,
Md. Shah Alam Khan
Session-Fall 2020
Roll- 022204202001
Dept: MPH in RCH
Assignment-01
a) Scales of measurement
The scale of measurement is the type of scale on which the dependent variable (DV) is
measured.
Certain statistics can be calculated on some scales but not on others.
There are four scales of measurement:
Nominal
Ordinal
Interval
Ratio
Nominal: Nominal means “pertaining to name”. It is another word for a
category, that is, a name only.
Example
• Gender: Male, Female, Other.
• Colour: Brown, Black, Red, Yellow.
Interval: A scale whose values are separated by equal, meaningful intervals. For example, a
thermometer might have intervals of ten degrees. That is:
0 does not imply absence
The interval between two numbers is meaningful
The ratio between two numbers is not meaningful
Example
• Celsius Temperature
• IQ (intelligence scale)
• Time on a clock with hands
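The point that ratios are not meaningful on an interval scale can be checked with a short sketch. The two temperature readings below are hypothetical values chosen for illustration; the conversion to Kelvin (a ratio scale with a true zero) is the only outside fact used.

```python
# Hypothetical readings chosen for illustration (not from the assignment).
warm_c, cool_c = 20.0, 10.0

# The difference (interval) is meaningful on the Celsius scale.
difference_c = warm_c - cool_c

# The naive Celsius ratio suggests "twice as hot", which is misleading
# because 0 degrees Celsius is an arbitrary zero, not an absence of temperature.
naive_ratio = warm_c / cool_c

# Kelvin has a true zero, so a ratio of Kelvin values is meaningful.
KELVIN_OFFSET = 273.15
true_ratio = (warm_c + KELVIN_OFFSET) / (cool_c + KELVIN_OFFSET)

print(difference_c, naive_ratio, round(true_ratio, 3))
```

The Celsius ratio of 2 shrinks to roughly 1.035 on the Kelvin scale, which is why "twice as hot" is not a valid statement for Celsius readings.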
Ratio: Used for variables on a scale that has measurable intervals and a true zero. That is:
0 implies absence
The interval between two numbers is meaningful
The ratio between two numbers is meaningful
Example
Height, Weight etc.
Qualitative Variable:
o Outcome stated in narrative form
o Examples:- Sex: Male/Female
Quantitative Variable:
o Outcome stated in number or quantity
o Examples:- Height, Weight etc.
(b) What are the graphical methods of presenting qualitative & Quantitative
data?
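Angle of each sector = (value × 360) / total, where total = 9625:
$\frac{1500 \times 360}{9625} = 56.1^\circ$
$\frac{3660 \times 360}{9625} = 136.9^\circ$
$\frac{1890 \times 360}{9625} = 70.7^\circ$
$\frac{2575 \times 360}{9625} = 96.3^\circ$
Total = 360°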
[Figure: pie chart with segments A, B, C and D; segment labels 20%, 27%, 16% and 37%.]
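The sector angles of a pie chart can be recomputed with a few lines of Python. This is a minimal sketch: the category names are hypothetical placeholders, and the fourth value is taken as 2575, which is what the stated angle of 96.3 and the total of 9625 imply.

```python
# Category names are placeholders; the values are those used in the angle calculation.
values = {"first": 1500, "second": 3660, "third": 1890, "fourth": 2575}

total = sum(values.values())

# Each sector angle is the category's share of the total, scaled to 360 degrees.
angles = {name: value * 360 / total for name, value in values.items()}

for name, angle in angles.items():
    print(f"{name}: {angle:.1f} degrees")
```

The four angles come out as 56.1, 136.9, 70.7 and 96.3 degrees and add up to 360.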
Mean: In statistics, the mean is the “average” used to describe the central
tendency of the given data. We use the mean when the data are less scattered.
Formula (for ungrouped data): $\bar{x} = \frac{\sum x}{n}$
Formula (for grouped data): $\bar{x} = \frac{\sum f x}{n}$, where $n = \sum f$
Here x = Observation
n = Number of observations
f = Frequency
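A small sketch of both mean formulas may help. The function names and the data are hypothetical, and the grouped version assumes that each class is represented by its midpoint x with frequency f.

```python
def mean_ungrouped(observations):
    """Sum of the observations divided by their number."""
    return sum(observations) / len(observations)


def mean_grouped(midpoints, frequencies):
    """Sum of f * x divided by the total frequency n."""
    n = sum(frequencies)
    return sum(f * x for x, f in zip(midpoints, frequencies)) / n


# Hypothetical data for illustration.
raw_mean = mean_ungrouped([2, 4, 6, 8])
grouped_mean = mean_grouped([15, 25, 35], [2, 5, 3])

print(raw_mean, grouped_mean)
```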
Advantages of mean
It is easy to understand and easy to calculate.
It is based upon all the observations.
Disadvantages of mean
It cannot be determined by inspection.
It cannot be used for qualitative characteristics that
cannot be measured quantitatively, such as religion or gender.
Median: The median is the “middle” value in a list of numbers. To find the median, we
arrange the observations in ascending order. If there is an odd number of observations,
the median is the middle value. If there is an even number of observations, the median is
the average of the two middle values.
The median is used when the data are more scattered.
Formula: Median = value of the $\left(\frac{n+1}{2}\right)$th observation in the ordered data
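The odd and even cases described for the median can be sketched as follows. The function name and the sample lists are hypothetical.

```python
def median(observations):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    ordered = sorted(observations)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_case = median([7, 3, 5])       # sorted: 3, 5, 7
even_case = median([7, 3, 5, 9])   # sorted: 3, 5, 7, 9

print(odd_case, even_case)
```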
Advantage of median:
It is rigidly defined
It is easy to understand and easy to calculate.
Disadvantage of Median:
With an even number of observations, the median cannot be determined
exactly; it is estimated as the average of the two middle values.
It is not based on all the observations.
Standard deviation:
Standard deviation shows the variability in the data:
If the data are close together, the standard deviation will be small.
If the data are spread out, the standard deviation will be large.
Standard deviation is often denoted by the lowercase Greek letter sigma, 𝜎.
The standard deviation formula can be represented using sigma notation:
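Sample standard deviation, $S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$
(square root of the sample variance)
Population standard deviation, $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$
(square root of the population variance)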
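The difference between the sample and population formulas is only the divisor (n - 1 versus N). A minimal sketch, using hypothetical data and checking the hand-written formulas against Python's standard `statistics` module:

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations

mean = sum(data) / len(data)
squared_deviations = sum((x - mean) ** 2 for x in data)

# Sample standard deviation: divide by n - 1.
sample_sd = math.sqrt(squared_deviations / (len(data) - 1))

# Population standard deviation: divide by N.
population_sd = math.sqrt(squared_deviations / len(data))

# The standard library gives the same results.
assert math.isclose(sample_sd, statistics.stdev(data))
assert math.isclose(population_sd, statistics.pstdev(data))

print(round(sample_sd, 3), population_sd)
```

For these values the population standard deviation is 2.0 and the sample standard deviation is slightly larger, about 2.138.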
If an experiment has a total of n(S) possible outcomes, all of which are mutually
exclusive and equally likely, such that n(A) of the outcomes are favorable to an
event A, then the probability of the event A is defined by:
$P(A) = \frac{n(A)}{n(S)} = \frac{\text{number of favorable outcomes}}{\text{total number of outcomes}}$
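The classical definition can be illustrated with a fair die. This is a sketch only: the function name and the die example are assumptions made for illustration.

```python
from fractions import Fraction


def classical_probability(event, sample_space):
    """n(A) / n(S) for equally likely, mutually exclusive outcomes."""
    favorable = len(set(event) & set(sample_space))
    return Fraction(favorable, len(set(sample_space)))


die = {1, 2, 3, 4, 5, 6}   # sample space S of one roll
even = {2, 4, 6}           # event A: an even number is rolled

p_even = classical_probability(even, die)
print(p_even)
```

Using `Fraction` keeps the result exact (1/2) instead of a rounded decimal.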
Multiplication Rule
a. For independent events
P(A & B) = P(A) · P(B) or P(B) · P(A)
b. For dependent events
P(A & B) = P(A) · P(B/A) or P(B) · P(A/B)
P(A) and P(B) are called unconditional probabilities.
P(B/A) is a conditional probability.
P(A & B) and P(A or B) are joint probabilities.
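Both forms of the multiplication rule can be sketched with exact fractions. The coin and card examples are hypothetical illustrations, not part of the assignment.

```python
from fractions import Fraction

# Independent events: two tosses of a fair coin, both heads.
p_head = Fraction(1, 2)
p_two_heads = p_head * p_head            # P(A) * P(B)

# Dependent events: two aces drawn from a 52-card deck without replacement.
p_first_ace = Fraction(4, 52)            # P(A)
p_second_ace_given_first = Fraction(3, 51)  # P(B/A): one ace and one card are gone
p_two_aces = p_first_ace * p_second_ace_given_first

print(p_two_heads, p_two_aces)
```

In the dependent case the second factor is a conditional probability, because the first draw changes what is left in the deck.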
Assignment-02
3. If the population from which samples are drawn follows a normal distribution,
then the sampling distribution of the sample mean will also follow a normal
distribution, irrespective of sample size.
4. If the parent population is not normally distributed, then the sampling
distribution of the sample mean approaches normality if the sample size is large (n≥30). The larger the
sample size, the closer the distribution is to normality. This property is known
as the central limit theorem (CLT).
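The central limit theorem can be explored with a small simulation. This is a sketch under stated assumptions: the parent population is taken to be exponential with mean 1 (a strongly skewed, non-normal choice made only for illustration), the seed is fixed so the run is repeatable, and 2000 samples of size 30 are drawn.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable


def sample_mean(n):
    """Mean of n draws from an exponential population (mean 1, SD 1)."""
    return statistics.fmean([rng.expovariate(1.0) for _ in range(n)])


n = 30
means = [sample_mean(n) for _ in range(2000)]

mean_of_means = statistics.fmean(means)
sd_of_means = statistics.stdev(means)
expected_sd = 1 / n ** 0.5  # population SD divided by the square root of n

print(round(mean_of_means, 3), round(sd_of_means, 3), round(expected_sd, 3))
```

The sample means cluster around the population mean with a spread close to the population SD divided by √n, even though the parent population itself is far from normal.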
Normal distribution: A normal distribution has most of the data in the centre,
with decreasing amounts evenly distributed to the left and the right.
Estimation: Estimation is the process of using sample data to draw inferences about
the population.
Estimation
a) Point estimation:
A point estimate of a population parameter is a single value of a
statistic.
For example, the sample mean $\bar{x}$ is a point estimate of the
population mean 𝜇. Similarly, the sample proportion p is a point
estimate of the population proportion P.
b) Interval estimation:
An interval estimate is defined by two numbers, between which a population
parameter is said to lie.
For example, a < 𝜇 < b is an interval estimate of the population
mean 𝜇; it indicates that the population mean is greater than a but less than b.
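A point estimate and an interval estimate can be computed from the same sample. The sketch below uses hypothetical measurements and a normal-approximation 95% interval; for a sample this small, a critical value from the t distribution would usually be preferred, so the interval is illustrative only.

```python
import math
import statistics

# Hypothetical sample of measurements.
sample = [2400, 2550, 2700, 2300, 2650, 2500, 2600, 2450]

n = len(sample)
x_bar = statistics.mean(sample)   # point estimate of the population mean
s = statistics.stdev(sample)      # sample standard deviation

# Normal-approximation critical value for a 95% interval (about 1.96).
z = statistics.NormalDist().inv_cdf(0.975)
half_width = z * s / math.sqrt(n)

a, b = x_bar - half_width, x_bar + half_width  # interval estimate: a < mu < b

print(x_bar, round(a, 1), round(b, 1))
```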
Correlation:
Correlation is the degree of association between two or more variables.
If two or more quantities vary so that movements in one tend to be
accompanied by movements in the other, then they are said to be correlated.
a) Positive correlation:
When two variables move in the same direction, the correlation between
these two variables is said to be positive.
When the value of one variable increases, the value of the other variable also
increases. For example:
Training (RS):    350 360 370 380
Performance (KG): 30  40  50  60
b) Negative correlation:
In this type of correlation, the two variables move in opposite directions.
When the value of one variable increases, the value of the other
variable decreases. For example: the relationship between price and demand.
c) Perfectly positive correlation:
When a change in one variable X is accompanied by an equal proportional
change in the other variable, say Y, in the same direction, the two variables
are said to have a perfectly positive correlation.
e) Zero correlation:
When the two variables are independent and a change in one variable has no
effect on the other variable, the correlation between these two variables is
known as zero correlation.
Figure: Types of simple correlation (surviving panel label: zero correlation).
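The direction and strength of a correlation are usually summarised by Pearson's coefficient r, which is +1 for a perfectly positive and -1 for a perfectly negative linear relationship. The sketch below applies it to the training and performance figures given above; the price and demand figures are hypothetical, and the function name is an assumption.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


training = [350, 360, 370, 380]
performance = [30, 40, 50, 60]
r_positive = pearson_r(training, performance)

price = [10, 20, 30, 40]    # hypothetical
demand = [80, 60, 40, 20]   # hypothetical
r_negative = pearson_r(price, demand)

print(r_positive, r_negative)
```

Both examples are exactly linear, so r comes out as 1.0 and -1.0; real data would normally fall somewhere in between.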
Chi squared test:
The chi-squared test is a statistical test commonly used to compare observed data with the data
we would expect to obtain according to a specific hypothesis.
Formula of chi-squared test:
The general formula is $\chi^2 = \sum \frac{(O - E)^2}{E}$
Where O = Observed frequency.
E = Expected frequency.
Test of significances:
A test of significance is a formal procedure for comparing observed data with a
claim (also called a hypothesis) whose truth we want to assess.
A test of significance is used to test a claim about an unknown population
parameter.
Various steps of test of significance:
In general any test of significance is conducted using the following steps-
1. Describing data (just quoting values from the given problem)
2. Stating the assumptions (about population and sample as applicable)
3. Stating null and alternative hypothesis
4. Level of significance
5. Choosing the test statistic (e.g. Z, t, F, χ²) and stating its distribution.
6. Computing the value of the test statistic (using the appropriate formula)
7. Finding, from the tables given in the appendices, (i) the tabled value of the test statistic
corresponding to the level of significance and (ii) the P-value corresponding to the
calculated value of the test statistic
8. Decision: rejecting or not rejecting (accepting) the null hypothesis
9. Conclusion (concluding in the language of the question in the problem)
Type -I error:
A type I error, also known as an error of the first kind, occurs when the null
hypothesis (Ho) is true but is rejected.
The rate of type I error is called the size of the test and is denoted by the
Greek letter 𝛼 (alpha).
It usually equals the significance level of the test.
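The link between the type I error rate and the significance level can be checked by simulation. This is a sketch under stated assumptions: the null hypothesis is true by construction, the population is normal with known standard deviation (so a two-sided z-test applies), and the seed, sample size and number of trials are arbitrary choices.

```python
import math
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is repeatable

alpha = 0.05
z_critical = statistics.NormalDist().inv_cdf(1 - alpha / 2)

mu0, sigma, n, trials = 0.0, 1.0, 25, 4000

rejections = 0
for _ in range(trials):
    # Ho is true here: every sample really comes from a population with mean mu0.
    sample = [rng.gauss(mu0, sigma) for _ in range(n)]
    z = (statistics.fmean(sample) - mu0) / (sigma / math.sqrt(n))
    if abs(z) > z_critical:
        rejections += 1  # rejecting a true Ho is a type I error

type_one_rate = rejections / trials
print(type_one_rate)
```

The observed proportion of false rejections should land close to 0.05, the chosen significance level.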
Assignment–03
Solution:-
Data Presentation: In Histogram
[Figure: histogram titled “Data Presentation”. x-axis: Age Group (10-20, 20-30, 30-40, 40-50, 50-60, 60-70); y-axis: Frequency (0 to 60).]
Solution:
Given Data:
Population mean, $\mu_0$ = 2500 gm
Sample mean birth weight, $\bar{x}$ = 2550 gm
Sample standard deviation, s = 250 gm
Sample size, n = 28
Hypothesis:
Ho: 𝜇 = 2500 gm
Ha: 𝜇 ≠ 2500 gm
(As n < 30 and 𝜎 is not known, we shall use the t-test.)
General Formula:
$t_{\text{calculated}} = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$
$= \frac{2550 - 2500}{250 / \sqrt{28}}$
$= \frac{50}{250 / 5.292}$
$= \frac{50}{47.25}$
$= 1.058$
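The calculation can be reproduced in a few lines. The critical value 2.052 is an assumption brought in from a standard t table (27 degrees of freedom, two-sided test at the 5% level); the solution above does not state the level of significance or the decision, so the comparison is only a sketch of how the test would typically be finished.

```python
import math

mu0 = 2500     # hypothesised population mean (gm)
x_bar = 2550   # sample mean birth weight (gm)
s = 250        # sample standard deviation (gm)
n = 28         # sample size

standard_error = s / math.sqrt(n)
t_calculated = (x_bar - mu0) / standard_error

# Assumed table value: t distribution, n - 1 = 27 df, two-sided, 5% level.
T_CRITICAL = 2.052
reject_h0 = abs(t_calculated) > T_CRITICAL

print(round(t_calculated, 3), reject_h0)
```

Under these assumptions the calculated t of 1.058 is smaller than the table value, so Ho would not be rejected.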
Question 03. In a random sample of 200 individuals 50 were smokers and 150
non-smokers. Of the smokers 25 and of the non-smokers 15 were found to
have lung cancer. Can we conclude that lung cancer is associated with
smoking?
Solution:
Hypothesis:
Ho: There is no association between smoking and lung cancer.
Ha: There is an association between smoking and lung cancer.
Expected value, $E = \frac{RT \times CT}{GT}$
Here,
RT = Row Total
CT = Column Total
GT = Grand Total
E = Expected frequency
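From the question: row totals are 50 (smokers) and 150 (non-smokers); column totals are 40 (lung cancer) and 160 (no lung cancer); grand total is 200.
So, $E_1 = \frac{50 \times 40}{200} = 10$, Observed Value $O_1 = 25$
$E_2 = \frac{50 \times 160}{200} = 40$, Observed Value $O_2 = 25$
$E_3 = \frac{150 \times 40}{200} = 30$, Observed Value $O_3 = 15$
$E_4 = \frac{150 \times 160}{200} = 120$, Observed Value $O_4 = 135$
Calculated Value $\chi^2 = \sum \frac{(O - E)^2}{E}$
$\chi^2 = \frac{(25 - 10)^2}{10} + \frac{(25 - 40)^2}{40} + \frac{(15 - 30)^2}{30} + \frac{(135 - 120)^2}{120}$
= 22.5 + 5.625 + 7.5 + 1.875
= 37.5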
The calculated value of χ² is 37.5.
The tabulated value of χ² at the 5% level of significance (1 degree of freedom) is 3.84.
∴ χ² cal. > χ² tab.
∴ Ho is rejected and Ha is accepted.
So it can be concluded that there is an association between smoking and lung
cancer.
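The whole chi-squared calculation can be verified from the 2 × 2 table implied by the question (25 of 50 smokers and 15 of 150 non-smokers with lung cancer). The variable names are hypothetical; the table value 3.84 is the one quoted in the solution.

```python
# Rows: smokers, non-smokers. Columns: lung cancer, no lung cancer.
observed = [[25, 25],
            [15, 135]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected frequency for each cell: row total * column total / grand total.
expected = [[rt * ct / grand_total for ct in col_totals] for rt in row_totals]

chi_square = sum(
    (o - e) ** 2 / e
    for o_row, e_row in zip(observed, expected)
    for o, e in zip(o_row, e_row)
)

CHI_CRITICAL = 3.84  # table value quoted in the solution (5% level)
reject_h0 = chi_square > CHI_CRITICAL

print(expected, chi_square, reject_h0)
```

The expected frequencies come out as 10, 40, 30 and 120, and the statistic as 37.5, matching the hand calculation.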
Question 04. Table 1 gives the sex and age of 20 adults.
Sex         F  M  M  F  F  M  M  M  M  F  M  M  F  M  M  F  M  F  M  F
Age (years) 24 21 25 26 29 26 28 22 31 39 32 38 30 32 35 46 44 43 47 49
(i) Make univariate tables for sex and age separately. [Hint: for age use
the class intervals: 20-29, 30-39, 40-49]
(ii) Make bivariate table for age and sex.
(iii) What is the mean, median and mode of age?
Solution:
(ii) Bivariate table for age and sex:-
Age Group Male Female Total
20-29 5 3 8
30-39 5 2 7
40-49 2 3 5
Total 12 8 20
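The counts in the bivariate table can be checked by tallying the 20 records from the question. This is a minimal sketch; the helper name `age_group` is hypothetical, and the class intervals are the ones given in the hint.

```python
from collections import Counter

sexes = "F M M F F M M M M F M M F M M F M F M F".split()
ages = [24, 21, 25, 26, 29, 26, 28, 22, 31, 39,
        32, 38, 30, 32, 35, 46, 44, 43, 47, 49]


def age_group(age):
    """Map an age to its ten-year class interval, e.g. 24 -> '20-29'."""
    lower = (age // 10) * 10
    return f"{lower}-{lower + 9}"


# Univariate counts for each variable.
sex_counts = Counter(sexes)
age_counts = Counter(age_group(a) for a in ages)

# Bivariate counts: one tally per (age group, sex) pair.
counts = Counter((age_group(a), s) for a, s in zip(ages, sexes))

for group in ("20-29", "30-39", "40-49"):
    print(group, counts[(group, "M")], counts[(group, "F")], age_counts[group])
print("Total", sex_counts["M"], sex_counts["F"], len(ages))
```

The same tallies also give the two univariate tables asked for in part (i).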
Mean:
We know that, $\bar{x} = \frac{\sum x}{n}$
= (24+21+25+26+29+26+28+22+31+39+32+38+30+32+35+46+44+43+47+49)/20
= 667/20
= 33.35
Therefore, mean = 33.35
Mode:
The mode is the value that occurs most often.
26 and 32 are the most common values (each occurs twice).
Therefore, mode = 26 and 32
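The three measures asked for in part (iii) can be checked with Python's standard `statistics` module. The sketch uses the 20 ages from the question and also gives the median, which follows from averaging the 10th and 11th values of the sorted data.

```python
import statistics

ages = [24, 21, 25, 26, 29, 26, 28, 22, 31, 39,
        32, 38, 30, 32, 35, 46, 44, 43, 47, 49]

mean_age = statistics.mean(ages)
median_age = statistics.median(ages)   # average of the 10th and 11th sorted values
modes = statistics.multimode(ages)     # every value that shares the highest count

print(mean_age, median_age, modes)
```

`multimode` is used rather than `mode` because the data have two values, 26 and 32, that are equally common.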