
Course Title: Basics of Biostatistics

Course Code: 0301201

Submitted To,
SM Shamiul Hoque Chowdhury
Assistant professor and Head
Dept. Biostatistics
BUHS
Submitted By,
Md. Shah Alam Khan
Session-Fall 2020
Roll- 022204202001
Dept: MPH in RCH

Date of Submission: 26.11.2020

Assignment-01

Question 01. (a) Discuss various scales of measurements with examples.


(b) Define variable and discuss types of variable.

Answer to the question no: 01

a) Scales of measurement
• The scale of measurement refers to the type of scale on which a variable, such as
the dependent variable (DV), is measured.
• Certain statistics can be calculated on some scales but not on others.
• There are four different scales of measurement:
o Nominal
o Ordinal
o Interval
o Ratio

• Nominal: Nominal means "pertaining to names". A nominal scale places observations
into named categories that have no natural order.
Example
• Gender: Male, Female, Other.
• Colour: Brown, Black, Red, Yellow.

• Ordinal: Ordinal means "in order". An ordinal scale uses categories such as
"first, second, third" that indicate position or imply order.
Example
• High school class ranking: 1st, 5th, 16th and 99th.
• Socioeconomic status: poor, middle class, rich.

• Interval: An interval scale has values separated by equal, meaningful intervals.
For example, a thermometer might have intervals of ten degrees. That is:
o 0 (zero) does not imply absence
o The interval between two numbers is meaningful
o The ratio between two numbers is not meaningful
Example
• Celsius temperature
• IQ (intelligence scale)
• Time on a clock with hands

• Ratio: A ratio scale is used for variables that have measurable intervals and a
true zero. That is:
o 0 (zero) implies absence
o The interval between two numbers is meaningful
o The ratio between two numbers is meaningful
Example
• Height, weight, etc.

(b) Define variable and discuss types of variable.


Answer:
Definition of variable: A variable may be defined as follows:
Variable = Vary + able
Any characteristic or measurement that is able to vary.
Types of variable: There are two types of variable:
• Qualitative variable
• Quantitative variable

• Qualitative variable:
o Outcome stated in narrative (non-numerical) form
o Example: Sex (Male/Female)
• Quantitative variable:
o Outcome stated as a number or quantity
o Examples: Height, weight, etc.

Question 02. (a) Discuss pie diagram.

(b) What are the graphical methods of presenting qualitative & Quantitative
data?

Answer to the question no: 02

(a) Pie diagram

• A pie diagram is commonly used for presenting qualitative data.
• It shows the breakdown of a group or total as sectors of a circle. The angle of
each sector is proportional to the share of that category in the total.
For example:
The distribution of patients visiting different diabetic hospitals of Dhaka city in
a week is presented as a pie diagram:
Hospital           A               B               C               D               Total
No. of patients    1500            3660            1890            2575            9625
% of patients      15.6            38.0            19.6            26.8            100
Angle (degrees)    1500×360/9625   3660×360/9625   1890×360/9625   2575×360/9625   360
                   = 56.1          = 136.9         = 70.7          = 96.3

[Figure: pie chart with one sector per hospital: A (15.6%), B (38.0%), C (19.6%), D (26.8%)]

Fig: Pie Diagram
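
The sector angles in the table above can be checked with a short calculation. The following is a minimal Python sketch, not part of the original answer; the function name `pie_shares` is chosen here only for illustration.

```python
# Patient counts per hospital, taken from the table above.
patients = {"A": 1500, "B": 3660, "C": 1890, "D": 2575}

def pie_shares(counts):
    """Return {category: (percentage, sector angle in degrees)}."""
    total = sum(counts.values())
    return {
        name: (100 * value / total, 360 * value / total)
        for name, value in counts.items()
    }

shares = pie_shares(patients)
for name, (percent, angle) in shares.items():
    print(f"Hospital {name}: {percent:.1f}% of patients, sector angle {angle:.1f} degrees")
```

Each angle is simply the category's share of the total multiplied by 360 degrees, so the angles always add up to a full circle.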

(b) the graphical methods of presenting qualitative & Quantitative data

Graphical presentation has some advantages over array and tabular presentation, as
graphical presentations are:

• Simple, appealing and attractive.
• They enable visual analysis through comparison among the concerned groups.

Data Type      Univariate                                  Bivariate
Qualitative    1) Simple bar diagram                       Multiple bar diagram
               2) Component bar diagram
               3) Pie diagram
               4) Line graph
Quantitative   1) Histogram                                Scatter diagram
               2) Frequency polygon
               3) Frequency curve
               4) Ogive (cumulative frequency curve)
               5) Stem and leaf
               6) Box and whisker

Table: Graphical presentation by type of data and number of variables.

Q.3. Discuss any two measures of central tendency.


Answer to the question no: 03

Measure of central tendency:

In statistics, a measure of central tendency is a single central value that
represents the given data or its distribution.
• There are 02 types of data:
1) Ungrouped data
2) Grouped data
• Measures of central tendency include 03 measurements:
o Mean
o Median
o Mode
• Two measures of central tendency, mean and median, are discussed below:

Mean: In statistics, the mean is the "average" that is used to describe the central
tendency of the given data. We use the mean when the data are less scattered.

• Formula (for ungrouped data): $\bar{x} = \frac{\sum x}{n}$
• Formula (for grouped data): $\bar{x} = \frac{\sum f x}{n}$, where $n = \sum f$
Here x = Observation (the class mid-point for grouped data)
n = Number of observations
f = Frequency

Advantages of mean
• It is easy to understand and easy to calculate.
• It is based upon all the observations.

Disadvantages of mean
• It cannot be determined by inspection.
• It cannot be used when dealing with qualitative characteristics that cannot be
measured quantitatively, such as religion or gender.

Median: The median is the "middle" value in a list of numbers. To find the median, we
arrange the observations in ascending order. If there is an odd number of
observations, the median is the middle value. If there is an even number of
observations, the median is the average of the two middle values.
The median is used when the data are more scattered.

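Median, ungrouped data (odd number of observations)

• Formula: Median = value of the $\left(\frac{n+1}{2}\right)$th observation

Median, ungrouped data (even number of observations)

• Formula: Median = $\frac{1}{2}\left[\left(\frac{n}{2}\right)\text{th observation} + \left(\frac{n}{2}+1\right)\text{th observation}\right]$

Median (grouped data)

• Formula: Median = $l + \frac{h}{f}\left(\frac{\sum f}{2} - c.f\right)$

Where l = Lower class boundary of the median class
h = Size (width) of the median class
f = Frequency of the median class
c.f = Cumulative frequency of the class preceding the median class.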

Advantages of median:
• It is rigidly defined.
• It is easy to understand and easy to calculate.

Disadvantages of median:
• In the case of an even number of observations, the median cannot be determined
exactly; it is taken as the average of the two middle values.
• It is not based on all the observations.
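
As an illustration of the grouped-data formulas, the following Python sketch computes the mean and median of a frequency table. It is a sketch under stated assumptions rather than part of the original answer: the data are the age-group frequencies from Assignment-03, Question 01 of this document, and the function names are chosen here for illustration.

```python
# Class boundaries and frequencies (age-group table from Assignment-03, Question 01).
classes = [(10, 20), (20, 30), (30, 40), (40, 50), (50, 60), (60, 70)]
freqs = [4, 20, 40, 60, 20, 6]

def grouped_mean(classes, freqs):
    """Mean = sum(f * x) / sum(f), with x taken as the class mid-point."""
    n = sum(freqs)
    return sum(f * (lo + hi) / 2 for (lo, hi), f in zip(classes, freqs)) / n

def grouped_median(classes, freqs):
    """Median = l + (h / f) * (n / 2 - c.f) for the median class."""
    n = sum(freqs)
    cum = 0  # cumulative frequency of the classes before the current one
    for (lo, hi), f in zip(classes, freqs):
        if f > 0 and cum + f >= n / 2:
            return lo + (hi - lo) / f * (n / 2 - cum)
        cum += f
    raise ValueError("frequencies must sum to a positive number")

print(grouped_mean(classes, freqs))
print(round(grouped_median(classes, freqs), 2))
```

For this table the median class is 40-50 (cumulative frequency reaches 124 there, past n/2 = 75), so l = 40, h = 10, f = 60 and c.f = 64.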

Question 04. What are the various measures of dispersion? Discuss


standard deviation and co-efficient of variations (CV).

Answer to the question no: 04


Various measures of dispersion: There are mainly 02 types of measures of
dispersion:
a) The absolute measures of dispersion.
b) The relative measures of dispersion.

The important absolute measures of dispersion are as follows:


1) Range
2) Mean or average deviation
3) Variance
4) Standard deviation
5) Quartile deviation
The relative measures of dispersion are as follows:
1) Coefficient of range.
2) Coefficient of mean deviation
3) Coefficient of variation
4) Coefficient of quartile deviation.

Standard deviation:
Standard deviation shows the variability in the data:
• If the data are close together, the standard deviation will be small.
• If the data are spread out, the standard deviation will be large.
• The population standard deviation is often denoted by the lowercase Greek letter
sigma, $\sigma$; the sample standard deviation is denoted by $s$.
• The standard deviation formula can be represented using sigma notation:

• Sample standard deviation, $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
(square root of the sample variance)

• Population standard deviation, $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$
(square root of the population variance)

• The standard deviation is the square root of the variance.

Coefficient of variation (CV): The coefficient of variation is the ratio of the
standard deviation to the arithmetic mean, expressed as a percentage:
$CV = \frac{s}{\bar{x}} \times 100$
Since both $s$ and $\bar{x}$ have the same unit, the coefficient of variation has no
unit, which is why comparison is direct: the data set with the higher value of CV
has more heterogeneity than the other.
• Measures relative variation
• Always in percentage (%)
• Shows variation relative to the mean.
• It is used to compare two or more sets of data measured in different units.
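
The formulas for the standard deviation and the coefficient of variation can be sketched in Python as follows. This is an illustrative sketch only: the height and weight values are hypothetical and were chosen so that both sets have the same standard deviation but different means.

```python
import math

def mean(values):
    return sum(values) / len(values)

def sample_sd(values):
    """s = sqrt( sum((x - mean)^2) / (n - 1) )"""
    m = mean(values)
    return math.sqrt(sum((x - m) ** 2 for x in values) / (len(values) - 1))

def population_sd(values):
    """sigma = sqrt( sum((x - mu)^2) / N )"""
    m = mean(values)
    return math.sqrt(sum((x - m) ** 2 for x in values) / len(values))

def coefficient_of_variation(values):
    """CV = (s / mean) * 100, a unit-free percentage."""
    return sample_sd(values) / mean(values) * 100

# Hypothetical data in two different units.
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [50, 60, 70, 80, 90]

for label, data in (("height", heights_cm), ("weight", weights_kg)):
    print(f"{label}: s = {sample_sd(data):.2f}, CV = {coefficient_of_variation(data):.2f}%")
```

In this hypothetical example both sets have the same sample standard deviation, but the weights have the larger CV, so they vary more relative to their mean.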

Question 05. What is probability? State addition and multiplication


laws of probability.
Answer to the question no: 05
Definition of probability:

If an experiment has a total of n(S) possible outcomes, all of which are mutually
exclusive and equally likely, and n(A) of these outcomes are favorable to an
event A, then the probability of the event A is defined by
$P(A) = \frac{n(A)}{n(S)} = \frac{\text{number of favorable outcomes}}{\text{total number of outcomes}}$

Addition and multiplication laws of probability


Addition Rule:
a. For mutually exclusive events
Let A and B be two mutually exclusive events. Then the probability that
either A or B will occur is
P(A or B) = P(A) + P(B)

[Figure: Venn diagram of two non-overlapping events A and B]

b. For events that are not mutually exclusive
Let A and B be two events that are not mutually exclusive. Then the
probability that either A or B will occur is
P(A or B) = P(A) + P(B) - P(A and B)

Multiplication Rule
a. For independent events
P(A & B) = P(A) · P(B)
or
P(B) · P(A)
b. For dependent events
P(A & B) = P(A) · P(B/A)
or
P(B) · P(A/B)

P(A) and P(B) are called unconditional probabilities.
P(B/A) is a conditional probability (the probability of B given that A has occurred).
P(A & B) is a joint probability; P(A or B) is the probability that at least one of
the two events occurs.
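
The addition and multiplication rules can be checked on small examples with exact fractions. The Python sketch below uses hypothetical examples that are not in the original answer (a fair six-sided die and a standard 52-card deck).

```python
from fractions import Fraction

sample_space = set(range(1, 7))  # one roll of a fair die (hypothetical example)

def prob(event):
    """P(A) = n(A) / n(S) for equally likely outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))

# Addition rule, mutually exclusive events: rolling a 1 or rolling a 2.
p_one_or_two = prob({1}) + prob({2})

# Addition rule, events that overlap: "even" and "greater than 4" share the outcome 6.
even = {2, 4, 6}
above_four = {5, 6}
p_even_or_above_four = prob(even) + prob(above_four) - prob(even & above_four)

# Multiplication rule, independent events: a six on each of two separate rolls.
p_two_sixes = prob({6}) * prob({6})

# Multiplication rule, dependent events: two aces drawn without replacement,
# P(A) * P(B/A) = 4/52 * 3/51.
p_two_aces = Fraction(4, 52) * Fraction(3, 51)

print(p_one_or_two, p_even_or_above_four, p_two_sixes, p_two_aces)
```

The overlapping case can be confirmed by counting directly: the union {2, 4, 5, 6} contains 4 of the 6 outcomes.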

Assignment-02

Question06. What is sampling distribution? Describe the


characteristics of sampling distribution.
Answer to the question no: 06

Definition of sampling distribution: A sampling distribution is the distribution of
all of the possible values of a sample statistic for a given sample size selected
from a population.
Example: the sampling distribution of the sample mean, the sample proportion, etc.

The characteristics of sampling distribution:


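1. The mean of all possible sample means equals the population mean,
i.e. $\mu_{\bar{x}} = \mu$
2. The SD of all possible sample means, $\sigma_{\bar{x}}$,
$= \frac{\sigma}{\sqrt{n}}$ or $\frac{\sigma}{\sqrt{n}} \times \sqrt{\frac{N - n}{N - 1}}$, theoretically
$= \frac{s}{\sqrt{n}}$ or $\frac{s}{\sqrt{n}} \times \sqrt{\frac{N - n}{N - 1}}$, operationally

Here $\frac{\sigma}{\sqrt{n}}$ is known as SE($\bar{x}$) [SE = Standard Error],
and $\frac{\sigma}{\sqrt{n}}$ is estimated by $\frac{s}{\sqrt{n}}$.

[Figure: distribution of the sample means $\bar{x}$, centred at $\mu$ with spread $\sigma/\sqrt{n}$]
Figure: Sampling Distribution
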
3. If the population from which samples are drawn follows a normal distribution,
then the sampling distribution of sample means will also follow a normal
distribution, irrespective of sample size.
4. If the parent population is not normally distributed, then the sampling
distribution approaches normality if the sample size is large (n≥30). The larger the
sample size, the closer the distribution is to normality. This property is known
as the central limit theorem (CLT).
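
The first two characteristics can be verified exactly on a very small population by listing every possible sample. The following Python sketch is illustrative only; the population values and the sample size are hypothetical, and samples are drawn without replacement, so the finite population correction applies.

```python
import itertools
import math
import statistics

population = [2, 4, 6, 8, 10]  # hypothetical population, N = 5
n = 2                          # sample size

# Mean of every possible sample of size n drawn without replacement.
sample_means = [statistics.mean(s) for s in itertools.combinations(population, n)]

N = len(population)
mu = statistics.mean(population)
sigma = statistics.pstdev(population)

mean_of_means = statistics.mean(sample_means)
sd_of_means = statistics.pstdev(sample_means)

# Standard error with the finite population correction.
se_formula = sigma / math.sqrt(n) * math.sqrt((N - n) / (N - 1))

print(mu, mean_of_means)
print(round(sd_of_means, 4), round(se_formula, 4))
```

The mean of the ten sample means equals the population mean, and their standard deviation matches the standard error formula with the correction factor.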

Question07. Define normal distribution. State important properties


and uses of normal distribution.
Answer to the question no: 07

Normal distribution: A normal distribution has most of the data in the centre,
with decreasing amounts evenly distributed to the left and the right.

[Figure: bell-shaped normal curve with zero skewness; mean = median = mode]

Examples of normal distribution:

• The body temperature of healthy humans.
• The heights and weights of adults.
• The thickness and dimensions of a product, etc.

Important properties of the normal distribution:

1. Symmetrical and bell shaped: from the peak, the curve tails off equally
on both sides.
2. The mean, median and mode coincide.
3. The range:
i) Mean - 1 SD to mean + 1 SD includes 68% of values ($\bar{x} \pm 1S$).
ii) Mean - 1.96 SD to mean + 1.96 SD includes 95% of values ($\bar{x} \pm 1.96S$).
iii) Mean - 2.58 SD to mean + 2.58 SD includes 99% of values ($\bar{x} \pm 2.58S$).
[Here SD or S = Standard Deviation]
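
The 68%, 95% and 99% figures can be reproduced from the standard normal distribution. The short Python sketch below uses the error function from the standard library; the function name `coverage` is chosen here for illustration.

```python
import math

def coverage(k):
    """Proportion of a normal distribution lying within mean ± k SD."""
    return math.erf(k / math.sqrt(2))

for k in (1, 1.96, 2.58):
    print(f"mean ± {k} SD covers {coverage(k) * 100:.1f}% of values")
```

The relation used is P(|Z| < k) = erf(k / √2) for a standard normal variable Z.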

Uses of normal distribution:


1. Used to illustrate the shape and variability of the data.
2. Used to estimate future process performance.
3. Normality is an important assumption when conducting statistical
analysis.
4. Certain SPC (statistical process control) charts and many statistical inference
tests require the data to be normally distributed.

Question 08. (a) Discuss the methods of estimation.


(b) What is correlation? Discuss the different types of simple
correlation.

Answer to the question no: 08

Estimation: Estimation is the process of using sample data to draw inferences about
the population.

[Diagram: sample information ($\bar{x}$, $s$) --- inference ---> population parameters ($\mu$, $\sigma$)]

Methods of estimation: There are 2 methods of estimation:

a) Point estimation
b) Interval estimation

[Diagram: Estimation divides into point estimation (sample mean, sample proportion)
and interval estimation (confidence interval for the mean, confidence interval for
the proportion)]


a) Point estimation:
• A point estimate of a population parameter is a single value of a
statistic.
For example, the sample mean $\bar{x}$ is a point estimate of the
population mean $\mu$. Similarly, the sample proportion p is a point
estimate of the population proportion P.
b) Interval estimation:
• An interval estimate is defined by two numbers, between which a population
parameter is said to lie.
For example, a < $\mu$ < b is an interval estimate of the population mean $\mu$:
it indicates that the population mean is greater than a but less than b.
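
To make the difference between the two methods concrete, the following Python sketch computes a point estimate and a 95% interval estimate for a population mean. It is a sketch under stated assumptions: the figures (sample mean 2550 gm, SD 250 gm, n = 28, tabulated t = 2.05) are borrowed from Assignment-03, Question 02 of this document, and the interval formula $\bar{x} \pm t \times s/\sqrt{n}$ is the usual one for a mean when $\sigma$ is unknown.

```python
import math

def mean_confidence_interval(sample_mean, sample_sd, n, critical_value):
    """Interval estimate: sample mean ± critical value × standard error."""
    standard_error = sample_sd / math.sqrt(n)
    margin = critical_value * standard_error
    return sample_mean - margin, sample_mean + margin

# Figures borrowed from Assignment-03, Question 02 (birth weights in gm).
point_estimate = 2550
lower, upper = mean_confidence_interval(point_estimate, 250, 28, 2.05)

print(f"Point estimate of the mean: {point_estimate} gm")
print(f"Interval estimate of the mean: {lower:.2f} gm to {upper:.2f} gm")
```

The point estimate is a single number, while the interval estimate gives a lower limit a and an upper limit b between which the population mean is said to lie.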

Correlation:
• Correlation is the degree of association between two or more variables.
• If two or more quantities vary so that movements in one tend to be
accompanied by movements in the other, then they are said to be correlated.

Types of simple correlation:


On the basis of direction of change, there are 5 types of simple correlation:
a) Positive correlation
b) Negative Correlation
c) Perfectly positive correlation
d) Perfectly Negative correlation
e) Zero Correlation

a) Positive correlation:
• When two variables move in the same direction, the correlation between
these two variables is said to be a positive correlation.
• When the value of one variable increases, the value of the other variable also
increases. For example:
Training (RS):     350  360  370  380
Performance (KG):  30   40   50   60

b) Negative correlation:
• In this type of correlation the two variables move in opposite directions.
• When the value of one variable increases, the value of the other variable
decreases. For example: the relationship between price and demand.

c) Perfectly positive correlation:
• When a change in one variable X is accompanied by a change of equal proportion
in the other variable Y in the same direction, the two variables are said to have
a perfectly positive correlation.

d) Perfectly negative correlation:
• If a change in X is accompanied by a change in Y of equal proportion but in the
opposite direction, the correlation between the two variables is called a
perfectly negative correlation.

e) Zero correlation:
• When the two variables are independent and a change in one variable has no
effect on the other variable, the correlation between these two variables is
known as zero correlation.

[Diagram: Types of simple correlation, on the basis of direction of change: positive
correlation, negative correlation, perfectly positive correlation, perfectly negative
correlation, zero correlation]
Figure: Types of simple correlation.
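
The direction and strength of a simple correlation are commonly summarised by Pearson's correlation coefficient r, which ranges from -1 (perfectly negative) through 0 (zero correlation) to +1 (perfectly positive). The text above does not name this coefficient, so its use here is an assumption. The Python sketch below applies it to the training and performance figures from the example, and to a hypothetical price and demand series.

```python
import math

def pearson_r(xs, ys):
    """Pearson's r = sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Figures from the positive-correlation example in the text.
training = [350, 360, 370, 380]
performance = [30, 40, 50, 60]

# Hypothetical price and demand figures moving in opposite directions.
price = [10, 20, 30, 40]
demand = [40, 30, 20, 10]

print(pearson_r(training, performance))
print(pearson_r(price, demand))
```

Because performance rises by the same amount for every equal rise in training, the example in the text is in fact a perfectly positive correlation.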

Question 09. Discuss Chi-squared test.


Answer to the question no: 09

Chi-squared test:
The chi-squared test is a statistical test commonly used to compare observed data
with the data we would expect to obtain according to a specific hypothesis.
Formula of chi-squared test:
The general formula is $\chi^2 = \sum \frac{(O - E)^2}{E}$
Where O = Observed frequency.
E = Expected frequency.

Procedure of chi- squared test:


1. Preparation of the contingency table.
2. Formulation of the null hypothesis (H0) and the alternative hypothesis (Ha):
H0: The two variables are not associated.
Ha: The two variables are associated.
3. Calculation of the expected frequency (E) for each cell:
a. E = (C × R)/T
b. Where C = Column total,
R = Row total,
T = Grand total
Then finding the difference between each observed frequency and its expected
frequency, i.e. O - E.
4. Designation of the level of significance: the usual levels of significance are
5% or 1%.
5. Calculation of $\chi^2$.
6. Calculation of the degrees of freedom (df) for the contingency table by using the
formula (r-1)(c-1),
where r = number of rows and c = number of columns.
7. Finding the tabulated value of $\chi^2$ at the specified level of significance.
8. Comparing the calculated value of $\chi^2$ with the tabulated value. If the
calculated value is greater than the tabulated value, $\chi^2$ is significant and the
null hypothesis (H0) is rejected, which means there is a probable association between
the variables.
9. Conclusion.

Restrictions of the chi-squared test:

• The $\chi^2$ test is valid if at least 80% of the expected frequencies are greater
than 5 and no expected frequency is below 1.
• In a 2x2 table, if the expected frequency of any cell is < 5, Yates's correction
should be applied:
$\chi^2 = \sum \frac{\left(|O - E| - \frac{1}{2}\right)^2}{E}$
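
The effect of Yates's correction can be seen on a small table. The following Python sketch is illustrative only: the 2x2 table is hypothetical, the function names are chosen here, and clamping a negative corrected difference to zero is a common safeguard rather than something stated in the text.

```python
def expected_frequencies(table):
    """E = (row total × column total) / grand total for every cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def chi_square(table, yates=False):
    """Sum of (O - E)^2 / E, optionally with Yates's correction of 1/2."""
    total = 0.0
    for obs_row, exp_row in zip(table, expected_frequencies(table)):
        for o, e in zip(obs_row, exp_row):
            diff = abs(o - e)
            if yates:
                diff = max(diff - 0.5, 0.0)  # safeguard: never below zero
            total += diff ** 2 / e
    return total

def degrees_of_freedom(table):
    return (len(table) - 1) * (len(table[0]) - 1)

# Hypothetical 2x2 table in which two expected frequencies fall below 5.
table = [[2, 8], [7, 3]]

print(expected_frequencies(table))
print(degrees_of_freedom(table))
print(round(chi_square(table), 3), round(chi_square(table, yates=True), 3))
```

In this hypothetical table the uncorrected value exceeds the 5% tabulated value of 3.84 for 1 degree of freedom, while the corrected value does not, which shows why the correction matters when expected frequencies are small.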

Question 10. What do you understand by test of significance? State


various steps of test of significance. Describe type-I and type-II error.
Answer to the question no: 10

Test of significance:
• A test of significance is a formal procedure for comparing observed data with a
claim (also called a hypothesis) whose truth we want to assess.
• A test of significance is used to test a claim about an unknown population
parameter.
Various steps of test of significance:
In general, any test of significance is conducted using the following steps:
1. Describing the data (quoting values from the given problem)
2. Stating the assumptions (about the population and sample, as applicable)
3. Stating the null and alternative hypotheses
4. Setting the level of significance
5. Choosing the test statistic (e.g. Z, t, F, $\chi^2$, etc.) and stating its distribution
6. Computing the value of the test statistic (using the appropriate formula)
7. From the tables given in the appendices, finding (i) the tabulated value of the
test statistic corresponding to the level of significance, or (ii) the P-value
corresponding to the calculated value of the test statistic
8. Decision: rejecting or accepting (not rejecting) the null hypothesis
9. Conclusion (stated in the language of the question in the problem)

Type-I error:
• A type-I error, also known as an error of the first kind, occurs when the null
hypothesis (Ho) is true but is rejected.
• The rate of the type-I error is called the size of the test and is denoted by the
Greek letter $\alpha$ (alpha).
• It usually equals the significance level of the test.

Type-II error:

1. A type-II error, also known as an error of the second kind, occurs when the
null hypothesis is false but erroneously fails to be rejected.
2. A type-II error means accepting a hypothesis which should have been
rejected.
3. A type-II error may be compared with failing to believe a truth.
4. A type-II error occurs when one rejects the alternative hypothesis (fails to
reject the null hypothesis) when the alternative hypothesis is true.
5. The rate of the type-II error is denoted by the Greek letter $\beta$ (beta) and is
related to the power of a test (which equals $1 - \beta$).

Assignment–03

Question 01. The table below shows the distribution of individuals according to their
age.
Present the data in an appropriate graph:
Age Group (Years) Frequency
10-20 4
20-30 20
30-40 40
40-50 60
50-60 20
60-70 06
Total 150

Solution:-
Data presentation: histogram

[Figure: histogram titled "Data Presentation"; x-axis: Age Group (10-20, 20-30,
30-40, 40-50, 50-60, 60-70); y-axis: Frequency (0 to 60)]
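
The shape of the histogram can be reproduced as a simple text chart. The Python sketch below is illustrative only; the function name and the scale of one `#` per 2 individuals are choices made here, not part of the original solution.

```python
# Age groups and frequencies from the table in the question.
age_groups = ["10-20", "20-30", "30-40", "40-50", "50-60", "60-70"]
frequencies = [4, 20, 40, 60, 20, 6]

def text_histogram(labels, counts, unit=2):
    """Return one line per class; each '#' stands for `unit` individuals."""
    lines = []
    for label, count in zip(labels, counts):
        bar = "#" * round(count / unit)
        lines.append(f"{label:>5} | {bar} {count}")
    return lines

for line in text_histogram(age_groups, frequencies):
    print(line)
```

The tallest bar belongs to the 40-50 age group, which has the highest frequency (60).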

Question 02. The mean birth weight of babies born in a community over
several years was 2500 gm. Following implementation of an antenatal care
program, the mean birth weight obtained from a sample of 28 babies was 2550
gm and SD was 250 gm. Does the antenatal program have any impact on the
birth weight of new born babies (considering the other influencing factors
remaining the same)?

Solution:
Given data:
Population mean, $\mu_0$ = 2500 gm
Sample mean birth weight, $\bar{x}$ = 2550 gm
Sample standard deviation, s = 250 gm
Sample size, n = 28
Hypothesis:
Ho: $\mu$ = 2500 gm
Ha: $\mu \neq$ 2500 gm
(As n < 30 and $\sigma$ is not known, we shall use the t-test.)
General formula:

t calculated value $= \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$
$= \frac{2550 - 2500}{250 / \sqrt{28}}$
$= \frac{50}{250 / 5.2915}$
$= \frac{50}{47.246}$
= 1.058

At the 5% level of significance with n - 1 = 27 degrees of freedom, the tabulated
value of t = 2.05
Calculated value of t = 1.058
So the calculated value t = 1.058 is less than the tabulated value 2.05, i.e. t cal. < t tab.
Therefore, Ho is accepted (not rejected) and Ha is rejected.
So the data do not give evidence that the antenatal program has a significant impact
on the birth weight of new born babies.
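
The t calculation can be checked with a few lines of Python. This is a minimal sketch using the figures given in the question; the function name is chosen here, and the tabulated value 2.05 is taken from the solution rather than computed.

```python
import math

def one_sample_t(sample_mean, mu0, sample_sd, n):
    """t = (sample mean - hypothesised mean) / (s / sqrt(n))"""
    return (sample_mean - mu0) / (sample_sd / math.sqrt(n))

t_calc = one_sample_t(2550, 2500, 250, 28)
t_tab = 2.05  # tabulated value at the 5% level, as quoted in the solution

reject_h0 = abs(t_calc) > t_tab
print(round(t_calc, 3), reject_h0)
```

Since the calculated value is smaller than the tabulated value, the null hypothesis is not rejected.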

Question 03. In a random sample of 200 individuals 50 were smokers and 150
non-smokers. Of the smokers 25 and of the non-smokers 15 were found to
have lung cancer. Can we conclude that lung cancer is associated with
smoking?

Solution:

               Lung Cancer (+ve)   Lung Cancer (-ve)   Total
Smokers        25 (O1)             25 (O2)             50
Non Smokers    15 (O3)             135 (O4)            150
Total          40                  160                 200

Hypothesis:
Ho: There is no association between smoking and lung cancer.
Ha: There is an association between smoking and lung cancer.

Expected value $E = \frac{RT \times CT}{GT}$
Here,
RT = Row Total
CT = Column Total
GT = Grand Total
E = Expected frequency

So, E1 = (50 × 40)/200 = 10      Observed Value O1 = 25
E2 = (50 × 160)/200 = 40         Observed Value O2 = 25
E3 = (150 × 40)/200 = 30         Observed Value O3 = 15
E4 = (150 × 160)/200 = 120       Observed Value O4 = 135

Calculated value $\chi^2 = \sum \frac{(O - E)^2}{E}$
$= \frac{(25 - 10)^2}{10} + \frac{(25 - 40)^2}{40} + \frac{(15 - 30)^2}{30} + \frac{(135 - 120)^2}{120}$
= 22.5 + 5.625 + 7.5 + 1.875
= 37.5
The calculated value of $\chi^2$ is 37.5
Degrees of freedom = (2 - 1)(2 - 1) = 1
The tabulated value of $\chi^2$ at the 5% level of significance with 1 degree of
freedom is 3.84
∴ $\chi^2$ cal. > $\chi^2$ tab.
∴ Ho is rejected and Ha is accepted.
So it can be concluded that there is an association between smoking and lung
cancer.
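
The expected frequencies and the four components of the chi-squared statistic can be checked with the Python sketch below. It uses only the observed counts from the question; the variable names are chosen here for illustration.

```python
# Observed counts: rows = smokers, non-smokers; columns = cancer (+ve), cancer (-ve).
observed = [[25, 25], [15, 135]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

expected = []
components = []
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / grand_total  # E = RT × CT / GT
        expected.append(e)
        components.append((o - e) ** 2 / e)

chi_square_value = sum(components)
print(expected)
print(components)
print(chi_square_value)
```

The total is far above the 5% tabulated value of 3.84 for 1 degree of freedom, which agrees with the decision to reject the null hypothesis.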

Question 04. Table 1 gives the sex and age of 20 adults.
Sex          F  M  M  F  F  M  M  M  M  F  M  M  F  M  M  F  M  F  M  F
Age (years)  24 21 25 26 29 26 28 22 31 39 32 38 30 32 35 46 44 43 47 49

(i) Make univariate tables for sex and age separately. [Hint: for age use
the class intervals: 20-29, 30-39, 40-49]
(ii) Make bivariate table for age and sex.
(iii) What is the mean, median and mode of age?
Solution:

(i) Univariate table for sex:-


Sex Number %
Male 12 60
Female 8 40
Total 20 100

Univariate Table for age:-


Age Group Number %
20-29 8 40
30-39 7 35
40-49 5 25
Total 20 100

(ii) Bivariate table for age and sex:-
Age Group Male Female Total
20-29 5 3 8
30-39 5 2 7
40-49 2 3 5
Total 12 8 20

(iii) the mean, median and mode of age.

Mean:
We know that, $\bar{x} = \frac{\sum x}{n}$
= (24+21+25+26+29+26+28+22+31+39+32+38+30+32+35+46+44+43+47+49)/20
= 667/20
= 33.35
Therefore, mean = 33.35

Median for the data:


Arranged in ascending order: 21 22 24 25 26 26 28 29 30 31 32 32 35 38 39
43 44 46 47 49
1st middle item: 20/2 = 10th item
2nd middle item: 10+1 = 11th item
The values of the 10th and 11th items are 31 and 32.
Therefore, median = average of 31 and 32 = (31+32)/2 = 31.5

Mode:
The mode is the value that occurs most often.
26 and 32 are the most common values (each occurs twice).
Therefore, mode = 26, 32
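
These three results can be cross-checked with Python's standard `statistics` module. The sketch below uses the 20 ages from Table 1; `multimode` requires Python 3.8 or later.

```python
import statistics

# Ages of the 20 adults from Table 1.
ages = [24, 21, 25, 26, 29, 26, 28, 22, 31, 39,
        32, 38, 30, 32, 35, 46, 44, 43, 47, 49]

mean_age = statistics.mean(ages)
median_age = statistics.median(ages)
modes = statistics.multimode(ages)  # every value tied for most frequent

print(mean_age, median_age, modes)
```

Because two values are tied for the highest frequency, the data set is bimodal.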
