BIOSTATISTICS
IMPORTANT QUESTIONS
CHAPTER: 1 INTRODUCTION
MEANING OF STATISTICS: The word statistics is used in three senses.
(i) In the plural sense, it refers to numerical facts in any field of study. These facts are collected
in a systematic manner with a definite purpose in view. We also use the word data to refer to
statistics in this sense.
(ii) In the singular sense, it refers to the science comprising methods which are used in the
collection, presentation, analysis and interpretation of numerical data.
(iii) It is also the plural of statistic. By a statistic, we mean a quantity calculated from a few
observations taken on a sample basis. For example, if we select at random ten students from a class
of fifty students, measure their heights and find the average height, this average is a statistic.
POPULATION:
BIOSTATISTICS:
When statistics is applied in biology including human biology, medicine and public health, it is known as
biostatistics. Francis Galton (1822-1911) has been called the father of Biostatistics.
2. Define data.
A set of observations, such as the heights of students, the temperatures of patients, the blood
pressures of patients and the number of paramedical staff, is called data.
CONTINUOUS VARIABLE: A variable which can assume any value within a given range is called a
continuous variable. Examples of continuous variables are the heights and weights of individuals,
the level of mercury in a thermometer, etc. The value of a continuous variable varies without any
gaps or jumps.
DISCONTINUOUS OR DISCRETE VARIABLE: A variable which can assume only some specific
values within a given range is called a discontinuous or discrete variable. Examples of
discontinuous variables are the number of children in a family and the number of students in a class.
DEPENDENT VARIABLE:
INDEPENDENT VARIABLE:
4. Define the following
(a) Primary data
(b) Secondary data
(c) Descriptive statistics
(d) Inferential statistics
PRIMARY DATA: Primary data are the firsthand information collected, compiled and
published by an organization for a certain purpose. The data in the Population Census Reports
are primary because they are collected, compiled and published by the Population Census
Organization.
SECONDARY DATA: The data published or used by an organization other than the one which
originally collected them are known as secondary data.
DESCRIPTIVE STATISTICS: Descriptive statistics deals with collection of data, its presentation in
various forms, such as tables, graphs and diagrams and finding averages and other measures
which would describe the data.
INFERENTIAL OR INDUCTIVE STATISTICS: Inferential statistics deals with techniques used for
analysing data, making estimates and drawing conclusions from limited information taken on a
sample basis, and testing the reliability of the estimates.
5. Describe the quantitative and qualitative data
QUALITATIVE DATA: Data which are described by a qualitative variable, e.g., marital status, sex,
etc. are called qualitative data.
QUANTITATIVE DATA: Data which are described by a quantitative variable, e.g., heights, weights,
etc. are called quantitative data.
Classify scales of measurement and write down briefly characteristics of each scale.
DATA ON NOMINAL SCALE: It is the weakest of the four measurement scales. The nominal scale
distinguishes one object or event from another on the basis of a name. For example, we classify
(name) items from an assembly line as defective or non-defective. A newborn baby is male or
female.
DATA ON ORDINAL SCALE: Objects or events measured on the ordinal scale are distinguished
from one another on the basis of the relative amount of some characteristic they possess.
For example, contestants in a race may be ranked 1, 2, 3, … according to the order in which
they cross the finish line, and students in a class may be judged as average, good, very good or
excellent. Data of this type are usually called rank data.
DATA ON AN INTERVAL SCALE: An interval scale has equal units but an arbitrary zero point. A
familiar example of interval measurement is the measurement of temperature in Fahrenheit
degrees or Celsius degrees (centigrade). It is important to note that 0 is just a point on the scale.
It does not represent the absence of the condition. Zero degrees Fahrenheit does not represent the
absence of heat. In fact, 0 degrees Fahrenheit is about -18 degrees on the Celsius scale.
DATA ON RATIO SCALE: The scale of measurement is the ratio scale when measurements have
the properties of the first three scales and the additional property that their ratios are
meaningful. Measurements of height and weight are examples of measurements on the ratio
scale (e.g., 30 Kg is thrice 10 Kg; 20 cm is twice 10 cm; 8 hours is four times 2 hours).
4. Find a mean incubation period, median and mode of 9 polio cases given below:
17, 20, 18, 24, 16, 19, 21, 22, 23
SOLUTION:
$\bar{x} = \frac{\sum x}{n} = \frac{17 + 20 + 18 + 24 + 16 + 19 + 21 + 22 + 23}{9} = \frac{180}{9} = 20$
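The arithmetic for the mean, together with the median and mode that the question also asks for, can be checked with a short Python sketch. It uses only the standard `statistics` module; the variable names are illustrative and not part of the original solution.

```python
import statistics

# Incubation periods of the 9 polio cases from the question.
incubation = [17, 20, 18, 24, 16, 19, 21, 22, 23]

mean_value = statistics.mean(incubation)      # sum of values / number of values
median_value = statistics.median(incubation)  # middle value of the sorted data

# multimode returns every value that shares the highest frequency.
# If all values occur exactly once, all of them are returned, which
# means the data have no single most repeated value.
modes = statistics.multimode(incubation)
has_unique_mode = len(modes) == 1

print("Mean:", mean_value)
print("Median:", median_value)
print("Unique mode exists:", has_unique_mode)
```

Because every value in this data set occurs only once, no single value qualifies as the most repeated one.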
SOLUTION:
$\bar{x} = \frac{\sum x}{n} = 3$
Median = $\left(\frac{n + 1}{2}\right)$th value
= $\left(\frac{5 + 1}{2}\right)$th value = 3rd value = 3
6. Define Mode.
MODE: The most repeated value in the data is called Mode.
7. Find the mean weight of 100 persons from the following frequency distribution:
Weight (Kg)       45   50   55   60   65   70   75
No. of persons     5   12   18   20   33   10    2
SOLUTION:
Weight in Kg (X)    No. of persons (f)    fX
45                  5                     225
50                  12                    600
55                  18                    990
60                  20                    1200
65                  33                    2145
70                  10                    700
75                  2                     150
Total               100                   6010
Mean = $\bar{x} = \frac{\sum fX}{\sum f} = \frac{6010}{100} = 60.1$ Kg
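The same frequency-weighted mean can be reproduced in a few lines of Python. This is a minimal sketch using the figures tabulated in the solution; the variable names are chosen here for illustration.

```python
# Weights (Kg) and the number of persons at each weight,
# as tabulated in the solution.
weights = [45, 50, 55, 60, 65, 70, 75]
persons = [5, 12, 18, 20, 33, 10, 2]

total_persons = sum(persons)                             # sum of f
total_fx = sum(x * f for x, f in zip(weights, persons))  # sum of f * X
mean_weight = total_fx / total_persons                   # sum(fX) / sum(f)

print(total_persons, total_fx, mean_weight)
```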
Solution: Mode = 6
SOLUTION:
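Marks      f     Midpoint (x)    fx
0 – 9      7     4.5             31.5
10 – 19    23    14.5            333.5
20 – 29    46    24.5            1127.0
30 – 39    15    34.5            517.5
40 – 49    5     44.5            222.5
Total      96                    2232.0
$\bar{x} = \frac{\sum fx}{\sum f} = \frac{2232}{96} = 23.25$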
10. The gain in weights of 5 albino rats in a period of 5 days are 5, 6, 4, 8, 7. Calculate the mean.
SOLUTION:
$\bar{x} = \frac{\sum x}{n} = \frac{5 + 6 + 4 + 8 + 7}{5} = \frac{30}{5} = 6$
11. In the given values 4, 5, 6, 7, 8, 9: (a) Calculate the median (b) calculate the mean
SOLUTION:
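(a) Median = $\left[\frac{n + 1}{2}\right]$th value
= $\left[\frac{6 + 1}{2}\right]$th value = 7/2 = 3.5th value
= 3rd value + 0.5 [4th value – 3rd value]
= 6 + 0.5 [7 – 6]
= 6 + 0.5 [1] = 6.5
(b) $\bar{x} = \frac{\sum x}{n} = \frac{4 + 5 + 6 + 7 + 8 + 9}{6} = \frac{39}{6} = 6.5$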
12. (a) The arithmetic mean of a distribution is 40 and median is 43, find its mode.
COEFFICIENT OF VARIATION:
The coefficient of variation expresses the standard deviation as a percentage of the arithmetic
mean. Symbolically, the coefficient of variation, denoted by C.V., is given by
C.V. = $\frac{S}{\bar{x}} \times 100$, where S is the standard deviation and $\bar{x}$ is the arithmetic mean.
STANDARD DEVIATION:
The standard deviation is the positive square root of the mean of the squared deviations of the
values from their mean.
5. Find the standard deviation of the respiration rate per minute found to be 16, 18, 10, 17, 21,
24, 22 and 23 in 8 individuals.
SOLUTION:
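One way to carry out this calculation is sketched below in Python. It assumes the form of the definition given above (divisor n, which is also what `statistics.pstdev` computes). The coefficient of variation is added only to illustrate its definition and is not asked for in the question.

```python
import math
import statistics

# Respiration rates per minute of the 8 individuals in the question.
rates = [16, 18, 10, 17, 21, 24, 22, 23]

n = len(rates)
mean_rate = sum(rates) / n

# Standard deviation from the definition: square root of the mean
# of the squared deviations from the mean (divisor n).
squared_deviations = [(x - mean_rate) ** 2 for x in rates]
sd = math.sqrt(sum(squared_deviations) / n)

# The library routine for the same formula should agree.
assert math.isclose(sd, statistics.pstdev(rates))

# Coefficient of variation: standard deviation as a percentage of the mean.
cv = sd / mean_rate * 100

print(f"mean = {mean_rate:.3f}, S.D. = {sd:.3f}, C.V. = {cv:.1f}%")
```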
SOLUTION:
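Q1 = $\left[\frac{n + 1}{4}\right]$th value = $\frac{7 + 1}{4}$ = $\frac{8}{4}$ = 2nd term; the value of the 2nd term is 20
Q3 = $\left[\frac{3(n + 1)}{4}\right]$th value = $\frac{3(7 + 1)}{4}$ = $\frac{3(8)}{4}$ = $\frac{24}{4}$ = 6th term; the value of the 6th term is 33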
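8. Find the quartile deviation of the weights (in Kg) of 7 people given below: 55, 50, 62, 58, 65,
48, 52.
SOLUTION:
Arranging the given values in ascending order of magnitude, i.e., 48, 50, 52, 55, 58, 62, 65
Q1 = $\left[\frac{n + 1}{4}\right]$th value = $\frac{7 + 1}{4}$ = $\frac{8}{4}$ = 2nd value = 50
Q3 = $\left[\frac{3(n + 1)}{4}\right]$th value = $\frac{3(7 + 1)}{4}$ = $\frac{3(8)}{4}$ = $\frac{24}{4}$ = 6th value = 62
Quartile Deviation = $\frac{Q_3 - Q_1}{2} = \frac{62 - 50}{2} = 6$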
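The quartile positions used in this solution can be turned into a small Python sketch. The helper names are hypothetical, and the linear interpolation for fractional positions is an assumption modelled on the median calculation earlier in these notes. It is not needed for this data set, where both positions are whole numbers.

```python
def value_at_position(sorted_values, position):
    """Return the value at a 1-based, possibly fractional, position,
    interpolating linearly between neighbouring observations."""
    lower_index = int(position)          # whole part of the position
    fraction = position - lower_index    # fractional part
    lower_value = sorted_values[lower_index - 1]
    if fraction == 0:
        return lower_value
    upper_value = sorted_values[lower_index]
    return lower_value + fraction * (upper_value - lower_value)


def quartile_deviation(values):
    """Return Q1, Q3 and (Q3 - Q1) / 2 using the (n + 1)/4 and
    3(n + 1)/4 positions. Assumes at least 3 observations."""
    ordered = sorted(values)
    n = len(ordered)
    q1 = value_at_position(ordered, (n + 1) / 4)
    q3 = value_at_position(ordered, 3 * (n + 1) / 4)
    return q1, q3, (q3 - q1) / 2


weights = [55, 50, 62, 58, 65, 48, 52]
q1, q3, qd = quartile_deviation(weights)
print(q1, q3, qd)
```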
SOLUTION:
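$\bar{x} = \frac{\sum X}{n} = \frac{3 + 5 + 2 + 7 + 8}{5} = \frac{25}{5} = 5$
X           (X – x̄)²
3           4
5           0
2           9
7           4
8           9
Total 25    26
S.D. = $\sqrt{\frac{\sum (X - \bar{x})^2}{n}} = \sqrt{\frac{26}{5}} = \sqrt{5.2} \approx 2.28$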
10. A child was born to Mrs. X every year for 7 consecutive years. Compute the standard deviation
of the children’s ages when the youngest is 9 years old.
SOLUTION:
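x            x²
9            81
10           100
11           121
12           144
13           169
14           196
15           225
∑x = 84      ∑x² = 1036
S = $\sqrt{\frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2}$
S = $\sqrt{\frac{1036}{7} - \left(\frac{84}{7}\right)^2} = \sqrt{148 - (12)^2} = \sqrt{4} = 2$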
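As a check, the shortcut formula used here can be compared with the definition of the standard deviation in a short Python sketch. Both use the divisor n, as in the solution above; the variable names are illustrative.

```python
import math

# Ages of the 7 children: the youngest is 9 and one child was born each year.
ages = [9 + i for i in range(7)]   # 9, 10, ..., 15

n = len(ages)
sum_x = sum(ages)                  # sum of x
sum_x2 = sum(x * x for x in ages)  # sum of x squared

# Shortcut formula: S = sqrt( sum(x^2)/n - (sum(x)/n)^2 )
s_shortcut = math.sqrt(sum_x2 / n - (sum_x / n) ** 2)

# Definition: S = sqrt( sum((x - mean)^2) / n )
mean_age = sum_x / n
s_definition = math.sqrt(sum((x - mean_age) ** 2 for x in ages) / n)

print(sum_x, sum_x2, s_shortcut, s_definition)
```

The two routes give the same value because the shortcut formula is an algebraic rearrangement of the definition.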
CHAPTER: 5 PROBABILITIES
1. Define the following terms: (a) Probability
PROBABILITY:
Probability is the relative chance with which an event is expected to occur on average.
2. State the difference between: (a) Equally likely events (b) Mutually exclusive events
2. All normal distributions have the same internal distribution of area under the curve:
whether the mean or standard deviation is large or small, the relative area between any two
designated points is always the same. How much area is included under the normal curve
within µ ± 1σ, µ ± 2σ and µ ± 3σ?
SOLUTION:
µ ± 1σ contains 68.27%
µ ± 2σ contains 95.45%
µ ± 3σ contains 99.73%
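These three percentages can be recovered numerically. The sketch below relies on the standard identity that the area of a normal curve within k standard deviations of the mean equals erf(k/√2); the function name is chosen here for illustration.

```python
import math

def area_within(k):
    """Area under a normal curve between mean - k*sd and mean + k*sd.

    For any normal distribution this equals erf(k / sqrt(2)),
    whatever the particular mean and standard deviation are.
    """
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"mean ± {k} S.D. contains {area_within(k) * 100:.2f}% of the area")
```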
3. Assume that the systolic blood pressure of men is normally distributed with mean B.P. 120 mmHg
and standard deviation 10 mmHg. (a) Determine the proportion of men whose B.P. is above
140 mmHg. (b) What is the value of B.P. such that 5% of men have B.P. above it?
SOLUTION:
(a) µ = 120 mmHg, σ = 10 mmHg, P(X ≥ 140) = ?
$Z = \frac{X - \mu}{\sigma} = \frac{140 - 120}{10} = 2$
P(X ≥ 140) = P(Z ≥ 2) = 0.0228
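Part (a) can be verified, and part (b), for which no working is shown above, can be evaluated with Python's `statistics.NormalDist`. This is a sketch under the question's own assumption of normality; the variable names are illustrative.

```python
from statistics import NormalDist

# Systolic blood pressure model from the question: mean 120 mmHg, S.D. 10 mmHg.
bp = NormalDist(mu=120, sigma=10)

# (a) Proportion of men with B.P. above 140 mmHg: P(X >= 140) = 1 - P(X < 140).
z = (140 - bp.mean) / bp.stdev
proportion_above_140 = 1 - bp.cdf(140)

# (b) Value exceeded by only 5% of men, i.e. the 95th percentile.
cutoff_top_5_percent = bp.inv_cdf(0.95)

print(f"z = {z:.2f}, P(X >= 140) = {proportion_above_140:.4f}")
print(f"B.P. exceeded by 5% of men: {cutoff_top_5_percent:.2f} mmHg")
```

For part (b) the result corresponds to µ + 1.645σ, since 1.645 is the standard normal value with 5% of the area above it.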
SAMPLE: A sample is a part of the whole selected with the object that it will represent the
characteristics of the whole.
POPULATION: The whole from which a sample is drawn is known, in statistical language, as the
population or universe.
PARAMETER: A numerical value such as the mean, median or standard deviation calculated from
the population is called a population parameter or simply a parameter.
STATISTIC: A numerical value such as mean, median and standard deviation calculated from the
sample is called sample statistic or simply a statistic.
CLUSTER SAMPLING: In cluster sampling, we first select at random clusters (i.e., groups) of
individual items from the population and then choose all or sub-sample of the items within each
cluster to make up the overall sample.
SYSTEMATIC SAMPLING: Here the samples are equally spaced throughout the area or population
to be sampled. For example, in house-to-house sampling, every tenth or twentieth house may
be taken. More specifically, a systematic sample is obtained by taking every kth unit in the
population after the units in the population have been numbered.
To select a sample of n = 5 units from a population of N units numbered 1 to N, we take
a unit at random from the first k = 10 units and every 10th unit afterwards. For instance, if the
first unit drawn is number 8, the subsequent units are those numbered 18, 28, 38 and 48. The
selection of the first unit determines the whole sample.
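The selection rule described above can be sketched in Python as follows. The function name and parameters are hypothetical, and the population size N = 50 used in the example is an assumption implied by n = 5 and k = 10 rather than a figure stated in the text.

```python
import random

def systematic_sample(population_size, k, start=None, seed=None):
    """Return the unit numbers of a systematic sample.

    Units are assumed to be numbered 1..population_size. A random start
    is drawn from the first k units and every k-th unit after it is taken.
    """
    if start is None:
        start = random.Random(seed).randint(1, k)  # random start in 1..k
    return list(range(start, population_size + 1, k))

# Reproduce the example in the text: first unit drawn is number 8, k = 10.
print(systematic_sample(population_size=50, k=10, start=8))
```

Passing `start` explicitly mirrors the worked example; leaving it out draws the first unit at random, after which the rest of the sample is fully determined.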
6. Define probability sampling and non-probability sampling. List the types of non-probability
sampling.
PROBABILITY SAMPLING: Probability sampling is a procedure of drawing a sample in which
following a sampling plan, every element of the population has a known probability of being
included in the sample.
NON-PROBABILITY SAMPLING: It is a procedure of drawing a sample in which the sample
elements are arbitrarily selected by the sampler because in his judgement the elements thus
chosen will most effectively represent the population.
TYPES OF NON-PROBABILITY SAMPLING:
(i) Judgement or Purposive Sampling
(ii) Quota Sampling
2. Enumerate type-I and type-II errors in brief according to statistical inference in hypothesis
testing.
Briefly describe type-I and type-II errors. (5)
The probability of making type-I and type-II errors is denoted by α and β respectively.
3. A random sample of size 64 is drawn from a finite population consisting of 122 units. If
population standard deviation is 16.8, find the standard error of mean.
SOLUTION:
S.E. of sample mean = 𝜎/√𝑛 = 16.8/ √64 =16.8/8 = 2.1
5. The systolic blood pressure of 566 males was measured and the standard deviation was found to
be 13.5 mm of Hg. Calculate the standard error.
SOLUTION:
S.E. ($\bar{X}$) = $\frac{\sigma}{\sqrt{n}} = \frac{13.5}{\sqrt{566}} \approx 0.57$
SOLUTION:
(I) When the sample is drawn with replacement, the standard error of the sample mean, i.e.,
S.E. of $\bar{x} = \frac{\sigma}{\sqrt{n}} = 2$
(II) When the sample is drawn without replacement, the standard error of
$\bar{x} = \frac{\sigma}{\sqrt{n}} \cdot \sqrt{\frac{N - n}{N - 1}} = 2 \times 0.87 = 1.74$
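The standard error calculations in this chapter can be gathered into one small Python function. This is a sketch: the function name and the optional `population_size` argument are choices made here, and the last call applies the finite population correction to question 3 purely as an illustration, since the worked solution for that question does not apply it.

```python
import math

def standard_error(sigma, n, population_size=None):
    """Standard error of the sample mean.

    With replacement (or a very large population): sigma / sqrt(n).
    Without replacement from a finite population of size N, the result is
    multiplied by the finite population correction sqrt((N - n) / (N - 1)).
    """
    se = sigma / math.sqrt(n)
    if population_size is not None:
        se *= math.sqrt((population_size - n) / (population_size - 1))
    return se

# Question 3 as solved in the text (no correction applied).
print(round(standard_error(16.8, 64), 2))
# Question 5: 566 males, standard deviation 13.5 mm of Hg.
print(round(standard_error(13.5, 566), 2))
# Illustration only: question 3 again, applying the correction for N = 122.
print(round(standard_error(16.8, 64, population_size=122), 2))
```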
2. A Saeed company claims that 90% of its radish seeds germinate. Out of 100 planted, 14 failed
to germinate. Formulate the hypothesis about company’s claim and perform a test of
hypothesis.
SOLUTION:
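No worked solution is given for this question. One possible approach, sketched here under stated assumptions, is a one-sided large-sample z-test for a proportion with H0: p = 0.90 against H1: p < 0.90. The 5% level of significance and the one-sided alternative are assumptions, since the question does not specify them.

```python
import math
from statistics import NormalDist

n = 100        # seeds planted
failed = 14    # seeds that did not germinate
p0 = 0.90      # claimed germination proportion under H0
alpha = 0.05   # assumed level of significance

p_hat = (n - failed) / n             # observed proportion germinating
se = math.sqrt(p0 * (1 - p0) / n)    # standard error of p_hat under H0
z = (p_hat - p0) / se                # test statistic

z_critical = NormalDist().inv_cdf(alpha)  # lower-tail critical value
reject_h0 = z < z_critical

print(f"z = {z:.2f}, critical value = {z_critical:.3f}, reject H0: {reject_h0}")
```

Under these assumptions the statistic does not fall in the critical region, so the sample gives no grounds to reject the company's claim.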
2. Write the procedure to test the hypothesis about the population mean in small sample.
The steps for testing a hypothesis about the mean of a normal population with unknown
variance against various alternative hypothesis (based on a sample of size n < 30) are
summarized as follows:
(i) H0 : µ = µ0 versus H1 : µ < µ0 or µ > µ0 or µ ≠ µ0
(ii) Choose the level of significance α
(iii) Determine the critical region
t < -tα for the alternative µ < µ0
t > tα for the alternative µ > µ0
|t| > tα/2 (i.e., t < -tα/2 or t > tα/2) for the alternative µ ≠ µ0
(iv) Test to be used: $t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$ with n – 1 degrees of freedom, where s is the
sample standard deviation
(v) Calculations
(vi) Draw the conclusion
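3. The heights of 10 students selected at random from a school had a mean of 116 cm and a variance
of 96 cm². Test the hypothesis, at the 5% level of significance, that the students are on average
less than 120 cm tall.
SOLUTION:
Null Hypothesis H0 : µ = 120
Alternative Hypothesis H1: µ < 120
Test statistic: $t = \frac{\bar{x} - \mu_0}{\text{S.E.}(\bar{x})} = \frac{116 - 120}{3.26} = -1.23$ or |t| = 1.23
S.E. of $\bar{x} = \frac{S}{\sqrt{n - 1}} = \frac{\sqrt{96}}{\sqrt{9}} = 3.26$
Since |t| = 1.23 is less than the table value t0.05 = 1.833 with 9 d.f., H0 is accepted.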
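The test procedure and the worked example above can be sketched in Python from the summary figures alone. The function name is hypothetical. The sketch assumes that the given variance of 96 was computed with divisor n, so that the standard error is √(96/9), and it takes the one-sided critical value 1.833 (5% level, 9 degrees of freedom) from a standard t table, because the Python standard library has no t distribution.

```python
import math

def one_sample_t(sample_mean, mu0, sample_variance_n, n):
    """t statistic for H0: mu = mu0 from summary figures.

    sample_variance_n is assumed to be the variance computed with
    divisor n, so the standard error is sqrt(variance / (n - 1)).
    """
    standard_error = math.sqrt(sample_variance_n / (n - 1))
    t = (sample_mean - mu0) / standard_error
    return t, standard_error

t, se = one_sample_t(sample_mean=116, mu0=120, sample_variance_n=96, n=10)

# One-sided critical value at the 5% level with n - 1 = 9 degrees of
# freedom, taken from a standard t table.
t_critical = 1.833

# Alternative H1: mu < 120, so H0 is rejected only if t < -t_critical.
reject_h0 = t < -t_critical

print(f"S.E. = {se:.2f}, t = {t:.2f}, reject H0: {reject_h0}")
```

The printed figures agree with the worked solution up to rounding in the second decimal place.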
SOLUTION:
Null Hypothesis: H0 the severity of disease and incidence of nephropathy are independent
Alternative Hypothesis: H1 Not independent
Test statistic $\chi^2 = \sum \frac{(O - E)^2}{E}$
Calculation:
Expected frequency in each class should be the same, i.e., 44/4 = 11
O      E      O – E    (O – E)²/E
8      11     -3       0.82
15     11     4        1.45
14     11     3        0.82
7      11     -4       1.45
Total                  4.54
Here d.f. = 4 – 1 = 3
$\chi^2 = \sum \frac{(O - E)^2}{E} = 4.54$
The table value of $\chi^2$ at α = 0.05 with 3 d.f. is 7.81.
Since $\chi^2$ calculated = 4.54 < $\chi^2_{0.05,\,3}$ = 7.81,
H0 is accepted
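The chi-square calculation above can be reproduced with a short Python sketch. The function name is illustrative. The fourth observed frequency is taken as 7, the value consistent with O – E = -4 and E = 11 in the table; this reading is an assumption about the original data.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Observed frequencies consistent with the tabulated (O - E) values,
# with equal expected frequencies (total / number of classes).
observed = [8, 15, 14, 7]
expected = [sum(observed) / len(observed)] * len(observed)

chi2 = chi_square_statistic(observed, expected)
degrees_of_freedom = len(observed) - 1

# Table value of chi-square at alpha = 0.05 with 3 d.f., as quoted in the text.
critical_value = 7.81
reject_h0 = chi2 > critical_value

print(f"chi-square = {chi2:.3f}, d.f. = {degrees_of_freedom}, reject H0: {reject_h0}")
```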
3. From the following data, use ꭓ2-test and conclude whether inoculation is effective in
preventing tuberculosis.
Group             Attacked    Non Attacked    Total
Inoculated        10          90              100
Non inoculated    26          74              100
Total             36          164             200
SOLUTION:
Cell     Observed frequency (O)    Expected frequency (E)    (O – E)²       (O – E)²/E
A        10                        18                        (-8)² = 64     3.555
B        26                        18                        8² = 64        3.555
C        90                        82                        8² = 64        0.780
D        74                        82                        (-8)² = 64     0.780
Total                                                                       8.67
The test statistic is
$\chi^2 = \sum \frac{(O - E)^2}{E} = 8.67$ with d.f. = (2 – 1)(2 – 1) = 1
The table value of $\chi^2$ at α = 0.05 with 1 d.f. is 3.84
$\chi^2$ (calculated) = 8.67 > $\chi^2$ (table value) = 3.84
The null hypothesis is rejected, so we conclude that inoculation is effective in preventing
tuberculosis.
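For a contingency table the expected frequencies come from the row and column totals. The Python sketch below shows that step and the resulting statistic for the inoculation data. The function names are illustrative, and no continuity correction is applied, which matches the worked solution.

```python
def expected_frequencies(table):
    """Expected counts under independence:
    row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def chi_square(table):
    """Sum of (O - E)^2 / E over every cell of the table."""
    expected = expected_frequencies(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )

# Rows: inoculated, not inoculated. Columns: attacked, not attacked.
observed = [[10, 90],
            [26, 74]]

chi2 = chi_square(observed)
degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)

critical_value = 3.84  # table value at alpha = 0.05 with 1 d.f.
print(expected_frequencies(observed))
print(f"chi-square = {chi2:.2f}, d.f. = {degrees_of_freedom}, "
      f"reject H0: {chi2 > critical_value}")
```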
2. Define correlation and interpret the value of correlation coefficient ‘r’ = -0.982.