# BIOSTATISTICS

Dr. Anjil K Srivastava Department of Biotechnology NIT, Durgapur - 9

Biostatistics
Biostatistics has been defined as "the application of statistical methods to biological sciences". The subject developed during the period of Sir Francis Galton (1822-1911), who applied statistical methods to the analysis of biological variation, correlation and regression.

Karl Pearson (1857-1936), regarded as the father of modern statistics, was motivated by the researches of Sir Francis Galton. For measuring correlation, Karl Pearson's method, popularly known as Pearson's coefficient of correlation, is the most widely used in practice.

Central tendency
Generally, in any distribution, the values of the variable tend to congregate around a central value. This tendency is known as central tendency, and the statistics that describe it are called measures of central tendency. There are five basic measures of central tendency: arithmetic mean, median, mode, geometric mean and harmonic mean.

Arithmetic Mean
The most familiar and widely used measure of central tendency is the arithmetic mean. It represents the entire data by one value, which is obtained by adding together all the values and dividing this total by the number of observations.

The sample mean is the average of a set of data and is computed as the sum of all the observed outcomes from the sample divided by the total number of events. We use $\bar{x}$ as the symbol for the sample mean.

Arithmetic Mean for series of individual observations
For observations $x_1, x_2, x_3, \ldots, x_n$:

$$\bar{x} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n} = \frac{\sum x}{n}$$

Example
Data: 12, 15, 11, 11, 7, 13

First, find the sum of the data: 12 + 15 + 11 + 11 + 7 + 13 = 69
Then divide by the number of data: 69 / 6 = 11.5
The mean is 11.5.

Example
An electronics store sells CD players at the following prices: Rs. 3500, Rs. 2750, Rs. 5000, Rs. 1000, Rs. 3750, Rs. 3000 and Rs. 3250. What is the mean price?

3500 + 2750 + 5000 + 3250 + 1000 + 3750 + 3000 = 22250
22250 / 7 ≈ 3178.57
The mean or average price of a CD player is about Rs. 3178.57.
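The arithmetic in the two examples above can be checked with a few lines of Python. This is only a minimal sketch; the function name `arithmetic_mean` is chosen here for illustration and does not come from the slides.

```python
# Prices of the seven CD players from the example, in rupees.
prices = [3500, 2750, 5000, 3250, 1000, 3750, 3000]


def arithmetic_mean(values):
    """Sum of the observations divided by the number of observations."""
    return sum(values) / len(values)


mean_price = arithmetic_mean(prices)
print(sum(prices))           # 22250
print(round(mean_price, 2))  # 3178.57
```

The same function applied to the first example, `arithmetic_mean([12, 15, 11, 11, 7, 13])`, returns 11.5.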

Arithmetic Mean for discrete series

$$\bar{x} = \frac{\sum fx}{n}$$

where
$\bar{x}$ = arithmetic mean
$n = \sum f$ = sum of the frequencies
$\sum fx$ = sum of the products of the values of the variable and their corresponding frequencies

Example
The data recorded on the number of chlorophyll deficient plants in a lentil population are given below. Calculate the mean.

| Number of chlorophyll deficient plants (x) | Number of plants (f) | fx |
|---|---|---|
| 0 | 34 | 0 |
| 1 | 14 | 14 |
| 2 | 20 | 40 |
| 3 | 24 | 72 |
| 4 | 25 | 100 |
| 5 | 33 | 165 |
| Total | Σf = 150 | Σfx = 391 |

$$\bar{x} = \frac{\sum fx}{n} = \frac{391}{150} = 2.61$$
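The discrete-series calculation can be sketched in Python as a frequency-weighted mean. The variable names below are illustrative only.

```python
# x: number of chlorophyll deficient plants, f: number of plants (frequency).
x = [0, 1, 2, 3, 4, 5]
f = [34, 14, 20, 24, 25, 33]

sum_f = sum(f)                                  # total frequency, n
sum_fx = sum(xi * fi for xi, fi in zip(x, f))   # sum of value x frequency
mean = sum_fx / sum_f

print(sum_f, sum_fx, round(mean, 2))  # 150 391 2.61
```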

Merits of Mean
- It is easy to understand and easy to calculate.
- It is based on all the observations.
- It is rigidly defined.
- It provides a good basis for comparison.
- It is amenable to further mathematical treatment.
- It is the measure least affected by fluctuations of sampling.

Demerits of Mean
- The mean is unduly affected by extreme items.
- It cannot be accurately determined if even one of the values is not known.

Median
A median is the middle value of the observations, or the value which divides a distribution so that an equal number of items occur on either side of it.

Example
Data: 12, 15, 11, 11, 7, 13

First, arrange the data in numerical order: 7, 11, 11, 12, 13, 15
Then find the number in the middle, or the average of the two numbers in the middle: 11 + 12 = 23, and 23 / 2 = 11.5
The median is 11.5.

Median in a series of individual observations
- Arrange the data in ascending or descending order.
- The median is located by finding the size of the (n+1)/2 th item.

$$M = \text{size of the } \left(\frac{n+1}{2}\right)\text{th item}$$

where M = median and n = number of observations.

Example
Find the median from the data recorded on the number of clusters per plant in a pulse crop. The nine recorded values, arranged in ascending order, are:

| Sl. No. | Number of clusters (ascending order) |
|---|---|
| 1 | 10 |
| 2 | 10 |
| 3 | 11 |
| 4 | 12 |
| 5 | 15 |
| 6 | 17 |
| 7 | 17 |
| 8 | 18 |
| 9 | 19 |

M = size of the (9+1)/2 th item = size of the 5th item = 15
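The positional rule for the median can be sketched as follows. The function is an illustration written for these notes; it handles the even case by averaging the two middle values, as in the earlier example.

```python
def median(values):
    """Middle value of the sorted data; mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]          # the (n+1)/2 th item, counting from 1
    return (ordered[mid - 1] + ordered[mid]) / 2


clusters = [10, 10, 11, 12, 15, 17, 17, 18, 19]
print(median(clusters))                 # 15
print(median([12, 15, 11, 11, 7, 13]))  # 11.5
```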

Merits of Median
- It is easy to define and easy to understand.
- The median is not affected by the size of the values of extreme items.
- It is also recommended for distributions with unequal classes.
- The value of the median can be determined graphically, whereas the value of the mean cannot be ascertained graphically.

Demerits of Median
- It is not based on all the observations, since it is a positional average.
- If the number of observations is even, the median cannot be located exactly. In this case the mean of the two middle values is taken as the estimate of the median.
- It may be unsuitable when the data contain very large and very small items.
- The median is affected more by sampling fluctuations than the mean.

Mode
The mode is another measure of central tendency which is conceptually very useful. The mode is the most typical value of a distribution because it is repeated the highest number of times in the series.

Definition
"The most commonly occurring value."
According to Croxton and Cowden, "the mode of a distribution is the value at the point around which the items tend to be most heavily concentrated."
A set of data may have a single mode, in which case it is said to be "unimodal". When concentration of data occurs at two or more points, the series is called bimodal or multimodal.

Example
Data: 12, 15, 11, 11, 7, 13
The mode is 11.

Sometimes a set of data will have more than one mode. For example, in the following set both the numbers 5 and 7 appear twice:
9, 7, 5, 8, 2, 7, 5, 4, 6
5 and 7 are both modes, and this set is said to be bimodal.

Sometimes there is no mode in a set of data:
12, 6, 11, 2, 3, 7, 1, 8
All the numbers in this set occur only once; therefore there is no mode in this set.

Example: Find the Mean, Median and Mode of Ungrouped Data
The weekly pocket money for 9 first year pupils was found to be: 3, 4, 5, 12, 6, 2, 4, 8, 1

Mean = 5
Median = 4
Mode = 4
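As a quick check of the pocket-money example, Python's standard `statistics` module computes all three measures. This is a sketch only; `multimode` (available from Python 3.8) returns every value that shares the highest count, which is why it suits the bimodal set from the earlier example.

```python
import statistics

pocket_money = [3, 4, 5, 12, 6, 2, 4, 8, 1]

print(statistics.mean(pocket_money))       # 5
print(statistics.median(pocket_money))     # 4
print(statistics.multimode(pocket_money))  # [4]

# The bimodal set from the earlier example: 5 and 7 both occur twice.
bimodal = [9, 7, 5, 8, 2, 7, 5, 4, 6]
print(sorted(statistics.multimode(bimodal)))  # [5, 7]
```

Note that for a set in which every value occurs once, `multimode` returns all the values, which corresponds to the "no mode" case described above.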

Mode of Grouped Data

$$M_0 = L_1 + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times h$$

where
$L_1$ = lower boundary of the modal class
$\Delta_1$ = difference in frequency between the modal class and the class before it
$\Delta_2$ = difference in frequency between the modal class and the class after it
$h$ = class interval

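Example
Calculate the mode.

| Number of grains/panicle | Number of plants |
|---|---|
| 100-110 | 11 |
| 110-130 | 40 |
| 130-140 | 27 |
| 140-160 | 34 |
| 160-170 | 12 |
| 170-180 | 6 |

The class intervals are unequal, so the data are first regrouped into equal classes of width 10:

| Number of grains/panicle | Number of plants |
|---|---|
| 100-110 | 11 |
| 110-120 | 20 |
| 120-130 | 20 |
| 130-140 | 27 |
| 140-150 | 17 |
| 150-160 | 17 |
| 160-170 | 12 |
| 170-180 | 6 |

The mode lies in the class 130-140.
$L_1$ = 130, $\Delta_1$ = 27 - 20 = 7, $\Delta_2$ = 27 - 17 = 10, class interval = 10

$$M_0 = 130 + \frac{7}{7 + 10} \times 10 = 130 + \frac{70}{17} = 130 + 4.12 = 134.12$$

Mode = 134.12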
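The grouped-mode formula can be sketched in Python as below. The function name and argument layout are assumptions made for illustration. It takes the class with the highest frequency as the modal class, assumes equal class widths, and does not handle ties or the case where the modal frequency equals both neighbours.

```python
def grouped_mode(lower_bounds, frequencies, width):
    """Mode of grouped data: L1 + d1 / (d1 + d2) * width, for equal-width classes."""
    # Index of the modal class (the first class with the highest frequency).
    k = max(range(len(frequencies)), key=frequencies.__getitem__)
    f_prev = frequencies[k - 1] if k > 0 else 0
    f_next = frequencies[k + 1] if k < len(frequencies) - 1 else 0
    d1 = frequencies[k] - f_prev   # modal class minus the class before it
    d2 = frequencies[k] - f_next   # modal class minus the class after it
    return lower_bounds[k] + d1 / (d1 + d2) * width


lower_bounds = [100, 110, 120, 130, 140, 150, 160, 170]
plants = [11, 20, 20, 27, 17, 17, 12, 6]

mode = grouped_mode(lower_bounds, plants, width=10)
print(round(mode, 2))  # 134.12
```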

Merits of mode
- The mode is easy to calculate and can be determined by mere observation.
- It is not unduly affected by extreme items.
- It is simple and precise.
- It is the point where there is the greatest concentration of frequencies.

Demerits of mode
- The mode is not based on all the observations.
- It is not a rigidly defined measure.
- The value of the mode cannot be determined in a bimodal distribution.
- Sometimes the exact modal class cannot be identified by inspection of the data. It is then necessary to prepare a grouping table and an analysis table to find the modal class.

Standard Deviation
- The standard deviation is the square root of the variance.
- It is the most commonly used measure of spread.
- It was first introduced by Karl Pearson in 1893.
- The problem of algebraic signs that arises in the mean deviation is overcome by squaring the deviations, thereby making them all positive.

$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$

where
s = standard deviation
$\bar{x}$ = arithmetic mean
n = number of observations

Variance
- Variance is also called mean square deviation.
- The term "variance" is used to describe the square of the standard deviation.
- The term was first coined by R. A. Fisher in 1918.
- It helps us in isolating the effects of various factors.

The variance is defined as the mean of the squares of the deviations from the mean:

$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$

where
$\bar{x}$ = arithmetic mean
n = number of observations
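A minimal Python sketch of the variance and standard deviation formulas (with the n - 1 divisor) is given below. The data set is reused from the earlier mean example purely as an illustration; the slides do not work this particular calculation.

```python
from math import sqrt
import statistics

data = [12, 15, 11, 11, 7, 13]  # illustration only; mean is 11.5


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


variance = sample_variance(data)
sd = sqrt(variance)  # standard deviation is the square root of the variance
print(round(variance, 2), round(sd, 2))  # 7.1 2.66
```

The standard library functions `statistics.variance` and `statistics.stdev` use the same n - 1 divisor and give the same values.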

Probability
- In the nineteenth century, Pierre Simon de Laplace compiled the first general theory of probability.
- R. A. Fisher and Von Mises introduced the empirical approach to probability.
- The modern theory of probability was developed by Chebychev, A. A. Markov and A. N. Kolmogorov.

Definition
Probability is the likelihood of occurrence of an event.

Example

| | Animals other than poultry | Poultry |
|---|---|---|
| Parents | Male (XY), Female (XX) | Male (XX), Female (XY) |
| Gametes | X or Y (male), X (female) | X (male), X or Y (female) |
| Progeny | XX (female), XY (male) | XX (male), XY (female) |

Statistical Explanation
If an event can happen in "a" ways and fail to happen in "b" ways, then the probability of its happening, "p", is

$$p = \frac{\text{number of events occurring}}{\text{total number of trials}} = \frac{a}{a + b}$$

Example
If a surgeon transplants a kidney in 400 cases and succeeds in 160 cases, calculate the probability of survival after the operation.

$$p = \frac{\text{number of survivals after the operation}}{\text{total number of patients operated}} = \frac{160}{400} = \frac{2}{5}$$
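The ratio a / (a + b) can be sketched with Python's `fractions` module, which keeps the result as an exact fraction. The function name `probability` is an illustrative choice, and the 240 failures are simply 400 - 160 from the example.

```python
from fractions import Fraction


def probability(ways_to_happen, ways_to_fail):
    """p = a / (a + b), returned as an exact fraction in lowest terms."""
    return Fraction(ways_to_happen, ways_to_happen + ways_to_fail)


# Kidney transplant example: 160 successes out of 400 operations.
p = probability(160, 400 - 160)
print(p, float(p))  # 2/5 0.4
```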

Event
Any possible outcome of a random experiment is called an event. Performing an experiment is called a trial, and the outcome is termed an event. In simple terms, "an event is the occurrence of something". Example: the occurrence of a head or a tail is an event.

The events due to chance are grouped in two categories:
- Mutually exclusive events
- Independent events

Mutually Exclusive events
Events are said to be mutually exclusive if the occurrence of one event excludes the possibility of the other; in other words, two events are mutually exclusive if both cannot occur simultaneously. Examples: coin toss, birth of a baby.

Independent Events
A set of events is said to be independent if the occurrence of any one event does not affect the chance of occurrence of any other event of the set. Example: toss of two different coins.

Theorems of Probability
There are two basic rules of chance:
- Addition Rule
- Multiplication Rule

Addition Rule (for mutually exclusive events)
Suppose two events A and B are mutually exclusive. Then the probability of the occurrence of either A or B is the sum of their individual probabilities:

p(A or B) = p(A) + p(B)

The same rule can be extended to three or more events:

p(A or B or C) = p(A) + p(B) + p(C)

Example
From a pack of 52 cards, one card is drawn at random. What is the probability that it is either a king or a queen?
The events are mutually exclusive. There are 4 kings and 4 queens in a pack of 52 cards, so the probability of a king is 4/52 and that of a queen is also 4/52.

The probability that the card is either a king or a queen:
p(A or B) = p(A) + p(B) = 4/52 + 4/52 = 8/52 = 2/13

Addition Rule (for events that are not mutually exclusive)
When events A and B are not mutually exclusive, it is possible for both events to occur together, so the rule must be modified:

p(A or B) = p(A) + p(B) - p(A and B)
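Both forms of the addition rule can be checked by counting cards in a 52-card pack. The sketch below is illustrative: the king-or-queen case comes from the example above, while the king-or-heart case is an added illustration (not from the slides) of two events that are not mutually exclusive.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["hearts", "diamonds", "clubs", "spades"]
DECK = list(product(RANKS, SUITS))  # 52 (rank, suit) pairs


def prob(event):
    """Probability that one card drawn at random satisfies `event`."""
    favourable = sum(1 for card in DECK if event(card))
    return Fraction(favourable, len(DECK))


def is_king(card):
    return card[0] == "K"


def is_queen(card):
    return card[0] == "Q"


def is_heart(card):
    return card[1] == "hearts"


# Mutually exclusive: a card cannot be both a king and a queen.
p_king_or_queen = prob(is_king) + prob(is_queen)
print(p_king_or_queen)  # 2/13

# Not mutually exclusive: the king of hearts is counted twice, so subtract it once.
p_king_and_heart = prob(lambda c: is_king(c) and is_heart(c))
p_king_or_heart = prob(is_king) + prob(is_heart) - p_king_and_heart
print(p_king_or_heart)  # 4/13
```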

Multiplication Rule (for independent events)
If two events A and B are independent, the probability of their joint occurrence is given by the product of their separate probabilities:

p(A and B) = p(A) x p(B)

Example
What is the probability of heads on two or three successive tosses?
- p(A) = probability of a head in the first toss = ½ = 0.5
- p(B) = probability of a head in the second toss = ½ = 0.5

Combined probability for two tosses: p(A and B) = p(A) x p(B) = ½ x ½ = ¼ = 0.25
For three tosses the same rule gives ½ x ½ x ½ = ⅛ = 0.125.
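The product rule for independent tosses can be cross-checked by listing every outcome. This is a sketch that assumes a fair coin, so that all sequences of heads and tails are equally likely; the function name is illustrative.

```python
from fractions import Fraction
from itertools import product


def prob_all_heads(tosses):
    """Share of equally likely H/T sequences that consist of heads only."""
    outcomes = list(product("HT", repeat=tosses))
    favourable = sum(1 for o in outcomes if all(face == "H" for face in o))
    return Fraction(favourable, len(outcomes))


half = Fraction(1, 2)
print(prob_all_heads(2), half * half)          # 1/4 1/4
print(prob_all_heads(3), half * half * half)   # 1/8 1/8
```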

Multiplication Rule (for dependent events)
If two events A and B are dependent, the probability of occurrence of one event depends on the occurrence of the other. Writing p(B|A) for the probability of B given that A has occurred:

p(A and B) = p(A) x p(B|A)
p(A and B and C) = p(A) x p(B|A) x p(C|AB)

Example
A bag contains 7 red and 3 black balls. Two balls are drawn at random, one after the other, without replacement. What is the probability that both the balls drawn are black?

- Probability of drawing a black ball first: p(A) = 3/(7+3) = 3/10
- Probability of drawing a second black ball, given that the first was black: p(B|A) = 2/(7+2) = 2/9
- Probability that both balls drawn are black: p(A and B) = p(A) x p(B|A) = 3/10 x 2/9 = 1/5 x 1/3 = 1/15
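The without-replacement calculation can be checked both by the multiplication rule and by brute-force counting of ordered draws. The sketch below is illustrative; the variable names are not from the slides.

```python
from fractions import Fraction
from itertools import permutations

balls = ["red"] * 7 + ["black"] * 3

# Multiplication rule for dependent events.
p_first_black = Fraction(3, 10)
p_second_black_given_first = Fraction(2, 9)
p_both_black = p_first_black * p_second_black_given_first
print(p_both_black)  # 1/15

# Cross-check: enumerate every ordered pair of distinct balls (10 x 9 = 90 draws).
draws = list(permutations(range(len(balls)), 2))
both_black = sum(1 for i, j in draws if balls[i] == "black" and balls[j] == "black")
p_by_counting = Fraction(both_black, len(draws))
print(p_by_counting)  # 1/15
```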

Probability Application
- It is useful for finding out the results of the next generation.
- It helps us to find out the probability of genetic diseases like albinism.
- We can also use probability in predicting the ratio of boys and girls.
- It can also be applied in solving Mendel's problems of heredity.
- It also helps breeders in analyzing pedigrees.

Probability Distribution
When the frequency distribution of a certain population (the observations and summaries such as measures of central tendency) needs to be derived mathematically, the resulting distributions are called "Probability Distributions" or "Theoretical Distributions". They are not obtained by actual observation but are mathematically deduced on certain assumptions based on probability.

These distributions may be discrete or continuous. Three main types of probability distribution are widely used in different studies:
- Discrete Probability Distributions
  - Binomial Distribution
  - Poisson Distribution
- Continuous Probability Distribution
  - Normal Distribution

Binomial Distribution
- It is one of the most widely used probability distributions of a discrete random variable.
- It is applied where only one of two mutually exclusive outcomes is possible, such as success or failure, dead or alive, male or female.
- This distribution is also known as the "Bernoulli Distribution", since it was introduced by the Swiss mathematician J. Bernoulli.
- In other words, the binomial distribution describes the distribution of probabilities where there are only two possible outcomes for each trial or experiment.
- If a coin is tossed once, there are two possible outcomes: the head or the tail.
- The probability of obtaining a head (p) is ½ and the same, ½, for a tail (q).
- Thus (p + q) = 1, and the binomial is $(p + q)^n$.

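Example
If two coins are tossed simultaneously, there will be four possible outcomes:

| First coin | Second coin | Probability |
|---|---|---|
| H | H | pp = p² |
| H | T | pq |
| T | H | qp |
| T | T | qq = q² |

The two mixed outcomes together contribute pq + qp = 2pq, so the binomial expansion is (p + q)² = p² + 2pq + q².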

Assumptions of Binomial Distribution
- Each trial has only two possible outcomes, "success" or "failure".
- The probabilities of success (p) and failure (q) remain constant for each experiment or trial.
- All trials must be independent of each other.
- There should not be any relation between two experiments or trials.

Formulation
In "n" trials, the probability of obtaining "r" successes and (n - r) failures is:

$$p(r) = \frac{n!}{r!\,(n - r)!}\; p^{r} q^{n - r}$$

where
p = probability of success
q = probability of failure
! = factorial, e.g. 5! = 5 x 4 x 3 x 2 x 1
The factorial of 0 is always 1.
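The binomial formula translates directly into Python using `math.comb` (Python 3.8 or later) for n! / (r!(n - r)!). The sketch below reproduces the two-coin example, taking a head as "success" with p = ½; the function name is illustrative.

```python
from math import comb


def binomial_pmf(r, n, p):
    """Probability of exactly r successes in n independent trials."""
    q = 1 - p  # probability of failure
    return comb(n, r) * p**r * q ** (n - r)


# Two coins tossed: probability of 0, 1 and 2 heads.
for r in range(3):
    print(r, binomial_pmf(r, 2, 0.5))
# 0 0.25
# 1 0.5
# 2 0.25
```

The three values correspond to q², 2pq and p² in the expansion of (p + q)².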

Poisson Distribution
- It is also a discrete probability distribution and is used very widely.
- It was derived by the Frenchman S. D. Poisson in 1837.
- It is applied where the event is very rare. Deaths due to a rare disease, or the number of defective articles produced by a high-quality machine, are rare events in the sense that the probability of their happening is very small.

In these cases "p" is very small and "n", the number of trials, is very large, so that "np" is a fixed number. The Poisson distribution has a single parameter, the mean of the distribution, denoted by "m" = np, which remains constant.

Formulation
Probability of "r" successes:

$$p(r) = \frac{e^{-m}\, m^{r}}{r!}$$

where
p(r) = probability of r successes
r = 0, 1, 2, 3, ..., n successes
e = 2.7183 (constant)
m = mean of the distribution (np)
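A minimal Python sketch of the Poisson formula follows. The mean m = 2 is a hypothetical value chosen for illustration (for instance n = 1000 trials with p = 0.002); it does not come from the slides.

```python
from math import exp, factorial


def poisson_pmf(r, m):
    """Probability of exactly r rare events when the mean number is m."""
    return exp(-m) * m**r / factorial(r)


m = 2  # hypothetical mean, e.g. n = 1000 and p = 0.002
for r in range(4):
    print(r, round(poisson_pmf(r, m), 4))
# 0 0.1353
# 1 0.2707
# 2 0.2707
# 3 0.1804
```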

Normal Distribution
- The most important distribution dealing with continuous variables is the Normal Distribution.
- It is also called the Normal Probability Distribution.
- It is extremely useful in the analysis of agricultural and biological data.
- It was first discovered by De Moivre in 1733.

This technique helps us in drawing inferences about the population from the sample. By this method we get a "curve" with a peak and with items evenly distributed on either side of the peak. Such a curve, with important statistical properties, is called the "Normal Distribution Curve" and denotes a normally distributed population.

Importance of the Normal Distribution
- In most biological analyses, values are often distributed in accordance with the normal distribution.
- In large samples, it serves as a good approximation of discrete distributions such as the Binomial and Poisson.
- As the sample size increases, the distribution of the mean of a random sample approaches the normal distribution.

Properties of Normal Distribution
- The normal curve is "bell shaped" and symmetrical in appearance, with a single peak.
- The mean of a normally distributed population lies at the centre of its normal curve.
- The height of the curve declines on either side of the peak, which occurs at the mean.
- The two tails never touch the base.
- The mean, median and mode are all equal in a normal distribution.

Formulation
Normal Distribution (for a sample):

$$z = \frac{x - \bar{x}}{s}$$

where
z = number of standard deviations between x and the mean
x = value of the random variable
$\bar{x}$ = mean of the distribution
s = standard deviation of the distribution
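The z formula is a one-line calculation. In the sketch below the mean of 50, the standard deviation of 10 and the observed values are hypothetical numbers used only to show the arithmetic.

```python
def z_score(x, mean, sd):
    """Number of standard deviations by which x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


# Hypothetical sample: mean 50, standard deviation 10.
print(z_score(65, 50, 10))  # 1.5
print(z_score(40, 50, 10))  # -1.0
```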

Correlation
Correlation was first investigated by Sir Francis Galton. Karl Pearson introduced a method of assessing correlation by means of the coefficient of correlation. By this coefficient, we can measure the extent of the relationship between two sets of data.

Correlation measures the closeness of the relationship between two variables. The sets of variables may or may not show a relationship, but when both variables move together we say they are related. Example: height of husbands and wives; 100 seed weight.

The term correlation thus refers to the study of the relationship between two variables. If a relationship exists, it has to be expressed quantitatively, showing the degree of association between the sets of variables. The statistical tool with the help of which this relationship is studied is called "Correlation".

Reasons behind correlation
- The correlation may be due to pure chance.
- Influence of some external factors on the two variables.
- Influence of the two variables on each other, or mutual influence.
- Influence of one variable upon the other.

Types of Correlation
- Positive / negative correlation
- Simple / partial / multiple correlation
- Linear / non-linear correlation

Methods of studying Correlation
- Scatter diagram method
- Graphical method
- Correlation coefficient

Correlation Coefficient
The first two methods do not provide any numerical measure of correlation. The degree of relationship can be established by calculating a coefficient called the Correlation Coefficient, which always gives a quantitative measure of the degree of closeness between the two attributes. Karl Pearson developed this theory, so it is also called the "Pearsonian Coefficient of Correlation", denoted by "r".
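The slides do not state the formula for r. A common textbook form is the sum of products of deviations divided by the square root of the product of the two sums of squared deviations, and the sketch below assumes that form. The heights are hypothetical values invented for illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's coefficient of correlation between two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical heights (cm) of five husbands and their wives.
husband_height = [165, 170, 172, 175, 180]
wife_height = [155, 158, 160, 163, 165]

r = pearson_r(husband_height, wife_height)
print(round(r, 3))
```

The value of r always lies between -1 and +1. A value near +1, as for these hypothetical heights, indicates a strong positive correlation, and a value near -1 a strong negative one.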

Regression
Regression analysis is concerned with measuring the probable form of the relationship between two variables. The term was first used by Sir Francis Galton while studying the relationship between the heights of fathers and sons. The method which helps us to estimate the unknown value of one variable from the known value of a related variable is called Regression.



Galton studied the average relationship between two variables graphically and called the line describing the relationship, the line of regression. 

The regression technique is applicable only where two or more related variables have the tendency to go back to the mean.
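The slides do not say how the line of regression is fitted. One standard approach is the method of least squares, and the sketch below assumes it: the slope is the sum of products of deviations divided by the sum of squared deviations of x, and the intercept makes the line pass through the two means. The data and function names are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares line y = intercept + slope * x (an assumed fitting method)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical paired observations lying exactly on y = 1 + 2x.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

intercept, slope = fit_line(xs, ys)
print(intercept, slope)       # 1.0 2.0

# Estimate the unknown y for a known x = 5.
print(intercept + slope * 5)  # 11.0
```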

Test of Significance 

Two samples drawn from the same population will show differences in their mean values. This difference between the samples can be reduced but cannot be eliminated. A procedure to assess the significance of this difference is known as a "Test of Significance". It helps us to determine whether the observed differences between two samples are due to chance or are really significant.

Procedure for significance test
- Laying down of hypotheses
  - Null hypothesis
  - Alternative hypothesis
- Level of significance
- One- or two-tailed hypothesis

Good Luck!