
Unit 1 - Basic concepts of probability

1.1. Introduction
The foundation of statistics is probability theory, which systematizes the laws of chance to
uncover the regularities in the patterns by which chance events repeat themselves. Probability had
its beginnings in the 17th century with games of chance such as tossing a coin, rolling a die and
drawing a card. It was only in the 19th century that Gregor Mendel, while studying the laws of
inheritance in peas, showed that probability can also be applied to biological investigations. Since
then it has been applied very successfully to a wide range of problems in biology.

1.2. Basic Concepts of probability - Types of Experiment:


There are two types of experiments. They are:
i) Deterministic experiment
If the same results are obtained when an experiment is repeated under the same conditions, such an
experiment is called deterministic experiment. For example, for a perfect gas, PV = constant, provided
temperature is constant. The same result will be obtained whenever the experiment is repeated. Thus
the results of a deterministic experiment can be predicted with certainty.
ii) Random experiment
The experiment which does not yield the same result when repeated under the same conditions is called
random experiment. In such an experiment it is not possible to predict the result in advance with
certainty. For example, in the coin-tossing experiment one toss may yield a head and another toss may
yield a tail. It is not possible to predict the outcome in advance with certainty.

1.3. Basic Terminology


It is required to understand the following terminology to understand the concept of probability.
Trial
Any experiment conducted under identical conditions is called a trial.
For example:
Tossing a coin or throwing a die.
Sample space
The totality or collection of all possible outcomes of a random experiment is called the sample space. It is
denoted by S.
For example:
In tossing a coin experiment sample space consists of two possible outcomes Head (H) and Tail (T). It is
usually written as S = {H, T} .
Event
Results or outcomes of a trial or sample space are called events.
For example:
In a trial of tossing a coin, getting a head or a tail are called events.

Types of events: Events may be categorized like,


(1) Simple event
Every distinct outcome of a random experiment is called simple event or an outcome.
For example1:
In tossing of a coin experiment, head is one outcome and tail is another outcome. Hence, head and tail
are the two events in tossing of a coin experiment.
For example 2:
In rolling of a die experiment, getting number 1 on top is one event, similarly getting 2, 3, 4, 5, 6 are
other events.
(2) Exhaustive events
The set of all possible outcomes of a random experiment or a trial is known as the set of exhaustive
events or exhaustive cases.
For example:
In rolling of a die, there are 6 exhaustive cases,
S = {1, 2, 3, 4, 5, 6 }

(3) Compound event


A compound event is one which consists of two or more simple events.
For example:
Getting an even number in a die experiment is a compound event. S= {2, 4, 6}.
(4) Equally likely events or outcomes
The outcomes are said to be equally likely when there is no reason to expect any one of them rather
than another.
For example:
In tossing of a coin experiment, either head or tail may appear, so that both the outcomes are equally
likely.
(5) Mutually exclusive events
Two events A and B are said to be mutually exclusive if the occurrence of one event precludes the
occurrence of another, i.e., both the events cannot happen simultaneously. In other words A and B have
no common outcomes.
For Example:
In the tossing of a coin experiment, head and tail are mutually exclusive events, as they cannot happen
simultaneously. Similarly in a dice experiment, getting outcomes of two numbers (say 1 & 2) are
mutually exclusive.

1.4. Mathematical preliminaries


Complement of an event
Let A be an event. Then the complement of A is the event of non-occurrence of A. It is the event
constituted by the outcomes which are not favourable to A. The complement of A is denoted by A' or A^c.
While throwing a die, if A = {2, 4, 6}, its complement is A' = {1, 3, 5}.

Sub-events
Let A and B be two events such that event A occurs whenever event B occurs. Then event B is a
sub-event of event A.
While throwing a die, let A = {2, 4, 6} and B = {2}. Here B is a subset (sub-event) of event A, and is
denoted by B ⊂ A.

Union of events
Union of two or more events is the event of occurrence of at least one of these events. Thus the union of
two events A and B is the event of occurrence of at least one of them. The union of A and B is denoted
by A ∪ B, A + B, or "A or B".
Example 1
While tossing two coins simultaneously, let A = {HH} and B = {TT} be two events.
Then their union is A ∪ B = {HH, TT}.
Here A is the event of occurrence of two heads and B is the event of occurrence of two tails; their union
A ∪ B is the event of occurrence of two heads or two tails.
Example 2
While throwing a die, let A = {2, 4, 6}, B = {3, 6} and C = {4, 5, 6} be three events.
Then their union is A ∪ B ∪ C = {2, 3, 4, 5, 6}.

Intersection of events
Intersection of two or more events is the event of simultaneous occurrence of all these events. Thus the
intersection of two events A and B is the event of occurrence of both of them. The intersection of A and
B is denoted by A ∩ B, AB, or "A and B".

1.5. Counting Techniques - Permutations and Combinations


Some useful techniques for counting the number of events satisfying certain conditions are
presented in this section. These techniques are useful in computing the probability of an event
when the total number of possible events is large.

Factorials
For any given positive integer n, the product of all the whole numbers from n down through 1 is
called n factorial and is written as n!
For example,
5! = 5x4x3x2x1=120
8! = 8x7x6x5x4x3x2x1 = 40320
In general,
n! = n(n-1)(n-2) ... 2 x 1
By definition, 0! = 1
Further,
n! = n(n-1)!
   = n(n-1)(n-2)! and so on.

Factorials are useful in finding the number of ways objects can be arranged in a line. For
example, suppose that there are 3 containers of culture media, each of which is inoculated with a
different organism. These culture media can be placed in a line on a platform in 3! = 6 ways. If
the 3 media are designated as a, b and c, then 6 arrangements are abc, bca, cab, cba, bac, acb.
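The count above can be checked directly; a small Python sketch (the labels a, b and c follow the designation in the text):

```python
import math
from itertools import permutations

# 3 culture media can be lined up on the platform in 3! = 6 ways
media = ["a", "b", "c"]
n_ways = math.factorial(len(media))
arrangements = ["".join(p) for p in permutations(media)]

print(n_ways)                 # 6
print(sorted(arrangements))   # ['abc', 'acb', 'bac', 'bca', 'cab', 'cba']
```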

Combinations
The number of ways of selecting 'r' things out of 'n' things, where the order of selection does not
matter, is
nCr = n! / [r! (n-r)!]

Permutations
The number of ways of arranging 'r' things out of 'n' things, where the order of arrangement matters, is
nPr = n! / (n-r)!
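Both counts are available directly in Python's standard library (math.comb and math.perm, Python 3.8+); a brief sketch with n = 5 and r = 2:

```python
import math

n, r = 5, 2
n_selections = math.comb(n, r)    # nCr = n! / (r! (n-r)!) = 10
n_arrangements = math.perm(n, r)  # nPr = n! / (n-r)!       = 20
```

Note that nPr = nCr × r!, since each selection of r things can be arranged among themselves in r! ways.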

1.6. Definition of Probability


We present here two definitions of probability.

Definition I - Classical or mathematical or a priori definition


Suppose an event E can happen in 'm' different ways (outcomes) out of a total of 'n' different
exhaustive, mutually exclusive and equally likely ways. Then the probability of occurrence of the event,
denoted by P(E), is given by,
P(E) = m/n

Note 1
Probability of an event is a non-negative number which lies between 0 and 1.
Symbolically, 0 ≤ P(E) ≤ 1.
Note 2
If the event E can happen in 'm' ways out of a total of 'n' ways, then the number of ways in which the
event E will not happen is n - m. Hence, the probability that the event E will not happen (denoted by q) is
given by,
q = (n - m)/n = 1 - m/n = 1 - p
So that, p + q = 1
i.e., the sum of the probabilities of occurrence and non-occurrence of an event is equal to 1,
Example 1
What is the probability of getting head when an unbiased coin is tossed ?
Answer:
Total number of equally likely outcomes = 2 = { H, T}
No. of favorable outcome =1 = {H }
Therefore, the probability of getting head is, P (H) = ½.
Example 2
What is the probability of getting an even number when an unbiased die is thrown?
Answer
Total number of equally likely outcomes = 6 = {1, 2, 3, 4, 5, 6}
Number of favourable outcomes = 3 = {2, 4, 6}
P(even number) = 3/6 = 0.5
Example 3
In a pond containing 100 fishes 20 are marked. If one fish is subsequently caught what is the probability
of it being (i) marked (ii) unmarked?
Answer
Total number of fishes =100
Number of marked fishes =20

(i) The number of favourable cases for a marked fish is 20. Hence, P(marked fish being caught) = 20/100 = 0.2

(ii) P(marked fish being caught) + P (unmarked fish being caught) = 1


Hence, P(unmarked fish being caught) = 1- P(marked fish being caught) = 1-0.2 = 0.8

Example 4
In a composite fish culture experiment, fingerlings of 6 species of fish namely, rohu, catla, mrigal,
common carp, silver carp and grass carp, were stocked in the ratio of 1 : 1 : 1 : 2.5 : 3 : 1.5 respectively. A
fingerling is subsequently drawn, what is the probability that it is of catla?
Answer
Fingerlings of rohu, catla, mrigal, common carp, silver carp and grass carp are stocked in the ratio of 1 : 1
: 1 : 2.5 : 3 : 1.5 respectively. Thus out of 10 fingerlings we have 1 fingerling of catla. Hence, the
probability that the fingerling drawn is of catla, = 1/10 = 0.10
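Examples 1 to 3 all apply the same ratio m/n; a minimal sketch of the classical definition in Python (exact fractions avoid rounding):

```python
from fractions import Fraction

def classical_probability(m, n):
    # P(E) = favourable outcomes / total equally likely outcomes
    return Fraction(m, n)

p_head = classical_probability(1, 2)       # Example 1: 1/2
p_even = classical_probability(3, 6)       # Example 2: 1/2
p_marked = classical_probability(20, 100)  # Example 3(i): 1/5 = 0.2
p_unmarked = 1 - p_marked                  # Example 3(ii): 4/5 = 0.8
```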

1.6.1. Definition II - Relative frequency or a posteriori or empirical definition of probability


The classical definition of probability given earlier assumes that the outcomes are equally likely and that
the total number of outcomes is known and finite. When these assumptions are not met, it is not
possible to compute the probability of an event using the classical definition. To overcome these
limitations, a new approach called the relative frequency concept of probability is adopted. According to
this concept, the probability of occurrence of an event E is the limiting value of the ratio of the frequency
of occurrence of the event to the total number of outcomes. For instance, if an experiment is repeated n
times under the same conditions and an event E occurs f times, then the estimate of the probability of
the event E as the number of trials n increases indefinitely is given by,
P(E) = lim (n→∞) f/n

It is to be noted that as the number of trials (frequency) increases the estimate of probability of an
event stabilizes around a particular value.
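The stabilising behaviour described above can be illustrated by simulation; a sketch that estimates P(head) for a fair coin from increasing numbers of tosses (the seed is arbitrary):

```python
import random

random.seed(1)

def relative_frequency(n_tosses):
    # f/n: proportion of heads in n simulated fair-coin tosses
    heads = sum(random.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses

for n in (10, 100, 10_000, 1_000_000):
    print(n, relative_frequency(n))  # the estimates settle near 0.5
```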

Example 5
The frequency distribution of lengths of 1000 randomly selected fishes of a particular species is given
below. What is the probability that a fish chosen at random will have length between 35-45 cm?

Answer

 Frequency of the class interval 35-45 is 220.


 Therefore, the relative frequency of this class to the total frequency is 220/1000 = 0.22.
 Hence, probability that the fish chosen at random will have length between 35-45 cm is
0.22.

Example 6
One thousand fertilized eggs of a major carp were kept under observation to find out the number
of individuals reaching different stages in the life history. Observed data are given below:

Find the probability that,

 Fertilized egg reaches fingerling stage


 Hatchling reaches fry stage
 Fry reaches adult stage

Answer
 Out of 1000 fertilized eggs, only 200 reached the fingerling stage. Therefore, the
probability of a fertilized egg reaching the fingerling stage is 200/1000 = 0.2
 Out of 700 hatchlings only 210 reached the fry stage, therefore the probability of
hatchling reaching the fry stage is 210/700 = 0.30
 Out of 210 fry, 196 reached the adult stage. Therefore, the probability of a fry reaching
the adult stage is 196/210 = 0.93

1.7. Theorems on probability
After understanding the basic definition of probability, it is necessary to know the laws of probability in
order to compute probabilities when more than one trial is conducted or when probabilities of different
types of events are required. There are two important laws of probability. They are

I. Addition Theorem
Let A and B be two events with respective probabilities P(A) and P(B). The probability of occurrence of at
least one of these two events, denoted by P(A ∪ B) or P(A+B), is given by
P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
where P(A ∩ B) is the probability of the simultaneous occurrence of A and B.

Corollary: If events A and B are mutually exclusive, then,


P(A ∪ B) = P(A) + P(B)
This is because P(A ∩ B) = 0 when A and B are mutually exclusive.

Example 7
In a certain district 25% of the fish farmers practice composite fish culture of rohu, catla and mrigal, 15%
fish farmers follow monoculture of rohu only and 10% farmers follow composite fish culture as well as
monoculture of rohu in their farm. Find the probability that a randomly selected fish farmer follows at
least one of the practices.
Answer
Let events A and B be,
A : The farmer follows composite fish culture
B : The farmer follows monoculture of rohu
Then, P(A) = 0.25, P(B) = 0.15, and P(A ∩ B) = 0.10
The probability that the farmer follows at least one of the practices is denoted by P(A ∪ B) and is given
by,
P(A ∪ B) = P(A) + P(B) - P(A ∩ B) = 0.25 + 0.15 - 0.10 = 0.30
Example 8
A pond contains 150 fishes of rohu, 225 fishes of catla and 125 fishes of mrigal. Find the probability that
a fish randomly selected is rohu or a catla.
Answer
Let events A and B be,
A: Selected fish is rohu
B: Selected fish is catla
Events A and B are mutually exclusive, as a fish selected cannot be both rohu and catla. Hence,
P(A ∪ B) = P(A) + P(B)
Here P(A) = 150/500 = 0.30 and P(B) = 225/500 = 0.45, so
P(A ∪ B) = 0.30 + 0.45 = 0.75
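Both examples are instances of the addition theorem; a quick numeric check in Python:

```python
def p_union(p_a, p_b, p_both=0.0):
    # P(A or B) = P(A) + P(B) - P(A and B); p_both = 0 if mutually exclusive
    return p_a + p_b - p_both

p_example7 = p_union(0.25, 0.15, 0.10)  # at least one practice: 0.30
p_example8 = p_union(150/500, 225/500)  # rohu or catla: 0.75
```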

Independent events
The events A and B are said to be independent, if the occurrence of one does not depend on the
occurrence or non-occurrence of the other.
For Example
When a coin is tossed two times, the result of the second throw does not depend on the result of
the first throw.

Conditional probability

 Let A and B be two events in the sample space. Then P(A/B) denotes the probability of
the happening of event A, given that event B has already occurred.

Note : P(A/B) ≠ P(B/A)

1.7.1. Theorem on probability


II. Multiplication Theorem
Let A and B be two events with probabilities P(A) and P(B) respectively. Let P(B/A) denote the
conditional probability of event B given that event A has happened, and P(A/B) the conditional
probability of event A given that event B has happened. Then the probability of occurrence of
both the events A and B, denoted by P(A ∩ B), is given by,

P(A ∩ B) = P(A) · P(B/A) = P(B) · P(A/B)


If events A and B are independent, then,
P(A ∩ B) = P(A) · P(B)

Example 9
In a pond containing 100 fishes, 35 are marked. If two are caught one after another and without
replacement, what is the probability that both the fishes caught are marked?
Answer
Let A denote the event of catching marked fish in the first draw and B denote the event of
catching marked fish in the 2nd draw, then
P(A) =35/100
The probability of drawing marked fish in the 2nd draw, given that the first fish caught was
marked is,
P(B/A) =34/99
Hence, P(both the fish caught are marked) = P(AB) = P(A) · P(B/A) = (35/100)(34/99) = 1190/9900 = 0.12
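The without-replacement reasoning of Example 9 in a short Python sketch, using exact fractions:

```python
from fractions import Fraction

# P(both marked) = P(1st marked) * P(2nd marked | 1st marked)
p_first = Fraction(35, 100)
p_second_given_first = Fraction(34, 99)   # one marked fish already removed
p_both_marked = p_first * p_second_given_first

print(float(p_both_marked))  # ~0.1202
```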
Example 10
A pond contains 200 fishes of which 40 are marked. A second pond contains 300 fishes of which
50 are marked. One fish is drawn from each of the ponds. What is the probability that the fishes
drawn are both marked?
Answer
Let A denote the event of catching a marked fish from the 1st pond and B denote the event of
catching a marked fish from the 2nd pond.
Hence, P(A) = 40/200 = 0.20 and P(B) = 50/300 ≈ 0.167

As the events A and B are independent,
P(A ∩ B) = P(A) · P(B) = 0.20 × 0.167 ≈ 0.033
Example 11
An urn contains 7 white and 8 black pomfrets. A second urn contains 5 white and 9 black pomfrets. One
pomfret is taken out at random from the first urn and put into the second urn without noticing its
colour. A fish is then drawn at random from the second urn. What is the probability that it is a white
pomfret?
Answer
Two cases arise here.
Case (i): The pomfret taken from the first urn is white
Let A denote the event of drawing a white pomfret from the first urn and let B denote the event of
drawing a white pomfret from the second urn.
Here, P(A) = 7/15, P(B/A) = 6/15
Hence, P(AB) = P(A) · P(B/A) = (7/15)(6/15) = 42/225 = 0.1867

Case (ii): The pomfret taken from the first urn is black


Let A denote the event of drawing a black pomfret from the first urn and let B denote the event of
drawing a white pomfret from the second urn.
Here, P(A) = 8/15, P(B/A) = 5/15
Hence, P(AB) = P(A) · P(B/A) = (8/15)(5/15) = 40/225
Therefore, required probability
= P(Case i) + P(Case ii)
= (42/225) + (40/225)
= 82/225
= 0.36
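Example 11 is a law-of-total-probability calculation over the two cases; checked in Python with exact fractions:

```python
from fractions import Fraction

# Case (i): white transferred, then white drawn; Case (ii): black transferred
p_case_i = Fraction(7, 15) * Fraction(6, 15)    # 42/225
p_case_ii = Fraction(8, 15) * Fraction(5, 15)   # 40/225
p_white = p_case_i + p_case_ii                  # 82/225

print(float(p_white))  # ~0.364
```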

1.8. Exercises
a) A die is rolled; find the probability that the number obtained is greater than 4.
b) Two coins are tossed, find the probability that one head only is obtained.
c) Two dice are rolled; find the probability that the sum is equal to 5.
d) A card is drawn at random from a deck of cards. Find the probability of getting the King of heart.
e) A fish is drawn at random from an aquarium containing 6 gold and 4 black mollies. Find the
probability of getting a gold fish.

Answers to the above exercises:


a) 2 / 6 = 1 / 3
b) 2 / 4 = 1 / 2
c) 4 / 36 = 1 / 9
d) 1 / 52
e) 6/10=0.6

Unit 2 - Probability distributions

2.1. Introduction
In real life situations we always infer about a population on the basis of a sample study. For a
given frequency distribution of a variable in the sample under study, we can get relative
frequencies, which are probabilities of occurrence of different values of the random variable. A
probability distribution is analogous to a relative frequency distribution, with probabilities
replacing relative frequencies. Thus, probability distributions can be regarded as theoretical or
limiting forms of relative frequency distributions, when the number of observations made is very
large. Hence, probability distributions can be considered as distributions of populations, whereas
relative frequency distributions are distributions of samples drawn from these populations.
Frequency distributions which arise in sample can be approximated by well known theoretical
probability distributions which serve as useful tools in making inferences and decisions under
conditions of uncertainty on the basis of limited data or theoretical considerations.


2.2. Random variable


A random variable is a real number 'x' assigned to each outcome of a random experiment. Consider an
experiment of tossing a coin twice. The experiment has the outcomes HH, HT, TH and TT, where
H and T refer to head and tail respectively. If the number of heads is considered as a random
variable X, then its value corresponding to the different outcomes is as follows:
X(HH) = 2, X(HT) = 1, X(TH) = 1, X(TT) = 0

Thus a random variable is a function that assigns a real number to each outcome in the sample space of a
random experiment.

2.3. Probability distributions


There are two types of probability distributions, depending upon the type of variable under study:
discrete and continuous. A probability distribution is said to be discrete if it is based on a
discrete random variable and continuous if it is based on a continuous random variable. A
probability mass distribution for a discrete random variable is a listing of all possible values with
their respective probabilities of occurrence.

Characteristics of probability distribution:

 Probabilities are non-negative, i.e., P(x) ≥ 0


 Sum of probabilities of all values of random variable is equal to unity. i.e., ∑P(x) = 1.

As a discrete random variable can take only a finite number of values or a countably infinite
number of values, it is possible to list all the values with the corresponding probabilities. The
probability distribution of a discrete random variable is called a probability mass function.

In the case of continuous random variable, it is no longer meaningful to list all the values with
the corresponding probabilities and hence the probability of a random variable falling in a given
interval is listed. A histogram can be drawn taking probability on Y axis with large number of
small intervals of random variable on X axis. A smooth curve passing through upper sides of the
rectangles of the histogram can be drawn. In many cases it is possible to determine a function
f(x) that approximates the curve. This function is called a probability density function. Here also
the two basic conditions should be satisfied (i) f(x) ≥ 0 and (ii) the total area under the curve f(x)
and the x-axis is equal to 1.

Example 1
Find the probability distribution of the outcome in a die-throwing experiment.
Answer
Let X denote the outcome of the experiment. Then the probability distribution of X is given by,
P(X = x) = 1/6 for x = 1, 2, 3, 4, 5, 6
Note that ∑P(X) = 1
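The two defining conditions (non-negativity and total probability one) can be checked for this die distribution in a few lines of Python:

```python
from fractions import Fraction

# P(X = x) = 1/6 for each face of a fair die
pmf = {x: Fraction(1, 6) for x in range(1, 7)}

all_nonnegative = all(p >= 0 for p in pmf.values())  # P(x) >= 0
total = sum(pmf.values())                            # should equal 1
```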


2.4. The Binomial distribution or Bernoulli distribution
Binomial distribution is a discrete distribution. It has great practical applications in research and
industrial inspection problems. It arises when a single trial of some process or experiment can result in
only one of two mutually exclusive outcomes such as male or female, with or without scales, dead or
alive, whether it responds or not to a given stimulus, and so on.

Mathematical description of the binomial distribution can be as given below.


Suppose that an individual examined possesses a certain character with probability p and does not
possess it with probability 1 - p = q. Then the probability of x individuals in a sample of n individuals
possessing the character is given by,
P(x) = nCx p^x q^(n-x),  x = 0, 1, 2, ..., n   ... (1)

A random variable X is said to follow binomial distribution if its probability mass function is
given by (1). If p and n are known, this distribution can be completely determined. Hence, p and
n are called parameters of the binomial distribution. In this distribution, it is assumed that each
trial results in one of two possible mutually exclusive outcomes namely ‘success’ or ‘failure’.
Further, p is assumed to be constant from observation to observation and outcomes of
observations are independent.

Important properties of the binomial distribution

 Mean of the binomial distribution = np


 Variance of the binomial distribution = npq
 Hence standard deviation = √(npq)
 As n, the sample size, increases, the binomial distribution approaches the normal distribution.
 Various properties can be seen in the adjoining figures:

For p=q= ½ binomial distribution is symmetrical;


For p< ½ it is positively skewed;

For p> ½ it is negatively skewed;

2.4.1. Examples
Example 2
Find the probability of getting only one catla in a sample of 10 fishes drawn one by one, if the probability
of a catla being drawn in any draw is 0.2.
Answer
In a sample of size n, the probability of getting x catla is given by,
P(x) = nCx p^x q^(n-x)
for x = 0, 1, 2, ..., 10
In this example p is the probability of a catla being drawn in any draw, which is given to be 0.2,
i.e., p = 0.2
q = 1 - p = 1 - 0.2 = 0.8
Sample size n = 10, x = 1, i.e., getting one catla
Probability of getting one catla is given by,
P(1) = 10C1 (0.2)^1 (0.8)^9 = 10 × 0.2 × 0.1342 = 0.2684
Example 3
What is the probability of finding 2 males in a sample of 5 fishes drawn one by one? (Assume sex ratio is
1:1).
Answer
The probability of finding a male = 0.5 = p (say).
The probability of not finding a male (i.e. finding a female) = 0.5 = q (say). Further n = 5 and x = 2.
Hence, the required probability using the binomial distribution is given by,
P(2) = 5C2 (0.5)^2 (0.5)^3 = 10 × (0.5)^5 = 10/32 = 0.3125
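Examples 2 and 3 both evaluate the binomial mass function; a direct Python sketch:

```python
from math import comb

def binomial_pmf(x, n, p):
    # P(X = x) = nCx * p^x * (1-p)^(n-x)
    return comb(n, x) * p**x * (1 - p)**(n - x)

p_one_catla = binomial_pmf(1, 10, 0.2)  # Example 2: ~0.268
p_two_males = binomial_pmf(2, 5, 0.5)   # Example 3: 0.3125
```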

2.5. Fitting of Binomial distribution


To fit any theoretical distribution one should know its parameters and probability distribution. The
parameters of the binomial distribution are n and p. Once p and n are known, binomial probabilities for
different random events and the corresponding expected frequencies can be computed. From the given
data we can get n by inspection. For the binomial distribution, we know that the mean is equal to np;
hence we can estimate p as p̂ = mean/n. Thus, with these n and p one can fit the binomial distribution.
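The fitting procedure can be sketched in Python. The observed frequencies below are hypothetical, since the table of Example 4 is not reproduced here; the steps (find the mean, estimate p = mean/n, compute expected frequencies N·P(x)) follow the text:

```python
from math import comb

n = 10                                                # fishes tried per experiment
observed = {0: 11, 1: 27, 2: 30, 3: 20, 4: 9, 5: 3}   # x -> frequency (hypothetical data)
N = sum(observed.values())                            # total number of experiments

mean = sum(x * f for x, f in observed.items()) / N
p_hat = mean / n                                      # p estimated as mean/n
q_hat = 1 - p_hat

# Expected frequency for each x is N * P(x)
expected = {x: N * comb(n, x) * p_hat**x * q_hat**(n - x) for x in range(n + 1)}
```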
Example 4
The number of sets of catla which responded in induced breeding out of 10 fishes tried per experiment
was noted. A total of 100 such experiments were conducted at a centre. The results are summarised in
the following frequency distribution table:

Fit the binomial distribution.


Answer
Here n = 10, ∑f = 100 = N
Computations are summarized below:

* Expected frequency = N P (x)


2.6. The Poisson Distribution
The Poisson distribution is another discrete probability distribution which has frequent
applications in faunal sampling operations, where the character or variate under study is the number
of animals or species per unit of observation. In practice, if the count data represent the number
of rare events occurring within a given unit of time or space (or some volume of matter), the
distribution of these counts can be described by the Poisson distribution. If 'p', the probability of
occurrence of an event, is very small and 'n', the number of trials, is very large such that np is
constant, then the binomial distribution tends to the Poisson distribution. A random variable is
said to follow the Poisson distribution if its probability mass function is given by,
P(x) = e^(-m) m^x / x!,  x = 0, 1, 2, ...

where e is the base of the natural logarithm, having a value of 2.7183, and m is the mean of the
distribution. If m is known this distribution can be completely determined; hence m is called the
parameter of the distribution. An important characteristic of the Poisson distribution is that its
variance is equal to the mean of the distribution. The Poisson distribution is positively skewed.

Fig (4): Poisson distribution

However, as m (= np, when n is large) increases, it will tend to the normal distribution. In the Poisson
distribution, it is assumed that rare events occur randomly and independently. Some examples of
Poisson variables are the number of ships arriving in a harbour per hour, the number of animals of a
plankton species per square of a counting cell, and the number of machines breaking down daily in a
fish processing plant. As the variance equals the mean in the case of the Poisson distribution, the ratio
of the former to the latter (i.e. s²/m) can be used to determine whether the variable under study is
randomly distributed or over-dispersed. Theoretically, if this ratio is greater than 1, the population is
over-dispersed and the Poisson distribution will not be suitable to describe it.
2.6.1. Examples
Example 5
In the study of a certain fish species, a large number of samples were taken from a pond, and the
number of fish in each sample was counted. The average number of fish in sample was found to
be 2. Assuming that the number of fish follows a Poisson distribution find the probability that (i)
there are exactly 3 fishes, (ii) there are more than 4 fishes.
Answer
In this example m = 2. Hence,
P(X = 3) = e^(-2) 2^3 / 3! = 0.180
P(X > 4) = 1 - P(X ≤ 4) = 1 - e^(-2) (1 + 2 + 2 + 4/3 + 2/3) = 1 - 0.947 = 0.053
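The two Poisson probabilities of Example 5 can be computed directly; a short Python sketch:

```python
from math import exp, factorial

def poisson_pmf(x, m):
    # P(X = x) = e^(-m) * m^x / x!
    return exp(-m) * m**x / factorial(x)

m = 2
p_exactly_3 = poisson_pmf(3, m)                               # ~0.180
p_more_than_4 = 1 - sum(poisson_pmf(x, m) for x in range(5))  # ~0.053
```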
Example 6
The data given below refer to the number of animals per square of a particular species of
plankton counted in a plankton counting cell. Compute the Poisson probabilities and the
expected frequencies.

Answer
To compute Poisson probabilities, arithmetic mean “m” of the distribution is required.
In the given example,

The Poisson probabilities for different values of x are computed using the Poisson distribution,

Computations are summarized below:


* Expected frequencies are obtained by multiplying the respective probabilities by N, the total frequency.
2.7.The Normal Distribution
The normal distribution is one of the most important distributions in statistics; its equation was
first given by De Moivre in 1733. Later it was rediscovered and developed by Gauss in 1809
and by Laplace in 1812. Therefore, this distribution is sometimes referred to as the Gaussian or
Laplace distribution. The probability density function of a random variable having the normal
distribution is given by,
f(x) = [1 / (σ√(2π))] e^(-(x-µ)² / (2σ²)),  -∞ < x < ∞

where µ and σ are respectively the mean and standard deviation of the distribution; π and e are
constants whose values are 3.1416 and 2.7183 respectively. The graph of f(x) is the famous
"bell-shaped" curve.

The normal distribution can be completely identified if the mean (µ) and standard deviation (σ) are
known. The distribution will vary depending upon the values of µ and σ, as shown in Fig (a) and Fig
(b). It is a continuous distribution and can theoretically assume any value from -∞
to +∞. However, for all practical purposes the values lie in the range of plus or minus three
standard deviations from the mean.

Fig. a: Distributions with the same standard deviation but different means
Fig. b: Distribution with the same mean but different standard deviations

2.7.1. Properties of normal curve

 It is continuous, symmetrical and bell-shaped curve.


 It is asymptotic. Both tails extend to infinity (-∞ to ∞), i.e., the tails approach the
base but never touch it.
 The arithmetic mean, median and mode coincide.

 The central position of the curve will be described by the mean and the spread of the
curve by the standard deviation.
 The coefficient of skewness is zero and the coefficient of kurtosis is 3
 Mean plus or minus one standard deviation (µ ± σ) includes 68.00 percent (68.27% to be
more precise) of the total frequency or total area of the curve.
 The areas under the normal curve are:
(i) Mean plus or minus one standard deviation (µ ± 1σ) includes 68.27% of the total frequency.
Fig. a: Area between µ ± 1σ
(ii) Mean plus or minus 1.96 standard deviations (µ ± 1.96σ) includes 95% of the total frequency.

Fig. b: Area between µ ± 1.96σ


(iii) Mean plus or minus 2.58 standard deviations (µ ± 2.58σ) includes 99% of the total frequency.

Fig. c: Area between µ ± 2.58σ

2.7.2. Area under the normal curve


The area bounded by the normal curve on the x - axis is 1. Quite frequently the area under this
curve that falls between two points on the x - axis, say, x = a and x = b is required. This area can
be worked out using integral calculus. However, it is not necessary to work out the area by this
method as tables giving the areas under the normal curve are available for ready use. These
tables give the area under the normal curve which has mean zero and standard deviation one
(called the standard normal curve). Hence, to make use of these tables, the normal variable X has to be
transformed to the standard normal variable Z by the relation,
Z = (X - µ) / σ
As the standard normal curve is symmetric (Fig. below) about Z = 0, the area between Z = 0 and
any negative Z value, say Z = -a, is equal to the area between Z = 0 and Z = +a.

Fig : Standard normal curve

Example 7
Weight of a particular species of fish was found to be distributed normally with the mean 400
grams and standard deviation 50 grams. Find the standard normal variate of fishes with weights
(i) 300, (ii) 450 and (iii) 430.
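Example 7 reduces to evaluating Z = (X - µ)/σ for each weight; a quick check in Python:

```python
def z_score(x, mu, sigma):
    # standard normal variate: Z = (X - mu) / sigma
    return (x - mu) / sigma

mu, sigma = 400, 50  # grams
print(z_score(300, mu, sigma))  # -2.0
print(z_score(450, mu, sigma))  # 1.0
print(z_score(430, mu, sigma))  # 0.6
```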

2.7.3. Different forms of area tables


Area under the normal curve is available in tables in different forms. For instance:
(i) In Fisher and Yates (1963) the area of the normal curve is tabulated from Z to ∞.
Fig. a: Area from Z to ∞

(ii) In Spiegel (1981) area of the normal curve is tabulated from 0 to any positive value of Z.

Fig.b: Area from 0 to Z

(iii) In Woolf (1968) the area of the normal curve is tabulated from -∞ to Z.

Fig. c: Area from -∞ to Z

Note:
Before referring to these tables, it is therefore necessary to know the manner in which areas are
presented.
In the present manual, area tables as presented in Spiegel (1981) are referred to.
Example 8
The mean length of a one-year-old brood of catla is 30 cm and standard deviation 2 cm. A fish is
caught at random, find the probability that its length is,
(i) (a) Between 30 and 32cm
(b) Between 28 and 33 cm
(ii) Suppose it was decided to transfer all those having length greater than 31cm, what percent of
fish is required to be transferred? Assume lengths are normally distributed.
Answer
(i) (a) Compute the corresponding standard normal variates Z for X1 = 30 and X2 = 32.
They are,
Z1 = (30 - 30)/2 = 0 and Z2 = (32 - 30)/2 = 1
The probability that the length of the fish caught is between 30 and 32 cm in terms of Z will be,
P (0≤Z≤1) = Area between (Z=0 and Z=1) =0.3413
The area is obtained by referring to the area table of normal distribution.

(b) Length between 28 and 33 cm, i.e. X1 = 28, X2 = 33; the corresponding standard normal variates
are,
Z1 = (28 - 30)/2 = -1 and Z2 = (33 - 30)/2 = 1.5
The probability that the length of the fish caught is between 28 and 33 cm in terms of Z will be

P (-1≤Z≤1.5) = Area between Z = -1 and Z = 1.5


= (Area between Z = -1 and Z=0) + (Area between Z = 0 and Z = 1.5)
= (Area between Z = 0 and Z = 1) + (Area between Z = 0 and Z = 1.5)
= 0.3413 + 0.4332
= 0.7745

(ii) Here X1 = 31 cm, hence
Z = (31 - 30)/2 = 0.5
P(fish is having length greater than 31 cm)


= P (Z>0.5)
= (Area to the right of Z=0) - (Area between Z=0 and 0.5)
= 0.5 – 0.1915
= 0.3085
Therefore, 30.85% of the fishes are required to be transferred.
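Instead of reading areas from a printed table, the cumulative area of the standard normal curve can be computed from the error function in Python's standard library; re-checking Example 8:

```python
from math import erf, sqrt

def phi(z):
    # area under the standard normal curve from -infinity to z
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma = 30, 2  # cm
p_30_to_32 = phi((32 - mu) / sigma) - phi((30 - mu) / sigma)  # ~0.3413
p_28_to_33 = phi((33 - mu) / sigma) - phi((28 - mu) / sigma)  # ~0.7745
p_over_31 = 1 - phi((31 - mu) / sigma)                        # ~0.3085
```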
2.7.4. Features of the Normal distribution
Normal distribution plays an important role in statistics because of the following reasons:
 Numerous continuous phenomena or characters such as fish length, weight, body depth
etc., are approximately normally distributed.
 Many of the discrete distributions occurring in practice such as Binomial, Poisson, etc.,
can be approximated by a Normal distribution.
 Even when the variable is not normally distributed, it is possible to bring it to
approximately normal, by simple transformations such as square root or logarithmic or
arc sin, etc.
 Normal distribution has simple and interesting mathematical properties.
 Normal distribution provides the basis for statistical inference.
 Distributions of sample statistics such as the sample mean tend to normality for large ‘n’ and
hence they can be studied with the help of the Normal distribution.
2.8. Estimation of population size
The hypergeometric distribution is useful in estimating the number of fish in a closed
water body. Let there be N fish in the water body, N being unknown. A catch of M fish is
taken; the fish are marked and returned alive into the water body. After allowing a
reasonable time for the marked fish to mix randomly with the unmarked fish, a catch of n
fish is taken. Here M and n are regarded as fixed pre-determined constants. Among these
‘n’ fish caught there will be, say, x marked fish, where x follows a hypergeometric
distribution. The maximum likelihood estimate of N is given by:

N̂ = Mn / x
Example 10
Five fish of a particular species which is thought to be near extinction in a certain region
have been caught, tagged and released to mix into the population. After they had
completely mixed with the rest of the population, a random sample of 10 of these fishes
is selected. If there are 25 fish of this species in the region, what is the probability that in
the second sample:
(i) There are 2 tagged fish
(ii) At the most 2 tagged fish
Answer
Example 11
In order to estimate the number of fish in a pond, 200 fish were caught, tagged and
released into the pond. After the tagged fish had thoroughly mixed with the rest of the population,
a random sample of 50 fish was selected and 12 were found to be tagged. Estimate the
number of fish in the pond.
Answer
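The worked answers to Examples 10 and 11 are missing above; they can be sketched as follows (an illustrative reconstruction, assuming the hypergeometric model and the Petersen estimate N̂ = Mn/x stated in section 2.8; the function name `hypergeom_pmf` is a hypothetical choice).

```python
from math import comb

def hypergeom_pmf(x, N, M, n):
    """P(X = x): x tagged fish among n drawn, M tagged in a population of N."""
    return comb(M, x) * comb(N - M, n - x) / comb(N, n)

# Example 10: N = 25 fish, M = 5 tagged, second sample of n = 10
p_exactly_2 = hypergeom_pmf(2, 25, 5, 10)                      # (i) about 0.385
p_at_most_2 = sum(hypergeom_pmf(x, 25, 5, 10) for x in range(3))  # (ii) about 0.699

# Example 11: Petersen (maximum likelihood) estimate N_hat = M*n/x
M, n, x = 200, 50, 12
N_hat = M * n / x  # roughly 833 fish in the pond
```

The Petersen estimate is usually rounded to the nearest whole fish, here about 833.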

Unit 3 - Estimation

3.1. Introduction
Statistical inference is that branch of statistics which deals with the theory and techniques of
making decisions regarding the statistical nature of the population using samples drawn from the
population.
Statistical inference has two branches.
They are:

 Theory of estimation and


 Testing of hypothesis.

3.2. Basic Terminology

 Population: It is the totality of individuals in a statistical investigation. For example, in


a study of the mean fish catch per boat in a landing centre, the fish catches of all
boats operating in the centre constitute the statistical population. A study of the fish catches
of all boats is called a census survey.
 Sample: A representative selection of a few individuals from a population. For example, in
a study of the mean fish catch per boat in a landing centre, a few boats operating in the
centre, selected randomly, form a sample. A study of the fish catches of the selected boats is
called a sample survey.
 Parameter: Population mean, population standard deviation, population size etc. which
are quantitative characteristic of population are called parameters.
 Statistic: Sample mean, Sample standard deviation etc. which are function of sample
values are called statistic.
 Parameter space: A set of all admissible values of the population parameters is called
parameter space.
 Sampling distribution: The distribution of values of a statistic for all different samples
of the same size is called sampling distribution of the statistic.
 Standard Error: Standard deviation of sampling distribution of the statistic is called
Standard Error.
 Estimator: A statistic that is used to estimate a population parameter.
 Estimate: A value or range of values used to estimate a population parameter.
 Point Estimate: A single value used to estimate population parameter.
 Interval estimates: A range of values used to estimate population parameter at a given
level of confidence is called confidence interval.
 Confidence coefficient: The probability that a confidence interval contains parameter is
called Confidence coefficient.
 Confidence limits: They are the limits of the confidence interval.
 Estimation: It is the process of estimating population parameter.

3.3.The table of standard error of some statistics

3.4. Point estimate & Utilities of standard Error


Point Estimate

 Sample mean is the point estimator of population mean.


 Sample proportion is the point estimator of population proportion.

Utilities of Standard Error


Standard Error (S.E) is a measure of the variability of a statistic. It is useful in estimation and testing
of hypothesis.

 Standard Error is used to decide the efficiency and consistency of the statistic as an
estimator.
 In interval estimation, Standard Error is used to write down the confidence intervals.
 In testing of hypothesis, standard error of the test statistic is used to standardize the
distribution of the test statistics.
3.5.Unbiased Estimator
An estimator is said to be unbiased if the average of all values taken by the estimator is equal to
the population parameter
For example: Sample mean is an unbiased estimator of population mean because the average of all
means of samples of same size taken from a population is equal to the population mean.
Similarly, Sample proportion is the unbiased estimator of population proportion and Sample
variance is the unbiased estimator of population variance.
Note :
Sample variance is given by,

s² = Σ(xi - x̄)² / (n - 1)
3.6.Interval Estimators
3.6.1. Case 1 - 100(1-α)% interval estimate for population mean when σ is known is given by,

x̄ ± z(α/2) σ/√n

Note:
If σ is unknown, then it is to be replaced by the sample standard deviation, ‘s’.

The maximum error (E) is given by,

E = z(α/2) σ/√n

Note:
Larger the sample size (n), lower is the error (E).

The sample size can be determined for a known confidence level and maximum error by using the
formula:

n = (z(α/2) σ / E)²
3.6.1.1.Examples
Example 1
From a random sample of 25 fishes taken from a pond, the average length is found to be
5.2 cm. Assume the length follows a normal distribution with an unknown mean and a standard
deviation of 0.5 cm.
1. Calculate the 95% confidence interval for the mean length of fishes in the pond.
For a confidence level of 95%, the critical value is zα/2 = 1.96.

2. Indicate the sample size needed to estimate the average length with a maximum error of ± 0.5 cm and a
confidence level of 95%.
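Both parts of Example 1 can be computed directly from the Case 1 formulas; the sketch below (not from the original manual) uses the 95% critical value z = 1.96 stated in the example.

```python
from math import sqrt, ceil

z = 1.96                     # critical value for 95% confidence
xbar, sigma, n = 5.2, 0.5, 25

# Part 1: 95% confidence interval for the mean length
E = z * sigma / sqrt(n)      # maximum error = 0.196
ci = (xbar - E, xbar + E)    # (5.004, 5.396)

# Part 2: sample size for a maximum error of 0.5 cm
n_needed = ceil((z * sigma / 0.5) ** 2)   # 1.96^2 = 3.84, rounded up to 4
```

So the pond mean is estimated to lie between about 5.00 and 5.40 cm, and only 4 fish would suffice for a ±0.5 cm error at 95% confidence.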

Example 2
A sample of sixteen fishes taken from a pond gave the following length
measurements in mm:
95, 108, 97, 112, 99, 106, 105, 100, 99, 98, 104, 110, 107, 111, 103, 110.
Assuming that the lengths of fishes follow a normal distribution with a variance of 25 mm² and an
unknown mean:
1. What is the distribution of the sample mean?

2. Determine the confidence interval at 95% for the population mean.


Example 3
With a confidence level of 90%, what would the minimum sample size need to be in
order for the true mean of the length to be less than 2 cm from the sample mean?

3.6.2. Case 2: 100(1-α)% interval estimate for population proportion is given by,

p ± z(α/2) √(pq/n)

Where: p is the sample proportion and q = 1 - p.

The maximum error E is given by:

E = z(α/2) √(pq/n)

Example 1
If the sample size is 64 individuals, and the percentage of female individuals in the sample is
35%, determine, using a significance level of 1%, the corresponding confidence interval for the
proportion of females in the population.
Solution
α = 0.01;
1 − α = 0.99;
zα/2 = 2.575.
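The interval itself can be computed from the Case 2 formula; a minimal sketch (not part of the original solution) using the stated critical value 2.575:

```python
from math import sqrt

p, n = 0.35, 64
q = 1 - p
z = 2.575                    # critical value for 99% confidence

E = z * sqrt(p * q / n)      # maximum error, about 0.1535
ci = (p - E, p + E)          # 99% CI for the population proportion
```

The proportion of females in the population is therefore estimated to lie between about 0.197 and 0.503.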

Example 2
In a sample of 400 selected at random, a sample mean of 50 was obtained. Determine the
confidence interval with a confidence level of 97% for the population mean.
Solution:

Example 3
With the same confidence level, what minimum sample size should it have so that the
interval width has a maximum length of 1?
Solution

Unit 4 - Testing of hypotheses

4.1. Introduction
Many situations call for verification of statements on the basis of available information.
For example, a researcher may be interested in verifying the statement 'Fish species A
grows faster than fish species B' using, say, information on the average growth of the
two species collected from 40 farms. Verifying a statement
concerning a population by examining a sample from that population is called testing of
hypothesis.

Estimation and testing of hypothesis are not as different as they appear. For example,
confidence intervals may be used to arrive at the same conclusions that are reached by
using the testing of hypothesis procedure.
4.2. Terminology
Before carrying out any statistical test, it is required to understand the following terminology
involved in testing of hypothesis.

Statistical hypothesis
Statistical hypothesis is a statement about the population under study. It is usually a statement
about one or more parameters of the population. Such statement may or may not be true.
Examples of hypothesis are:

 Mean weight of one-year-old oil sardine is 80 grams.


 Feed A and B are equally effective in increasing the weight of fish.
 Probability of getting number 4 when a die is thrown is 1/6.

Null hypothesis
The hypothesis to be tested is commonly designated as ‘null hypothesis’ and is denoted usually
by Ho.
For example:

 Suppose it is to be decided whether one procedure of fish processing is better than the other in
terms of shelf life.
 Then the null hypothesis can be formulated as below:
 Ho: There is no difference in the shelf life of fish processed by the two procedures.
Alternative hypothesis
Any admissible hypothesis that differs from the null hypothesis is called an alternative hypothesis
and is denoted by H1.
For example:
In an experiment to compare the efficiency of 4 feeds, the hypotheses are:

 Ho: There is no significant difference among feeds.


 H1: There is significant difference among the feeds.

Test statistic
It is a function of sample values. It extracts the information about the population parameter
contained in the sample. The observed value of the test statistic serves as a guide in rejecting or
not rejecting the null hypothesis.
For example,

 In testing the null hypothesis that the value of the population mean is µ0, i.e. Ho: µ = µ0, the
statistic used is,

Z = (x̄ - µ0) / (σ/√n)
Rejection region
After the test statistic to be used is selected, the set of possible values of the statistic is
divided into two mutually exclusive regions, viz. the rejection region (critical region) and the
acceptance region (region of non-rejection). If the observed value of the test statistic falls
in the rejection region, Ho is rejected. If it falls in the acceptance region, it is not rejected.
It may be noted that if the observed value falls in the acceptance region, it does not prove
the hypothesis; it simply fails to disprove it.

4.3.Type I and Type II errors


In testing a hypothesis two kinds of errors are likely to be committed. They are Type I and Type
II errors. If null hypothesis is rejected when it is actually true, then such error is called Type I
error. On the other hand, if null hypothesis is accepted when it is false, then Type II error is
committed.
This is summarized in the following table:

Table: Statistical decision table

Decision      Ho is true           Ho is false
Accept Ho     Correct decision     Type II error (β)
Reject Ho     Type I error (α)     Correct decision

In order that any test of hypothesis to be good, it must be so designed as to minimize both the
errors i.e. minimize both α and β.

For a fixed sample size, it is difficult to minimize both α and β, as an attempt to decrease one
may lead to an increase in the other. It is customary to fix α at a predetermined level and
choose a test procedure that minimizes β, i.e., α is prefixed in a test and β is minimized. Thus, we
run the risk of rejecting a true H0 but reduce β, the probability of accepting a false H0, to a
minimum. Test criteria are developed on these principles.

4.4.Level of significance
In testing a given hypothesis, the maximum probability with which we would be willing to risk
type I error is called the level of significance of the tests, denoted by α.
In other words, it is a way of quantifying the amount of risk one wants to take in rejecting a true
hypothesis.

Usually 5% or 1% levels of significance are chosen. These levels, however, depend on the gravity
of the risk involved in the decision making.

To illustrate suppose 5% level of significance is chosen in designing a test of hypothesis, then


there are about 5 chances in 100 that the hypothesis is rejected when it should be accepted, i.e.
one is 95% confident about the right decision. The decision as to which values go into the
rejection region and acceptance region is made on the basis of desired level of significance ‘α’.
Tests of hypothesis are sometimes called tests of significance because of the term level of
significance and the computed value of the test statistic that falls in the rejection region is said
to be significant.

4.5.Degrees of freedom
The number of independent observations available from the data for estimation of a particular
parameter or a quantity is called the ‘degrees of freedom’.

It can be calculated by deducting from the number of observations the number of constants that
are calculated from the data. For instance, the estimate of population variance based on a sample
of ‘n’ observations is given by,

s² = Σ(xi - x̄)² / (n - 1)

In this case the constant (parameter), population mean, is estimated by the sample mean
x̄. Hence, deduct 1 from the total number of observations ‘n’ to get the degrees of
freedom, i.e., the degrees of freedom of s² is (n-1).

In the case of an (r x c) contingency table, the degrees of freedom is equal to (r-1)(c-1), where r
is the number of rows and c is the number of columns. For example, for a contingency
table with 3 rows and 4 columns the degrees of freedom is (3-1)(4-1) = 6.

Unit 5 - Large sample test


5.1.Introduction
Sampling distribution: It is the distribution of the means of all possible random
samples of the same size taken from a population.
If a sample of size ‘n’ is drawn from a normal population with mean ‘µ’ and standard
deviation ‘σ’, then the sample mean is also distributed as normal with mean ‘µ’ and

standard deviation σ/√n.
Standard Error: It is the Standard deviation of the sampling distribution of sample
means.

Central limit theorem: If random samples of n measurements are repeatedly drawn from
a population with a finite mean µ and standard deviation σ, then the relative frequency
histogram drawn for the (repeated) sample means will tend to be distributed normally.
Approximation becomes more valid as n increases.
If a sample of size n is drawn from a normal population with mean µ and standard
deviation σ, then the sample mean is also distributed as normal with mean µ and
standard deviation σ/√n. This proposition holds good even if the population from which the
sample is drawn is not normal, provided the sample size is large (from the central limit
theorem). As x̄ is distributed with mean µ and standard deviation σ/√n, the standard
normal variate is given by,

Z = (x̄ - µ) / (σ/√n)

Under the null hypothesis H0: µ = µ0

the test statistic is defined by:

Z = (x̄ - µ0) / (σ/√n)

The area under the normal curve between µ - 1.96σ and µ + 1.96σ is 0.95. Hence, in
the case of standard normal variable which has mean zero and variance 1, the area
between - 1.96 to 1.96 will be 0.95. Thus, if the hypothesis is true, Z value computed
from the sample will be between - 1.96 to 1.96 with probability of 0.95. On the other
hand, if computed value of Z lies outside the range - 1.96 to 1.96, it can be concluded
that such a sample would arise with only probability of 0.05, if the null hypothesis was
true. In this case it is inferred that Z differs significantly from the value expected under
the hypothesis and hence the hypothesis is rejected, at 5% level of significance.

In the tests involving normal distribution, the set of values of Z outside the range - 1.96
to 1.96 constitutes the region of rejection or critical region (Fig. 1).
In the above discussion 5% level of significance was used. As mentioned earlier any
level of significance can be used. If 1% level of significance is used, the region of
rejection will be outside the range -2.58 to 2.58 (Fig.2).

5.2. One tailed and two-tailed tests


If the null hypothesis H0 : µ = µ0 is tested against H1 : µ≠µ0 (which implies µ < µ0 or µ > µ0),
then the interest is on extreme values of Z on both tails of the distribution. In such cases the
critical region is split between two sides or tails of the distribution of the test statistic as shown in
Figs. 1 and 2 above. Tests applied for such situations are called ‘two-tailed’ tests.
If the null hypothesis H0 : µ = µ0, is tested against H1 : µ > µ0, then the interest is in the extreme
value to one side of the mean

In such cases the critical region will be to one side of the distribution as shown in Fig.3. Tests
applied to such situations are called ‘one-tailed’ tests. It is to be noted that the critical values of Z
at 5% and 1% level of significance for a one-tailed test are 1.645 and 2.33, whereas these values
are 1.96 and 2.58 for two tailed tests. Discussions in the following sections will be restricted to
two-tailed tests, but the same procedure will hold good for one tailed tests also.
Test for mean of single sample
Let x1, x2, ..., xn be the values of a variable X in a large random sample of size n from a
population with mean µ and variance σ². On the basis of this sample, the hypothesis regarding
the value of µ is tested. Testing of hypothesis consists of the following steps.
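The single-mean Z test described above can be sketched as follows. This is an illustrative example with hypothetical numbers (n = 100, mean 52 cm, SD 10 cm), not a dataset from the manual.

```python
from math import sqrt

def z_test_single_mean(xbar, mu0, sigma, n, z_crit=1.96):
    """Two-tailed large-sample Z test of H0: mu = mu0 at the 5% level."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    return z, abs(z) > z_crit

# Hypothetical sample: n = 100 fish, mean 52 cm, population SD 10 cm, H0: mu = 50
z, reject = z_test_single_mean(52, 50, 10, 100)
# z = 2.0 falls outside (-1.96, 1.96), so H0 would be rejected at 5%
```

For a 1% test the critical value 2.58 would be passed as `z_crit` instead.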
5.3.Test for equality of two population means
5.4.Testing single population proportion

5.5.Test of significant difference between two proportions


Unit 6 - Small sample tests (n < 30)

6.1.Introduction

When the size of the sample is small, the distributions of various statistics are far from normality
and hence tests of hypothesis based on the normal variate cannot be applied. In such cases tests of
hypothesis based on the exact sampling distributions of ‘t’ and ‘F’ are applied. When applying these
tests it is assumed that the population from which the sample is drawn is normal.
The t-distribution, which is popularly known as Student’s t distribution, is a sampling distribution
derived from the parent normal distribution. This distribution is symmetrical about the mean but
is slightly flatter than the normal distribution. Unlike the normal distribution, it will be different
for different sizes of the sample ‘n’, or the degrees of freedom (n-1). When the size of the sample is
very small (n < 30), the t-distribution markedly differs from the normal distribution, but as n increases
the t-distribution resembles more and more a normal distribution (fig.1). The t-distribution has
mean zero and variance n/(n-2) for n > 2. The variable t ranges theoretically from -∞ to +∞. The
values of ‘t’ have been tabulated for different degrees of freedom at different levels of
significance (Fisher and Yates, 1963).


6.1.1. Case study 1


Test of hypothesis based on t- distribution are discussed below
CASE 1: Test for Single Mean
Let x1, x2, ..., xn be a random sample of size n drawn from a normal population with mean µ. Let
x̄ and s² denote the mean and variance of the sample.
To test the hypothesis Ho: µ = µ0, the following test statistic is used:

t = (x̄ - µ0) / (s/√n)

This test statistic follows the t-distribution with (n-1) degrees of freedom.

(iii) Statistical decision

If |t| > the table value of t at 5% level of significance, then reject Ho at 5% level of significance;
otherwise accept Ho.

If |t| > the table value of t at 1% level of significance, then reject Ho at 1% level of significance;
otherwise accept Ho.

Example 1: A sample of 25 fingerlings drawn from a rearing tank showed a mean length of
75.8 mm and a standard deviation of 10 mm. Are the data consistent with the claimed mean size
of 80 mm?

Answer: Let µ denote the population mean.

(i) Hypotheses: Ho: µ = 80 mm and H1: µ ≠ 80 mm

(ii) Test statistic: t = (75.8 - 80) / (10/√25) = -2.1
(iii) Statistical decision


The table values of t with 24 degrees of freedom are 2.064 at 5% and 2.797 at 1% level of
significance. Since |t| = 2.1 is greater than the table value of t at the 5% level, Ho is rejected at 5%,
but as it is less than the table value of t at the 1% level, Ho is not rejected at 1% level of significance.
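The arithmetic of Example 1 can be sketched as follows (an illustrative reconstruction; the table values 2.064 and 2.797 are the standard t-table values for 24 df quoted in the answer).

```python
from math import sqrt

xbar, mu0, s, n = 75.8, 80.0, 10.0, 25

t = (xbar - mu0) / (s / sqrt(n))   # = -2.1, with n - 1 = 24 df

reject_5pct = abs(t) > 2.064   # True: Ho rejected at the 5% level
reject_1pct = abs(t) > 2.797   # False: Ho not rejected at the 1% level
```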

6.1.2. Case study 2


CASE 2: Testing of difference between two means (population variances assumed
equal)
Let x̄1 and s1 be the mean and standard deviation of a sample of size n1 from a normal
population with mean µ1, and let x̄2 and s2 be the mean and standard deviation of another
sample of size n2 from a normal population with mean µ2. To test whether the
population means differ significantly, the following null hypothesis is set up:

Ho: µ1 = µ2
(iii) Statistical decision


If |t| > the table value of t at the specified level of significance, reject the null hypothesis
at that level; otherwise accept it.

Example 2 : Weight was recorded separately for males and females of one-year-old fish
of species A. The data are:

Sex      Sample size   Mean weight (g)   Variance
Male     9             70                25
Female   11            61                16
Test whether there is significant difference in mean weight of male and female
fish.

Answer : Let µ1 and µ2 denote the population means of male and female fishes
respectively.

(iii) Statistical decision


The table value of t with 18 degrees of freedom is 2.101 at 5% and 2.878 at 1%
level of significance. Since |t| > the table values of t at both 5% and 1% level of
significance, Ho is rejected, i.e., there is significant difference between the mean
weights of males and females.
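The pooled two-sample t statistic of Example 2 can be reconstructed as below (an illustrative sketch; the pooled-variance formula is the standard one for the equal-variance case named in CASE 2).

```python
from math import sqrt

n1, xbar1, var1 = 9, 70.0, 25.0    # males
n2, xbar2, var2 = 11, 61.0, 16.0   # females

# Pooled variance with n1 + n2 - 2 = 18 degrees of freedom
sp2 = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)   # = 20.0
t = (xbar1 - xbar2) / sqrt(sp2 * (1 / n1 + 1 / n2))         # about 4.48
```

Since 4.48 exceeds both 2.101 and 2.878, the conclusion of the example follows.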

6.1.3. Case study 3


CASE 3: Test of difference between two means of paired observations (paired t – test)
When the two samples of equal size are drawn from two normal populations and these
samples are not independent, then the paired t-test is used. Dependent samples arise,
for instance, in experiments when an individual is tested first under one condition and
then under another condition, so that there will be two observations for the same
individual. Let n be the size of each of the two samples and d1, d2, ..., dn the
differences between the corresponding members of the samples. Let d̄ denote the mean
of the differences and s the standard deviation of these differences.

(iii) Statistical decision


If |t| > the table value at the specified level of significance, reject Ho at that level.

Example 3 : The following table gives the marks obtained by 9 students in two tests, one held at
the beginning of a year and the other at the end of the year after intensive coaching. Do the data
indicate that the students have benefited by the coaching?

Student 1 2 3 4 5 6 7 8 9
Test 1 55 60 65 75 49 25 35 18 61
Test 2 63 70 70 81 54 29 32 21 70

Answer:
(i) Hypotheses
Ho: There is no difference in the mean marks of the two tests.
H1: There is a difference in the mean marks of the two tests.

(ii) Test statistic
t = d̄ / (s/√n), computed from the following table:

Student       1    2    3    4    5    6    7    8    9   Total
Test 1 (X)   55   60   65   75   49   25   35   18   61
Test 2 (Y)   63   70   70   81   54   29   32   21   70
d = Y - X     8   10    5    6    5    4   -3    3    9    47
d²           64  100   25   36   25   16    9    9   81   365

d̄ = 47/9 = 5.22, s² = (365 - 47²/9)/8 = 14.94, t = 5.22/√(14.94/9) = 4.05
(iii) Statistical decision
The table value of t with 8 degrees of freedom is 2.306 at 5% level of significance
and 3.355 at 1% level of significance.
Since |t| = 4.05 > the table value of t at both 5% and 1% level of significance, Ho is
rejected. In other words, it is concluded that the coaching has benefited the
students.
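The paired t computation of Example 3 can be sketched directly from the listed marks (an illustrative reconstruction of the worked answer above).

```python
from math import sqrt

test1 = [55, 60, 65, 75, 49, 25, 35, 18, 61]
test2 = [63, 70, 70, 81, 54, 29, 32, 21, 70]

d = [y - x for x, y in zip(test1, test2)]   # differences, d = Y - X
n = len(d)
dbar = sum(d) / n                           # 47/9
s2 = (sum(di * di for di in d) - n * dbar ** 2) / (n - 1)
t = dbar / sqrt(s2 / n)                     # paired t with n - 1 = 8 df, about 4.05
```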
6.2. Confidence limits for population mean µ
In previous chapters, computation of confidence limits for population mean µ, based on large
samples using normal distribution was discussed. It was pointed out there that for samples with
size less than 30, ‘t’ distribution is used for computing confidence limits. The formula for
computing confidence limits using ‘t’ distribution is as follows:

x̄ ± t(α/2, n-1) s/√n
Example 4: Following data refer to catch (in tons) per haul of one hour duration in a trawl
survey off a certain coast
1.2, 2.5, 1.0, 4.0, 3.0, 2.8, 0.6, 3.4, 2.5, 2.0
Compute mean catch per hour and also 95% confidence limits for catch per hour for the coast
(population) under survey.
Answer:
95% confidence limits are given by x̄ ± t(0.05, 9) s/√n.
To calculate these confidence limits, the following computations are to be made:

Haul No.        1     2     3     4     5    6     7     8      9     10    Total
Catch/hour (x)  1.2   2.5   1.0   4.0   3.0  2.8   0.6   3.4    2.0   2.5   23.0
x²              1.44  6.25  1.0   16.0  9.0  7.84  0.36  11.56  4.0   6.25  63.7
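Example 4 can be completed as sketched below (an illustrative reconstruction; 2.262 is the standard two-tailed t-table value for 9 df at 5%).

```python
from math import sqrt

catch = [1.2, 2.5, 1.0, 4.0, 3.0, 2.8, 0.6, 3.4, 2.0, 2.5]
n = len(catch)
xbar = sum(catch) / n                                        # 23.0/10 = 2.3
s = sqrt((sum(x * x for x in catch) - n * xbar ** 2) / (n - 1))

t_crit = 2.262               # t table value, 9 df, 5% level (two-tailed)
E = t_crit * s / sqrt(n)
ci = (xbar - E, xbar + E)    # 95% confidence limits, roughly (1.52, 3.08)
```

The mean catch per hour is 2.3 tons, with 95% confidence limits of about 1.52 and 3.08 tons.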
Exercise for practice:
I. Samples of eleven and fifteen animals were fed on different diets A and B respectively.
The gain in weight for the individual animals for the same period was as follows:

Diet A (in gm): 20 25 30 23 29 19 30 12 19 27 26

Diet B (in gm): 39 29 19 17 41 28 23 27 31 28 31 18 16 24 26

Test whether the mean gain in weight is the same under the two diets.

II. Length measurements of sampled mackerel made on the same day in two landing
centres are given below:
Landing centre A: 24 21 20 19 17 18 15 13 20

Landing centre B: 21 22 14 18 16 19 20 22 21
Do the data support the claim that the mean lengths of mackerel in the two landing centres are the same?

Unit 7 - Chi-square distribution

7.1. Introduction to Chi-square (χ²) distribution


Theoretically, the Chi-square (χ²) distribution can be defined as the distribution of a sum of squares
of independent standard normal variates. If X1, X2, ..., Xn are n independent standard normal
variates, then the sum of squares of these variates, X1² + X2² + ... + Xn²,
follows the χ² distribution with n degrees of freedom. Alternatively, if a sample of size n
is drawn from a normal population with variance σ², the quantity (n-1)s²/σ² follows the χ²
distribution with (n-1) degrees of freedom, where s² is the sample variance.

The shape of the χ² distribution depends on the degrees of freedom, which is also its mean
(Fig.1). When n is small, the distribution is markedly different from the normal distribution,
but as n increases the shape of the curve becomes more and more symmetrical, and for
n > 30 it can be approximated by a normal distribution. The values of χ² have been
tabulated for different degrees of freedom at different levels of probability (Fisher and
Yates, 1963). The χ² statistic is always greater than or equal to zero, i.e. χ² ≥ 0.
Most data on biological investigations can be classified either as quantitative or
qualitative (attribute) data. The statistical procedures discussed so far apply
mostly to quantitative data. There are many instances in fisheries research,
wherein attribute data describe the phenomenon under investigations more
adequately than quantitative data. The chi-square test, based on the χ² distribution, is
commonly used for analysis of attribute data.

7.2. Test for fixed-ratio hypothesis


Many investigations are carried out to verify empirically some biological phenomena that are
expected to occur under some given assumptions. In common carps, normal pigmentation is due
to a dominant gene B, and the recessive genotype bb produces blue pigmentation. The F1 generation
will have all individuals (Bb) with normal pigmentation. When F1’s are crossed to obtain the F2
generation, there will be common carps with normal and blue pigmentation in the ratio of 3:1.
Whether this hypothesis of a 3:1 ratio is substantiated by the actual observed data can be
ascertained by the χ²-test. This test can be applied to test any fixed-ratio hypothesis provided the
expected ratio is specified before the investigation commences.
If Oi refers to the observed frequency and Ei refers to the expected frequency based on the fixed-
ratio hypothesis, then χ² is computed as follows:

χ² = Σ (Oi - Ei)² / Ei = Σ Oi²/Ei - n ... (1)

where n is the total number of observations and k is the number of classes. The χ² in (1) has k-1
degrees of freedom. In this test the expected frequency of each class should be more than 5. If
any such frequency is small, adjacent classes may be grouped, so that the expected frequency is
more than 5.
If the calculated value of χ² is greater than the table value of χ² with (k-1) df at the specified level of
significance, the null hypothesis of the specified ratio is rejected.

Example : A sample of 500 fish observed for determining the sex ratio, indicated that
230 were male and 270 female. Do the observed data fit the expected ratio of 1: 1?
(i) Hypothesis
Ho: The observed data fit the ratio 1:1.
H1: The observed data do not fit the ratio 1:1.
(ii) Test statistic
On the basis of this hypothesis of 1:1 ratio, 250 fish are expected in each of the male and female
classes. χ² is calculated as follows:

Sex      Observed (Oi)   Expected (Ei)   Oi²       Oi²/Ei
Male     230             250             52,900    211.60
Female   270             250             72,900    291.60
Total    500             500                       503.20

χ² = Σ Oi²/Ei - n = 503.20 - 500 = 3.20

(iii) Statistical decision


The table value of χ² with 1 df at 5% level of significance is 3.841. As the computed χ² (= 3.20) is
less than the table value of χ², the hypothesis is not rejected.
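The fixed-ratio χ² of the sex-ratio example can be computed directly from formula (1); a minimal sketch:

```python
observed = [230, 270]      # male, female
expected = [250, 250]      # under the 1:1 sex-ratio hypothesis

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
# Equivalently sum(Oi^2/Ei) - n = 503.20 - 500 = 3.20
# Table value with 1 df at 5% is 3.841, so Ho is not rejected
```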

7.3. χ²-Test for Goodness of fit for probability distributions


Another important application of χ² is in testing if a set of quantitative data follows a specific
probability distribution. In this test the actual frequency in each category (or class interval) is
compared with the frequency that could be theoretically expected if the data followed the
hypothesized probability distribution. To perform this test the following steps are followed:
(i) Hypothesize the probability distribution to be fitted.
(ii) The value of each parameter of the selected probability distribution is estimated from the given
data if not specified.
(iii) Theoretical frequencies for each class are estimated based on the hypothesized probability
distribution.
(iv) The following chi-square test statistic is computed:

χ² = Σ (Oi - Ei)² / Ei

It has (k-1) df, where k is the number of classes.


(v) If the expected frequency of any class is less than 5, the adjacent classes can be grouped to
form a class, so that expected frequency is more than 5.
(vi) If the expected frequencies are calculated on the basis of certain parameters estimated from the
data, the degrees of freedom for χ² is not (k-1) but is decreased by the number of parameters
estimated.
(vii) If the χ² computed in step (iv) is greater than the tabular value of χ² with the appropriate df at
the specified level of significance, the null hypothesis that the selected probability distribution is a
good fit to the given data is rejected.

Example : Test whether the data on number of animals per square of a particular species of
plankton given in example 5 of chapter 6 follows Poisson distribution.
Answer :
(i) Hypotheses
H0 :Number of animals per square of a particular species of plankton follows Poisson
distribution
H1: Number of animals per square of a particular species of plankton does not follow Poisson
distribution
(ii) Test statistic
Expected frequencies using the Poisson probability distribution have already been computed in
example 5 of chapter 6. Hence χ² based on the observed and expected frequencies can be computed
as outlined below:
x       Oi     Ei       Oi²/Ei
0       30     33.29    27.04
1       42     36.62    48.17
2       18     20.14    16.09
3        8      7.38     8.67
4        2      2.03     1.97
Total  100              101.94

χ² = Σ Oi²/Ei - n = 101.94 - 100 = 1.94


As the mean m of the distribution is estimated from the sample, the number of degrees of freedom
= k - 1 - 1 = 5 - 1 - 1 = 3.
(iii) Statistical decision
The table value of χ² with 3 df at 5% level of significance is 7.815. Since the χ² calculated in (ii)
above is less than the table value of χ², the null hypothesis is not rejected.
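The goodness-of-fit χ² for the plankton example can be reproduced from the tabulated observed and expected frequencies; a minimal sketch:

```python
observed = [30, 42, 18, 8, 2]
expected = [33.29, 36.62, 20.14, 7.38, 2.03]   # Poisson expected frequencies from the table

n = sum(observed)
chi2 = sum(o * o / e for o, e in zip(observed, expected)) - n
# df = 5 classes - 1 - 1 (mean estimated from data) = 3; table value 7.815
```

The computed χ² of about 1.94 is well below 7.815, so the Poisson fit is not rejected.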

7.4. χ²-test for independence of attributes in 2 x 2 contingency table


Suppose that attribute data of size n are classified according to two attributes, say, A and B,
and the attribute A is further subdivided into two classes A1 and A2 and the attribute B into B1
and B2. Such attribute data can be presented in the form of a table called a 2 x 2 contingency
table as shown below:
Table 2: 2x2 contingency table
B\A A1 A2 Total

B1 a b a+b
B2 c d c+d

Total a+c b+d N= a+b+c+d

It may be of interest to test the independence of attributes A and B.


H0 : The two attributes A and B are independent.
H1 : The two attributes A and B are dependent.
It is tested by the χ² test. A simple formula for computing χ² of a 2 x 2 contingency table
is given by,

χ² = N(ad - bc)² / [(a+b)(c+d)(a+c)(b+d)]
7.5. Yates’ correction for continuity


Where a, b, c and d are cell frequencies of the 2 x 2 contingency table and N is the total
frequency. This χ² has 1 degree of freedom. If the expected cell frequencies are large,
the discrete distribution of probabilities of all frequencies approximates to the normal
distribution. This approximation holds good fairly well when the degrees of freedom are
more than 1 and the expected cell frequency in the various classes is not small. As the
degrees of freedom of the χ² statistic of a 2 x 2 contingency table is 1, the approximation in
this case will not be satisfactory and leads to overestimation of significance. This is
corrected by the method suggested by Yates, which is known as ‘Yates correction’. The
correction consists of adding ½ to the observed minimum frequency and adjusting the
other cell frequencies for the observed marginal totals and then computing the χ².
The formula for χ² using the Yates correction in a 2 x 2 contingency table is given by,

χ² = N(|ad - bc| - N/2)² / [(a+b)(c+d)(a+c)(b+d)]
This correction is suitable when the expected frequency of classes is less than 5, but
estimation with correction can do no harm even when the frequencies are large. Hence
it is always better to use the correction as a matter of routine.
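The two formulas above can be put into a short sketch (the function name is illustrative; only the standard 2 x 2 formulas quoted above are assumed):

```python
def chi2_2x2(a, b, c, d, yates=False):
    """Chi-square statistic for a 2x2 contingency table.

    Without correction: N(ad - bc)^2 / [(a+b)(c+d)(a+c)(b+d)].
    With Yates' correction the numerator uses (|ad - bc| - N/2)^2.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if yates:
        num = n * max(abs(a * d - b * c) - n / 2, 0) ** 2
    else:
        num = n * (a * d - b * c) ** 2
    return num / denom

# The lime-treatment example that follows: a=86, b=14, c=88, d=12
chi2_plain = chi2_2x2(86, 14, 88, 12)             # approx 0.18
chi2_corr = chi2_2x2(86, 14, 88, 12, yates=True)  # approx 0.04
```

Both values fall well below the 5% table value of 3.84 for 1 df, matching the decision in the worked example.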

Example: In a series of experiments to test whether advanced stages of Myxobolus infection are cured by lime treatment, the following observations were recorded:

                      Not cured   Cured   Total
Lime treated              86        14      100
Untreated (control)       88        12      100
Total                    174        26      200

Test whether lime has any effect in curing the infection


Answer:
(i) Hypotheses
Ho : There is no association between lime treatment and curing of infection.
H1 : There is association between lime treatment and curing of infection
(ii) Test statistic

χ² = N(ad - bc)² / [(a+b)(c+d)(a+c)(b+d)]

where a = 86, b = 14, c = 88, d = 12 and N = 200.

χ² = 200(86 × 12 - 14 × 88)² / (100 × 100 × 174 × 26) = 200(-200)² / 45,240,000 = 0.18

With Yates' correction, χ² = 200(|-200| - 100)² / 45,240,000 = 0.04

(iii) Statistical decision


Since the calculated χ² (with and without Yates' correction) is less than the table value (3.84 at 5%, 6.64 at 1%), H0 is not rejected.

7.6. Computation of χ² in r x c contingency table


The r x c contingency table is an extension of the 2 x 2 contingency table in which the data are classified into 'r' rows and 'c' columns (Table 3). In this table the frequencies which occupy the cells of the table are called 'cell frequencies', whereas the row and column totals are called the 'marginal frequencies'.
Table 3: r x c contingency table
A\B    B1     B2    ...  Bj    ...  Bc    Total
A1     O11    O12   ...  O1j   ...  O1c   (A1)
A2     O21    O22   ...  O2j   ...  O2c   (A2)
...
Ai     Oi1    Oi2   ...  Oij   ...  Oic   (Ai)
...
Ar     Or1    Or2   ...  Orj   ...  Orc   (Ar)
Total  (B1)   (B2)  ...  (Bj)  ...  (Bc)  N

As the table consists of 'r' rows and 'c' columns, there will be (r x c) observed frequencies, one in each cell. Corresponding to each observed frequency there is an expected frequency, computed based on a certain hypothesis. Under the null hypothesis of no relationship, or of independence between the attributes, the expected frequency of each cell is computed by multiplying the totals of the row and column to which the cell belongs and dividing by the total number of observations. For instance, the expected frequency of the cell in the 1st row and 2nd column is obtained by multiplying the 1st row total (A1) with the 2nd column total (B2) and then dividing by the total number of observations, N; i.e., E12 = (A1 × B2)/N. After calculating the expected frequencies for each cell, χ² is computed using the formula,

χ² = Σi Σj (Oij - Eij)² / Eij

which has (r-1)(c-1) degrees of freedom.

Example: In a fish tagging experiment, the length frequency of tagged fishes and
recoveries were as under. Test whether the length distributions can be accepted as
same?

                    Length group (cm)
                  10-20  20-30  30-40  40-50  50-60  Total
Fishes tagged       108    140    256    385    111   1000
Fishes recovered      9     15     28     40      8    100
Answer:
(i) Hypotheses
Ho : Length distribution of tagged and recovered fishes is the same.
Hi : Length distribution of tagged and recovered fishes is not the same.
(ii) Test statistics
                    Length group (cm)
                  10-20  20-30  30-40  40-50  50-60  Total
Fishes tagged       108    140    256    385    111   1000
Fishes recovered      9     15     28     40      8    100
Total               117    155    284    425    119   1100

To compute the χ² statistic, the expected frequency of each cell is first obtained as Eij = (row total × column total)/1100, and the shortcut χ² = ΣΣ(Oij²/Eij) - N is then used:

χ² = 1101.40 - 1100 = 1.40
(iii) Statistical decision
The table value of χ² with 4 df at the 5% level of significance is 9.488. As the calculated value of χ² is less than the table value, the null hypothesis is not rejected.
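The expected-frequency computation for an r x c table can be sketched as follows (illustrative function name; the tagged count in the 40-50 cm group is taken as 385 so that the rows sum to the stated totals of 1000 and 1100):

```python
def chi2_rxc(table):
    """Chi-square for an r x c table of observed counts (list of rows).

    Expected cell frequency = (row total * column total) / grand total;
    statistic = sum over cells of (O - E)^2 / E, with (r-1)(c-1) df.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (obs - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return chi2, df

# Fish-tagging example: tagged and recovered counts per length group
observed = [[108, 140, 256, 385, 111],
            [9, 15, 28, 40, 8]]
stat, df = chi2_rxc(observed)   # stat approx 1.40, df = 4
```

The statistic of about 1.40 with 4 df agrees with the worked example above.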

7.7.Test of hypothesis about a population variance


Let x1, x2, ..., xn be the values of a variable in a random sample of size n
drawn from a normal population with variance σ².
The null hypothesis to be tested is, Ho : σ² = σo²,
where σo² is a specified value.
The test statistic used is

χ² = (n - 1)s²/σo²

where s² is the sample variance; it follows the χ² distribution with (n - 1) degrees of freedom.
Example 5: A market survey conducted on 50 households indicated that the average expenditure of households on purchase of fish is Rs. 40 per week, with a standard deviation of Rs. 22. Can these data be considered as a sample from a population with variance 400 (Rs.²), at the 5% level of significance?
Answer:
(i) Hypotheses
Ho: σ2 = 400 ; H1 : σ2 ≠ 400
(ii) Test statistic

χ² = (n - 1)s²/σo² = 49 × 22²/400 = 59.29
(iii) Statistical decision


Reject Ho if χ² > χ²(0.025) or χ² < χ²(0.975); otherwise do not reject it.
For 49 df, χ²(0.025) = 70.222 and χ²(0.975) = 31.555.
As the calculated χ² = 59.29 lies between 31.555 and 70.222, Ho is not rejected.
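The variance test in this section reduces to one line of arithmetic; a minimal sketch (illustrative names, table values looked up separately):

```python
def chi2_variance(n, s2, sigma0_sq):
    """(n-1)*s^2 / sigma0^2, referred to the chi-square distribution with n-1 df."""
    return (n - 1) * s2 / sigma0_sq

# Example 5: n = 50, s = 22, sigma0^2 = 400
stat = chi2_variance(50, 22 ** 2, 400)   # 59.29
# Two-tailed decision against the 49-df critical values at the 5% level
decision = "reject H0" if (stat < 31.555 or stat > 70.222) else "do not reject H0"
```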

Exercise for practice:


I. The following table gives the sex-wise fish-eating habit in a random sample of 1000 people collected from a city. Do the data support that the eating of fish depends on sex?

Sex      Eating fish   Not eating fish   Total
Male         370             150           520
Female       230             250           480
Total        600             400          1000
II. A survey of 162 children, each having one parent with blood group M and the other with blood group N, revealed that 28.4% of the children have blood group M, 42% have blood group MN, and the remaining have blood group N. Test the validity of the genetic law that the proportions M : MN : N are 1 : 2 : 1.

Unit 8 - F-Distribution

8.1.Introduction
In the t-tests on paired and unpaired samples we had to assume that the two samples came from the same normal distribution. We were testing that the means were the same, but still had to assume that the standard deviations were the same. We can test whether this assumption is correct by using the F-test, due to Snedecor.

If the standard deviations are the same then we would expect the sample standard deviations to be similar, i.e., s1 ≈ s2, or more precisely,
this can be written as σ1²/σ2² = 1.
That is, we can test for the closeness of this ratio to 1.
The variance ratio or F-test formalises these ideas:
The ratio of the variances (s1²/s2²) of independent samples taken from a normal population follows a distribution called the F-distribution, with two degrees of freedom, one for the numerator and the other for the denominator.
8.2. F-distribution
If s1² is the variance of a sample of size n1 and s2² is the variance of another independent sample of size n2 taken from the same population, then the ratio of the variances of these samples, i.e., s1²/s2², follows a distribution called the F-distribution with (n1-1) and (n2-1) degrees of freedom.

Note:

 F has two sets of degrees of freedom-one for numerator and another for
denominator.
 F is a positively skewed distribution.
 Shape of F distribution depends upon the two degrees of freedom.
 Degrees of freedom are the parameters of F distribution.

8.4.Testing for the equality of variances

 H0 : σ12= σ22 (No significant difference between Variances)


 H1: σ12≠ σ22 (Significant difference between Variances)
 Fix α =0.05 Say
 Test statistic: F= s12/ s22 where s12 is the larger sample variance
 Decision Rule: Reject H0 (i.e., conclude a significant difference between variances) if the computed test statistic value is more than the table F value for (n1-1), (n2-1) degrees of freedom; otherwise accept H0.
Example 1
A random sample of 25 mackerels gave a standard deviation of length of 3.8 cm, while a random sample of 35 oil sardines gave a standard deviation of length of 4.5 cm. Does this support equality of the length variances of the two species at the 5% level of significance?
Solution
The F test statistic is used to test for equality of variances.
Hence we have to convert the given standard deviations into variances. Here the larger variance is for oil sardine; hence take s1 = 4.5, n1 = 35, s2 = 3.8, n2 = 25.

 H0: σ12= σ22 (No significant difference between Variances)


 H1: σ12≠ σ22 (Significant difference between Variances)
 Fix α =0.05 Say
 Test statistic: F = s12/ s22 where s12 is the larger sample variance

= 4.52/3.82
F(cal) = 1.40
F(tab) at (n1-1, n2-1)= (34,24)df at 5%=1.79 (approximately)

 Decision Rule: Since F(cal)=1.40< F(tab)=1.79, Null hypothesis is accepted.
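Example 1 can be reproduced with a short sketch (illustrative function name; the table F value still has to be looked up separately):

```python
def f_statistic(s1_sq, s2_sq):
    """Variance-ratio statistic with the larger sample variance in the numerator."""
    return max(s1_sq, s2_sq) / min(s1_sq, s2_sq)

# Oil sardine s1 = 4.5 (n1 = 35) versus mackerel s2 = 3.8 (n2 = 25)
f_cal = f_statistic(4.5 ** 2, 3.8 ** 2)   # approx 1.40
```

Since 1.40 is below the tabled F of about 1.79 for (34, 24) df at the 5% level, H0 stands, as in the text.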

Example 2
To compare the variation in the weights of fish cans manufactured by two plants, a random sample of 32 fish cans taken from the first plant gave a variance of 25 g² and a random sample of 42 fish cans taken from the second plant gave a variance of 40 g². Do the values support the equality of variances at the 5% level of significance?
Solution
The F test statistic is used to test for equality of variances.
Here the larger variance is for the second plant; hence take s1² = 40, n1 = 42, s2² = 25, n2 = 32.

 H0 : σ12= σ22 (No significant difference between Variances)


 H1 : σ12≠ σ22 (Significant difference between Variances)
 Fix α =0.05 Say
 Test statistic: F= s12/ s22 where s12 is the larger sample variance

= 40/25
F(cal) = 1.60
F(tab) at (n1-1, n2-1) = (41, 31) df at 5% = 1.79 (approximately)

 Decision Rule: Since F(cal) = 1.60 < F(tab) = 1.79, the null hypothesis is accepted.

8.5. Exercise for practice


1. Can the following two samples be considered to have the same underlying variance?
2. Test for the equality of the standard deviations of the weights of the two species from the following measurements of sampled data.

Unit 9 - Correlation and regression

9.1. Introduction to Correlation Analysis


The statistical methods discussed so far are primarily intended to describe a single variable, i.e., univariate populations. In this chapter, techniques useful in studying the relationships that exist when data on two or more variables are available are discussed.

If on the same individual, data on two variables say X and Y are listed, it is called a bivariate
population. In this bivariate population, for every value of X, there is a corresponding value of Y.
By treating these variables X and Y separately, measures of central tendency, dispersion etc., can
be worked out. In addition to these measures it may be of interest to study the strength of
relationship existing between the variables and the nature of their relationship. The study of the
former aspect is referred to as ‘correlation’ and the latter as regression analysis.

9.2. Scatter diagram


If X and Y denote the two variables under study, the scatter diagram is obtained by plotting the pairs of values of X and Y on Cartesian coordinates. This diagram gives an indication of whether the variables are related and, if so, the possible type of line or estimating equation which can describe the relationship.
If the scatter of points indicates that a line can better fit the data, then the relationship between the variables is said to be linear. The scatter diagrams in Fig. 1 and Fig. 2 are examples of a linear relationship between X and Y. In Fig. 1, X tends to increase as Y increases; the relationship between the variables is said to be direct and linear. In Fig. 2, X decreases as Y increases; the relationship between the variables is said to be inverse and linear.
If the scatter of points indicates that a curve can better fit the data, then the relationship between the variables is said to be non-linear or curvilinear. Some curvilinear relationships are shown in Fig. 3 and Fig. 4.
If the scatter of points is as shown in Fig. 5, then there is little or no relationship between the variables.
9.3. Measure of Simple Correlation
It is a statistical tool to study the degree of association or relationship existing between two
variables, when the relationship is linear or approximately linear. The degree of relationship is
quantified by a coefficient called Karl Pearson's product moment correlation coefficient, or simply the 'correlation coefficient'. It is denoted by r. The working formula for r is given by,

r = [nΣXY - (ΣX)(ΣY)] / √{[nΣX² - (ΣX)²][nΣY² - (ΣY)²]}

In the above expression, X and Y denote the measurements on the variables X and Y, and n is the number of pairs of observations, i.e., the sample size.
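The working formula can be sketched as follows (illustrative function name; the paired data are hypothetical, for illustration only):

```python
import math

def pearson_r(x, y):
    """Karl Pearson's product-moment correlation coefficient:
    r = [n*Sxy - Sx*Sy] / sqrt([n*Sxx - Sx^2] * [n*Syy - Sy^2])."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

# Hypothetical paired measurements that increase together
r = pearson_r([10, 12, 14, 16, 18], [8, 10, 11, 13, 15])   # close to +1
```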

Properties of the correlation coefficient

 It is a pure number without units or dimensions.


 It lies between -1 and 1 i.e., -1 ≤ r ≤ 1.
 The correlation coefficient is independent of the origin and the scale of measurement of
the variables.

The variables are said to be positively correlated if ‘r’ is positive and negatively correlated if ‘r’
is negative. Positive correlation indicates that two variables are moving in the same direction,
i.e., as one increases the other increases or if one decreases the other decreases. Negative
correlation indicates that the two variables are moving in opposite direction i.e., as one increases
the other decreases

Examples

 Length and weight of juveniles of fish, Income and expenditure are some of the examples
for positive correlation
 Rate of infection and yield, demand and supply are the some of the examples for negative
correlation
 Growth & demand for fish, size of the shoe and number of intelligent boys / girls are
some of the example for no correlation.

When r = +1 there exists a strict linear relationship and the correlation between the variables is
said to be perfectly positive.
When r = -1 the relationship is linear and correlation between the variables is perfectly negative.
The correlation coefficient equal to one (either positive or negative) indicates perfect correlation
between the variables. Perfect correlation rarely occurs in biological data though values as high
as 0.99 have been obtained in some cases. The closer the value of the coefficient to one, the
greater is the intensity or the degree of association between the variables. Values of r near zero
may arise when there is no relationship or when there is a real relationship but it is not linear.
9.3.1.Example 1
The total length and standard lengths of 15 fishes of a particular species were measured. Work
out the coefficient of correlation for the data given below:

Answer:

Then the Correlation coefficient is given by,


Interpretation
Since r = 0.99, we infer that there is a high degree of positive correlation between the total length (X) and standard length (Y) of the 15 fishes of the particular species.

9.4.Testing the Significance of the correlation coefficient


Let r be the observed correlation coefficient in a sample of n pairs of observations from a bivariate normal population. To test the hypothesis Ho: ρ = 0, i.e., that the population correlation coefficient is zero, the following test procedure is used:
(i) Hypothesis
Ho : ρ = 0; H1 : ρ ≠ 0
(ii) Test statistic
Compute:

t = r√(n - 2) / √(1 - r²)

which is distributed as t with (n-2) df.


(iii) Statistical decision
If the calculated value of t is greater than the table value of t with n-2 degrees of freedom at the
desired level of significance, the correlation between the variables is significant.
However, it is to be noted that the significance of r is not an indication of the strength of the relationship. It is simply a test of whether ρ is equal to zero or not. The degree of relationship between two variables can be measured by the square of the correlation coefficient, r² (which is called the coefficient of determination). Unless r² is very high, one variable should not be used to predict the other.
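The t statistic for testing Ho: ρ = 0 can be sketched as (illustrative function name):

```python
import math

def t_for_r(r, n):
    """t = r*sqrt(n-2)/sqrt(1-r^2), distributed as t with n-2 df under H0: rho = 0."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

# The numbers of Example 2 that follows: r = 0.7, n = 18
t = t_for_r(0.7, 18)   # approx 3.92
```

The value must still be compared with the tabled t for n-2 df at the desired significance level.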

9.4.1 Example 2
The correlation between length and weight of a particular fish species is observed to be 0.7 from a sample of 18 specimens. Is it significant?
Answer: Let ρ denote the population correlation coefficient between length and weight of fish.
(i) Hypothesis
H0 : ρ = 0 ; H1 : ρ ≠ 0
(ii) Test statistic

t = 0.7√(18 - 2) / √(1 - 0.7²) = 2.8/0.714 = 3.92
Table values of t are t16 (5%) = 2.12, t16 (1%) = 2.92
(iii) Statistical decision:
Since the calculated value of t is greater than the table values of t at the 5% and 1% levels of significance, reject H0. Hence, the correlation coefficient is significant.
Note: It is, however, not necessary to carry out the 't' test described above for testing the significance of the correlation coefficient, as a readymade table of critical values of r for different degrees of freedom at the 5% and 1% levels of significance is available (Fisher and Yates, 1963). Compare the calculated value of r with the critical value of r from the table. If the calculated value of r is higher than the critical value, then the correlation is significant.
9.5 Introduction to Simple Linear Regression
If two variables are found to be highly correlated then a more useful approach would be to study
the nature of their relationship. Regression analysis achieves this by formulating statistical
models which can best describe these relationships. These models enable prediction of the
value of one variable, called the dependent variable from the known values of the other
variable(s). It differs from correlation in that regression estimates the nature of the relationship, whereas the correlation coefficient estimates the degree or intensity of the relationship. Further, it is necessary to designate one of the variables as dependent and the other as independent in the case of regression analysis, which is not necessary in correlation analysis.
Simple linear regression deals with the study of linear relationships involving two variables, whereas relationships among more than two variables are studied by multiple regression techniques.

9.6 Estimation of parameters of regression equation


Scatter diagram gives some idea of the nature of the relationship existing between the variables. If it indicates that the relationship is linear in nature, the next step would be to develop a statistical model and proceed to estimate the underlying relationship. It is assumed that there is a linear relationship of the form,
Y = a+bX+e .................................... (1)
In expression (1):
'e' is a random variable (random error factor) assumed to be normally distributed with mean zero and variance σ², i.e., e ~ N(0, σ²);
'a' and 'b' are constants (parameters).
In this model it is assumed that each Yi is normally distributed with mean a+bXi and constant variance σ2.
In the classical regression model it is further assumed that values of the independent variable X are fixed
or are pre-selected by the researcher and the variable X is measured without error.
Fig (6): Linear regression of Y on X

Fitting a linear relationship of the form (1) is equivalent to estimating the constants a and b from the observed data. The standard method of estimation of 'a' and 'b' is the method of 'least squares'. In simple terms, a line is found for which the total of the squares of the distances of all points from the line is minimum, i.e., the sum of e² is minimum. In other words, the values of a and b are chosen so as to minimize,

Σ(Yi - a - bXi)² .................................... (2)

In the above expression n stands for the number of pairs of observations.

Estimates of the parameters a and b which minimize (2) are obtained by the following formulae:

b = [nΣXY - (ΣX)(ΣY)] / [nΣX² - (ΣX)²],   a = Ȳ - bX̄

Estimated values of these constants are substituted in the equation Y = a+bX to get the regression equation. From this equation the value of Y can be estimated for a given value of X.
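The least-squares estimates can be sketched as (illustrative function name; the data are hypothetical, chosen to lie exactly on Y = 2 + 3X):

```python
def fit_line(x, y):
    """Least-squares estimates for Y = a + bX:
    b = [n*Sxy - Sx*Sy] / [n*Sxx - Sx^2],  a = ybar - b*xbar."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(p * q for p, q in zip(x, y))
    sxx = sum(p * p for p in x)
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    a = sy / n - b * sx / n
    return a, b

# Points generated from y = 2 + 3x, so the fit recovers a = 2, b = 3
a, b = fit_line([1, 2, 3, 4], [5, 8, 11, 14])
```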

9.6.1 Special names of the parameters ‘a’ and ‘b’ and expressions for their variances
There are special names for the parameters ‘a’ and ‘b’. The parameter ‘a’ is called the Y
intercept. It is the value Y assumes when X = 0. The parameter ‘b’ is called the regression
coefficient and gives the slope of the regression line, i.e., it shows how steep the line is. The
regression coefficient indicates the rate of change in the dependent variable(Y) per unit change
in the independent variable(X).
The variances of the estimates b and a are respectively given by:
where sy and sx are the standard deviations of y and x respectively.
9.6.2 Variance about the regression line (Deviation from regression)
The assumption behind the standard linear regression is that each Yi is normally distributed with
mean value a+b Xi, and with a constant variance σ2 which is not dependent on the value of Xi.
The formula for the estimate of this variance is given by

This forms the basis for an estimate of the error in fitting the line. However, a convenient formula to work out this variance is given by,

9.6.3 Different Regression Lines


Prior to this topic it was mentioned that the independent variable X is fixed, i.e., not a random variable. If both X and Y are random variables and it is open to choice as to which affects which, then the following regression lines may be conceived:
(i) Regression equation of Y on X
If Y is considered as dependent variable, then the regression equation of Y on X is
given by,
Y = a+b X
The regression coefficient b is called the regression coefficient of Y on X and is usually denoted by byx. In this equation a and b are so estimated as to minimize the residual variation (deviations from regression) of Y, i.e., ∑(Yi-a-bXi)² is minimized.
(ii) Regression equation of X on Y
If X is considered as dependent variable then the regression equation is given by
X = a1+ b1 Y
The regression coefficient b1 is called the regression coefficient of X on Y and is usually denoted by bxy. In this equation a1 and b1 are so estimated as to minimize the residual variation of X, i.e., ∑(Xi-a1-b1Yi)² is minimized. The values of the intercept and slope obtained in (i) and (ii) will usually be different.
Functional Regression
One way to overcome the problem of choosing the independent variable, when both variables are random variables, is to use the 'functional regression' given by Ricker (1975). According to this method the slope of the line of Y on X is estimated using:
b = sy/sx if r > 0
b = -sy/sx if r < 0
and the intercept is estimated using a = Ȳ - bX̄ for the model Y = a + bX (and correspondingly for the model X = a1 + b1Y).

9.7 Properties of regression lines

 The regression lines intersect at the point (X̄, Ȳ), i.e., at the means of X and Y.


 If the variables are perfectly correlated, the regression lines coincide.
 If the variables are not correlated the regression lines of Y on X and X on Y are
perpendicular to each other.

Relation between correlation and regression coefficients


If byx is the regression coefficient in the regression equation of Y on X and bxy is the regression coefficient in the regression equation of X on Y, then the correlation coefficient r is the square root of the product of byx and bxy,
i.e., r = ±√(byx × bxy)
Test of significance of linearity of regression (significance of the regression coefficient)
The significance of the linearity of regression is tested by one of the following methods:

 The method of analysis of variance


 t- test

Let β denote the population regression coefficient. The hypothesis to be tested is:

1. Ho : β = 0 (There is no significant linear regression)
2. H1 : β ≠ 0 (There is significant linear regression)
3. α = 0.05, say
4. Test statistic: t = b/Sb, where Sb is the standard error of b; this is distributed as t with (n-2) df
5. Reject Ho if the calculated t is more than the table t value at (n-2) degrees of freedom; otherwise accept Ho
Example: The correlation computed between standard length and total length of 38 randomly selected oil sardines is found to be 0.82. Test for significance.
9.8 Length-weight relationship in Fishes
In fishes the relationship between body weight (W) and body length (L) has been empirically observed to
be of the form
W=aLb ................................... (1)
This equation is in nonlinear form. The parameters/constants ‘a’ and ‘b’ are almost universally estimated
by researchers by transforming the above equation to logarithmic form and applying the least squares
technique. Thus the equation actually used is
log W = log a + b log L
The above method assumes the following multiplicative error model:
W = a Lᵇ e′ ..................................... (2)
where a and b are constants and e′ is a random error factor.
Taking logarithms to the base 'e' on both sides of (2) gives
ln W = ln a + b ln L + ln e′
i.e. Y = A + BX + E .................................... (3)
where Y = ln W; X = ln L; A = ln a; B = b; E = ln e′
Expression (3) is in linear form. If it is assumed that E is distributed normally with mean zero and variance σ², then the estimates of A and B can be obtained by the method of least squares.

In the above expression Y = ln W and X = ln L, and Ȳ and X̄ denote the arithmetic means of the Y and X values respectively. The B value gives an estimate of b, whereas conventionally 'a' is estimated as exp(A). This method, however, gives a biased estimate of 'a'. To compensate for the bias, the 'a' value obtained is multiplied by the correction factor exp(S²/2), where S² is an estimate of the variance of deviations from regression. Hence, corrected a = exp(A + S²/2).
Note: If logarithms to the base 10 (common logarithms) are used, then a = antilog(A + S²/2)
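The whole fitting procedure, including the bias correction for 'a', can be sketched as follows (illustrative function name; the data are hypothetical, generated exactly from W = 0.01 L³ so that the deviations from regression are essentially zero):

```python
import math

def fit_length_weight(lengths, weights):
    """Fit W = a * L^b by least squares on the log-transformed model
    ln W = ln a + b ln L, with the bias correction a = exp(A + S^2/2),
    where S^2 is the variance of deviations from the fitted regression."""
    x = [math.log(L) for L in lengths]
    y = [math.log(w) for w in weights]
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b = sxy / sxx
    A = ybar - b * xbar
    s2 = sum((yi - (A + b * xi)) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    a = math.exp(A + s2 / 2)
    return a, b

# Hypothetical data from W = 0.01 * L^3; the fit recovers a = 0.01, b = 3
a, b = fit_length_weight([10, 15, 20, 25], [10, 33.75, 80, 156.25])
```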
9.9 Applications of length-weight relationship of fishes

 It is useful in estimating the weight of fish for a given length

As the length of a fish can be measured more easily and accurately than its weight, at landing centres as well as on board vessels at sea, weight can readily be estimated from the predetermined length-weight relationship.

 It is useful in determining condition factor

In order to compare weight and length in a particular sample or individual, condition factors are employed. Fulton's condition factor (K) is calculated as,

K = 100 W / L³

where W and L are the observed total weight and length of a fish. It is the value of 'a' in the length-weight relationship, W = aLᵇ, when b = 3. The heavier the fish at a given length, the larger is the factor K, implying the better is the condition of the fish. K greater than 1 indicates that the general well-being of the fish is good. Fulton's condition factor is suitable for comparing differences related to sex, season or place of capture. Even when b differs from 3, Fulton's condition factor may be used if the fish are approximately of the same length. If the length range is large, the following formula is used:

Alternatively, the condition factor is computed as the ratio of observed weight to estimated weight,

K = W / Ŵ

where Ŵ is the estimated weight obtained from the length-weight relationship, W = aLᵇ
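Both condition factors can be sketched as follows (illustrative function names; the factor 100 in Fulton's K is the conventional scaling for W in g and L in cm, an assumption here since the text's formula is not reproduced):

```python
def fulton_k(weight_g, length_cm):
    """Fulton's condition factor K = 100 * W / L^3 (assumed conventional scaling)."""
    return 100 * weight_g / length_cm ** 3

def relative_k(weight_g, estimated_weight_g):
    """Condition factor as the ratio of observed to estimated weight, K = W / W_hat."""
    return weight_g / estimated_weight_g

k = fulton_k(80, 20)   # 1.0 for a fish whose weight equals L^3 / 100
```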
9.10 Fitting of length-weight relationship
Example: Total length (cm) and weight (g) recorded on a sample of 12 fish are given below:

(i) Fit a length-weight relationship of the type W = aLᵇ, where W is the weight and L is the length of the fish.
(ii) Test whether b significantly differs from 3.
Solution:

(i) Length-weight relationship

X̄ = 1.2898, Ȳ = 1.5310
Then A = Ȳ - BX̄ = 1.5310 - 4.8917 = -3.3607

Sy2= 0.12015; Sx2 =0.00827


Then the variance of deviations from regression is given by:

S² = 0.0011

a = antilog(A + S²/2)
= antilog[-3.3607 + (0.0011)/2]
= antilog(-3.36015) = 0.0004
The length-weight relationship is therefore given by
W = 0.0004 L 3.7944
(ii) To test whether the sample regression coefficient (b = 3.7944) comes from a population with regression coefficient β = 3, the following null hypothesis is set up.
Hypothesis
Ho: β=3; H1:β≠3
The test statistic used is,

t = (b - β)/Sb

which is distributed as t with (n-2) df, where

Sb² = [(Sy/Sx)² - b²] / (n - 2) = 0.0122

Hence Sb = √0.0122 = 0.1105

and t = (3.7944 - 3)/0.1105 = 7.2
(iii) Statistical decision
The table values of t at the 5% and 1% levels of significance are 2.228 and 3.169 respectively. As the calculated t is 7.2, which is more than the table values of t at both the 5% and 1% levels of significance, the null hypothesis is rejected.

Unit 10 - Estimation of Total Marine Fish Landings

10.1.Introduction
India has a long coastline of about 8118 km and there are about 1400 landing centres scattered along the coast. The sampling design developed and practised by the Central Marine Fisheries Research Institute (CMFRI), Kochi provides the estimate of the total marine fish landings in India. The sampling design adopted for this purpose is 'stratified multistage random sampling', with the stratification being done over space and time. Each maritime state is divided into several zones on the basis of geographical considerations and fishing practices.

10.2. Sampling for first stage units


A total of nine fish landing centres are selected at random from each zone for recording fish landings.
For time, month is divided into 3 groups each consisting of 10 consecutive days. From the first ten days
group, a day is selected randomly such that it falls within the first five days. Then 6 consecutive days
from the selected day onwards are considered and these 6 days are grouped into 3 clusters of 2
consecutive days each. From the 2nd and 3rd group of 10 days, 3 clusters of two days each are chosen
systematically with a sampling interval of 10 days. To illustrate, suppose that the 4th date (day) was selected from the group of the first 10 days. The 6 consecutive days from the selected day will be the 4th, 5th, 6th, 7th, 8th and 9th. These days are then grouped into 3 clusters of 2 consecutive days, i.e., dates 4th and 5th form the first cluster, 6th and 7th the second, and 8th and 9th the third. From the 2nd group of 10 days, 6 days are systematically selected with a sampling interval of 10 days from the first date selected in the first group. Thus the 6 days in the 2nd group will be the 14th, 15th, 16th, 17th, 18th and 19th, forming 3 clusters of dates 14th and 15th, 16th and 17th, and 18th and 19th.
From the 3rd group, 6 days are selected with a sampling interval of 10 days from the first day selected in the 2nd group, i.e., the dates will be the 24th, 25th, 26th, 27th, 28th and 29th, whose clusters are 24th and 25th, 26th and 27th, and 28th and 29th. Thus there are 9 clusters of two days each in a month. These 9 clusters are allotted to the 9 selected landing centres. On the first day of observation, data are collected from 12 to 18 hours, and on the next day from 6 to 12 hours. The data on night landings are collected by enquiry, covering the period from 18 hours of the first day to 6 hours of the next day. Thus a 24-hour period is covered for a landing centre. This forms the landing centre day and is the first-stage sampling unit.
10.3.Sampling for second stage units
On the day of observation at the selected landing centre, if the number of fishing units that land their catches is 10 or less, then data on all the units are collected. If the number of fishing units exceeds 10, a sample of boats is selected in a predetermined manner. Thus fishing units form the 2nd-stage units, on which data on species-wise catch, effort, craft, gear, etc., are recorded.
10.4.Sampling for third stage units
At the 3rd stage, samples of commercially important species are taken from the selected second stage
units for biological observations.
10.5.Estimation of total landings
Based on the data collected from the selected fishing units, the total landings for the landing centre day
are estimated. From these the monthly estimates for each year on a zonal, district and state basis are
worked out together with the corresponding sampling errors.
10.6.Estimation of inland fish catch
The diverse nature of inland fisheries resources has been a problem in evolving a standard methodology for estimation of fish catch. Important resources of inland fisheries include rivers and canals, reservoirs, ponds and tanks, estuaries, beels, oxbow lakes and derelict water bodies. A methodology evolved for one resource may not apply to another. Fish landing places are also scattered and sometimes difficult to access. Harvesting operations in culture fisheries are staggered, and the fish catch that reaches markets is what remains after household requirements are met. An allowance for poaching also needs to be considered while estimating fish catch.

Realizing the need for developing a uniform and standard methodology for estimation of inland fisheries resources and catch, a pilot investigation was launched as early as 1955-56 by ICAR in two districts of the erstwhile Hyderabad state. Later, the Government transferred the work from ICAR to the Directorate of the National Sample Survey (NSSO) in April 1956. In September 1958 the Directorate of the National Sample Survey took up the survey work in Orissa to evolve suitable sampling techniques for estimation of fish production.

By the end of 1958, basic information on various inland fisheries resources, their relative importance, fishing practices, etc., was available, which later formed the basis of the pilot survey undertaken by the NSSO in 1962-63 in 3 districts of Orissa, viz., Cuttack, Sambalpur and Mayurbhanj. The Indian Statistical Institute, Calcutta made an attempt at evolving a sampling methodology for inland fisheries during 1960-61. Field problems came to light during the study and no estimation was attempted.

The Central Inland Fisheries Research Institute (CIFRI), Barrackpore made an attempt to estimate the area and catch from ponds in the district of Hooghly, West Bengal during 1962-63, which was not successful due to some administrative difficulties. The NSSO conducted a survey in 1973-75 covering 3 districts, one each in West Bengal, Tamil Nadu and Andhra Pradesh, for estimating the catch from impounded waters as well as riverine resources by the enquiry method. The estimates worked out were not satisfactory, particularly for riverine resources.

The Indian Agricultural Statistics Research Institute (IASRI), New Delhi and CIFRI, Barrackpore carried
out a pilot survey during 1978-81 in one district of West Bengal. The data were collected both by enquiry
and by physical observation. The study covered only ponds in the district of 24 Parganas, West Bengal.
Catch estimation for other important resources, viz. estuaries, rivers, brackish water impoundments,
beels, etc., could not be carried out due to limited manpower.

As a scientifically designed method for collection and estimation of inland fisheries statistics did not
emerge in spite of all these attempts, a centrally sponsored scheme was launched in 1984 in 8 states to
evolve a standardized methodology for collection of inland fisheries statistics in the country; its
implementation was entrusted to CIFRI, Barrackpore. Under this scheme, resource assessment survey
work and catch assessment survey work had been completed in 158 and 56 districts respectively by
1998-99. The scheme has enabled the preparation of a uniform and sound data collection methodology
on the basis of sample surveys conducted in various states. Estimation procedures have also been
formulated for different ecological environments in inland fisheries. CIFRI, Barrackpore brought out
Bulletin No. 58 (revised) in 1991 on methodology for collection and estimation of inland fisheries
statistics in India, to provide guidelines on data collection and estimation procedures with an
associated degree of reliability at the national level. In spite of all these efforts, standardization of
methodologies for estimation of catch from diverse inland aquatic resources, and the establishment of a
mechanism for regular collection and dissemination of data by the states and union territories, are yet
to take place.