Bbacam 106 Notes PDF

Chanderprabhu Jain College of Higher Studies
&
School of Law
An ISO 9001:2008 Certified Quality Institute
(Recognized by Govt. of NCT of Delhi, Affiliated to GGS Indraprastha University, Delhi)
Class : BBAcam
Paper Code : 106
Subject : Business statistics and research methodology
Unit 1
Statistics: Definition, Importance & Limitation.
Definition of statistics:-“By statistics we mean aggregates of facts affected to a
marked extent by a multiplicity of causesnumerically expressed, enumerated or
estimated according to reasonable standards of accuracy collected in a systematic
manner for a pre-determined purpose and placed in relation to each other”
Importance of Statistics
These days statistical methods are applicable everywhere. There is no field of work
in which statistical methods are not applied. According to A L. Bowley, ‘A
knowledge of statistics is like a knowledge of foreign languages or of Algebra, it
may prove of use at any time under any circumstances”. The importance of the
statistical science is increasing in almost all spheres of knowledge, e g., astronomy,
biology, meteorology, demography, economics and mathematics. Economic
planning without statistics is bound to be baseless.
Statistics serve in administration, and facilitate the work of formulation of new
policies. Financial institutions and investors utilise statistical data to summaries the
past experience. Statistics are also helpful to an auditor, when he uses sampling
techniques or test checking to audit the accounts of his client.
Limitation of statistics
The scope of the science of statistic is restricted by certain limitations :
1. The use of statistics is limited numerical studies: Statistical methods cannot
be applied to study the nature of all type of phenomena. Statistics deal with only
&
School of Law
such phenomena as are capable of being quantitatively measured and numerically

expressed. For, example, the health, poverty and intelligence of a group of
individuals, cannot be quantitatively measured, and thus are not suitable subjects
for statistical study.
2. Statistical methods deal with population or aggregate of individuals rather than
with individuals. When we say that the average height of an Indian is 1 metre 80
centimetres, it shows the height not of an individual but as found by the study of all
individuals.
3. Statistical relies on estimates and approximations : Statistical laws are not
exact laws like mathematical or chemical laws. They are derived by taking a
majority of cases and are not true for every individual. Thus the statistical
inferences are uncertain.
4. Statistical results might lead to fallacious conclusions by deliberate manipulation
of figures and unscientific handling. This is so because statistical results are
represented by figures, which are liable to be manipulated. Also the data placed in
the hands of an expert may lead to fallacious results. The figures may be stated
without their context or may be applied to a fact other than the one to which they
really relate. An interesting example is a survey made some years ago which
reported that 33% of all the girl students at John Hopkins University had married
University teachers.
What is frequency distribution
Collected and classified data are presented in a form of frequency distribution.
Frequency distribution is simply a table in which the data are grouped into classes
on the basis of common characteristics and the number of cases which fall in each
class are recorded. It shows the frequency of occurrence of differentvalues of a
single variable. A frequency distribution is constructed to satisfy three objectives :
(i) to facilitate the analysis of data,
(ii) to estimate frequencies of the unknown population distribution from the
distribution of sample data, and
(iii) to facilitate the computation of various statistical measures.
Frequency distribution can be of two types :
1. Univariate Frequency Distribution.
&
School of Law
2. Bivariate Frequency Distribution.

In this lesson, we shall understand the Univariate frequency distribution.
Univariate distribution incorporatesdifferent values of one variable only whereas
the Bivariate frequency distribution incorporates the values of two variables. The
Univariate frequency distribution is further classified into three categories:
(i) Series of individual observations,
(ii) Discrete frequency distribution, and
(iii) Continuous frequency distribution.
Series of individual observations, is a simple listing of items of each observation. If
marks of 14 students in statistics of a class are given individually, it will form a
series of individual observations.
Marks obtained in Statistics :
Roll Nos. 1 2 3 4 5 6 7 8 9 10 11 12 13 14
Marks: 60 71 80 41 81 41 85 35 98 52 50 91 30 88
Marks in Ascending Order Marks in Descending Order
30 98
35 91
41 88
41 85
50 81
52 80
60 71
71 60
80 52
81 50
85 41
88 41
91 35
98 30
Discrete Frequency Distribution:-
&
School of Law
In a discrete series, the data are presented in such a way that exact measurements
of units are indicated. In a discrete frequency distribution, we count the number of
times each value of the variable in data given to you. This is facilitated through the
technique of tally bars.
In the first column, we write all values of the variable. In the second column, a
vertical bar called tally bar against the variable, we write a particular value has
occurred four times, for the fifth occurrence, we put a cross tally mark ( / ) on the
four tally bars to make a block of 5. The technique of putting cross tally bars at
every fifth repetition facilitates the counting of the number of occurrences of the
value. After putting tally bars for all the values in the data; we count the number of
times each value is repeated and write it against the corresponding value of the
variable in the third column entitled frequency. This type of representation of the
data is called discrete frequency distribution.
We are given marks of 42 students:
55 51 57 40 26 43 46 41 46 48 33 40 26 40 40 41
43 53 45 53 33 50 40 33 40 26 53 59 33 39 55 48
15 26 43 59 51 39 15 45 26 15
We can construct a discrete frequency distribution from the above given marks.
Marks of 42 Students
------------------------------------------
Marks Tally Bars Frequency
------------------------------------------
15 ||| 3
26 5
33 |||| 4
39 || 2
40 5
41 || 2
43 ||| 3
45 || 2
46 || 2
&
School of Law
48 || 2
50 | 1
51 || 2
53 ||| 3
55 ||| 3
57 | 1
59 || 2
Total 42
The presentation of the data in the form of a discrete frequency distribution is
better than arranging but it does not condense the data as needed and is quite
difficult to grasp and comprehend. This distribution is quite simple in case the
values of the variable are repeated otherwise there will be hardly any condensation.
Continuous Frequency Distribution:-
If the identity of the units about a particular information collected, is neither
relevant nor is the order in which the observations occur, then the first step of
condensation is to classify the data into different classes by dividing the entire
group of values of the variable into a suitable number of groups and then recording
the number of observations in each group. Thus, we divide the total range of values
of the variable (marks of 42 students) i.e. 59-15 = 44 into groups of 10 each, then
we shall get
(42/10) 5 groups and the distribution of marks is displayed by the following
frequency distribution:
Marks of 42 Students
---------------------------------------------------------------------
Marks (×) Tally Bars Number of Students (f)
---------------------------------------------------------------------
15-25 ||| 3
25-35 |||| 9
35-45 || 12
45-55 || 12
55-65 6
&
School of Law
----------------------------------------------------------------------
Total 42
Graphs of Frequency Distributions
The guiding principles for the graphic representation of the frequency distributions
are same as for the diagrammatic and graphic representation of other types of data.
The information contained in a frequency distribution can be shown in graphs
which reveals the important characteristics and relationships that are not easily
discernible on a simple examination of the frequency tables. The most commonly
used graphs for charting a frequency distribution are :
1. Histogram
2. Frequency polygon
3. Smoothed frequency curves
4. Ogives or cumulative frequency curves.
1. Histogram
The term ‘histogram’ must not be confused with the term ‘historigram’
which relates to time charts. Histogram is the best way of presenting
graphically a simple frequency distribution. The statistical meaning of
histogram is that it is a graph that represents the class frequencies in a
frequency distribution by vertical adjacent rectangles.
While constructing histogram the variable is always taken on the X-axis and
the corresponding classinterval. The distance for each rectangle on the X-
axis shall remain the same in case the class-intervals are uniform throughout;
if they are different the width of the rectangles shall also change
proportionately. The Yaxis represents the frequencies of each class which
constitute the height of its rectangle. We get a series of rectangles each
having a class interval distance as its width and the frequency distance as its
height. The
area of the histogram represents the total frequency.
The histogram should be clearly distinguished from a bar diagram. A bar
diagram is one-dimensional where the length of the bar is important and not
the width, a histogram is two-dimensional where both the length and width
&
School of Law
are important. However, a histogram can be misleading if the distribution

has unequal class intervals and suitable adjustments in frequencies are not
made.
The technique of constructing histogram is explained for :
(i) distributions having equal class-intervals, and
(ii) distributions having unequal class-intervals.
Example : Draw a histogram from the following data :

--------------------------------------
Classes Frequency
-------------------------------
0-10 5
10-20 11
20-30 19
30-40 21
40 -50 16
50-60 10
60-70 8
70-80 6
80-90 3
90–100 2
-------------------------
Solution :
When class-intervals are unequal the frequencies must be adjusted before
constructing a histogram. We take that class which has the lowest class-interval
and adjust the frequencies of classes accordingly. If one class interval is twice as
wide as the one having the lowest class-interval we divide the height of its
rectangle by two, if it is three times more we divide it by three etc. the heights will
be proportional to the ratios of the frequencies to the width of the classes
&
School of Law
----------------------------------------------------------------------
2. Frequency Polygon
This is a graph of frequency distribution which has more than four sides. It is
particularly effective in comparing two or more frequency distributions. There are
two ways of constructing a frequency polygon.
(i) We may draw a histogram of the given data and then join by straight line the
mid-points of the upper horizontal side of each rectangle with the adjacent ones.
The figure so formed shall be frequency polygon. Both the ends of the polygon
should be extended to the base line in order to make the area under frequency
polygons equal to the area under Histogram.
(ii) Another method of constructing frequency polygon is to take the mid-points of
the various classintervals and then plot the frequency corresponding to each point
and join all these points by straight lines. The figure obtained by both the methods
would be identical.
Frequency polygon has an advantage over the histogram. The frequency polygons
of several distributions can be drawn on the same axis, which makes comparisons
possible whereas histogram cannot be used in the same way. To compare
histograms we need to draw them on separate graphs.
3. Cumulative Frequency Curves or Ogives
We have discussed the charting of simple distributions where each frequency refers
to the measurement of the class-interval against which it is placed. Sometimes it
becomes necessary to know the number of items whose values are greater or less
than a certain amount. We may, for example, be interested in knowing the number
of students whose weight is less than 65 Ibs. or more than say 15.5 Ibs. To get this
information, it is necessary to change the form of frequency distribution from a
simple to a cumulative distribution. In a cumulative frequency distribution, the
frequency of each class is made to include the frequencies of all the lower or all the
upper classes depending upon the manner in which cumulation is done. The graph
of such a distribution is called a cumulative frequency curve or an Ogive.
There are two method of constructing ogives, namely:
(i) less than method, and
&
School of Law
(ii) more than method

In less than method, we start with the upper limit of each class and go on adding
the frequencies.When these frequencies are plotted we get a rising curve.In more
than method, we start with the lower limit of each class and we subtract the
frequency of each class from total frequencies. When these frequencies are plotted,
we get a declining curve. This example would illustrate both types of ogives.
Example : Draw ogives by both the methods from the following data.
Distribution of weights of the students of a college (Ibs.)
-----------------------------------------------------
Weights No. of Students
-----------------------------------------------------
90.5-100.5
100.5-110.5 34
110.5-120.5 139
120.5-130.5 300
130.5-140.5 367
140.5-150.5 319
150.5-160.5 205
160.5-170.5 76
170.5-180.5 43
180.5-190.5 16
190.5-200.5 3
200.5-210.5 4
210.5-220.5 3
220.5-230.5 1
-----------------------------------------------------
Solution : First of all we shall find out the cumulative frequencies of the given
data by less than method.
--------------------------------------------------------------
Less than (Weights) Cumulative Frequency
&
School of Law
--------------------------------------------------------------
100.5 5
110.5 39
120.5 178
130.5 478
140.5 845
150.5 1164
160.5 1369
170.5 1445
180.5 1488
190.5 1504
200.5 1507
210.5 1511
220.5 1514
230.5 1515
--------------------------------------------------------------
Plot these frequencies and weights on a graph paper. The curve formed is called an
Ogive Now we calculate the cumulative frequencies of the given data by more than
method.
--------------------------------------------------------------
More than (Weights) Cumulative Frequencies
--------------------------------------------------------------
90.5 1515
100.5 1510
110.5 1476
120.5 1337
130.5 1037
140.5 670
150.5 351
160.5 146
&
School of Law
170.5 70
180.5 27
190.5 11
200.5 8
210.5 4
220.5 1
--------------------------------------------------------------
By plotting these frequencies on a graph paper, we will get a declining curve which
will be our cumulative frequency curve or Ogive by more than method.
Although the graphs are a powerful and effective method of presenting statistical
data, they are not under all circumstances and for all purposes complete substitutes
for tabular and other forms of presentation.The specialist in this field is one who
recognizes not only the advantages but also the limitations of these techniques. He
knows when to use and when not to use these methods and from his experience and
expertise is able to select the most appropriate method for every purpose.
Example :Draw an ogive by less than method and determine the number of
companies earning profits between Rs. 45 crores and Rs. 75 crores :
------------------------------------------------------------------------
Profits No. of Profits No. of
(Rs. crores) Companies (Rs. crores) Companies
------------------------------------------------------------------------
10—20 8 60—70 10
20—30 12 70—80 7
30—40 20 80—90 3
40—50 24 90—100 1
50—6.0 15
------------------------------------------------------------------------
Solution :
OGIVE BY LESS THAN METHOD
&
School of Law
-----------------------------------------------
Profits No.of
(Rs. crores) Companies
----------------------------------------------
Less than 20 8
Less than 30 20
Less than 40 40
Less than 50 64
Less than 60 79
Less than 70 89
Less than 80 96
Less than 90 99
Less than 100 100
-----------------------------------------------
It is clear from the graph that the number of companies getting profits less than
Rs.75 crores is 92 and the number of companies getting profits less than Rs. 45
crores is 51. Hence the number of companies getting profits between Rs. 45 crores
and Rs. 75 crores is 92 – 51 = 41.
Example :The following distribution is with regard to weight in grams of mangoes
of a given variety. If mangoes of weight less than 443 grams be considered
unsuitable for foreign market, what is the percentage of total mangoes suitable for
it? Assume the given frequency distribution to be typical of the variety:
------------------------------------------------------------------------------------------------
Weight in gms. No. of mangoes Weight in gms. No. of mangoes
---------------------------------------------------------------------------------
410 – 119 10 450 – 159 45
420 – 429 20 460 – 469 18
430 – 139 42 470 – 179 7
440 – 449 54
---------------------------------------------------------------------------------
&
School of Law
Draw an ogive of ‘more than’ type of the above data and deduce how many
mangoes will be more than 443 grams.
Solution : Mangoes weighting more than 443 gms. are suitable for foreign market.
Number of mangoes weighting more than 443 gms. lies in the last four classes.
Number of mangoes weighing between 444 and 449 grams would be
Total number of mangoes weighing more than 443 gms. = 32.4 + 45 + 18 + 7 =
102.4
Percentage of mangoes =
Therefore, the percentage of the total mangoes suitable for foreign market is 52.25.
OGIVE BY MORE THAN METHOD
------------------------------------------------------------------
Weight more than (gms.) No. of Mangoes
------------------------------------------------------------------
410 196
420 186
430 166
440 124
450 70
460 25
470 7
------------------------------------------------------------------
From the graph it can be seen that there are 103 mangoes whose weight will be
more than 443 gms. and are suitable for foreign market.
DIAGRAM:-
Statistical data can be presented by means of frequency tables, graphs and
diagrams. In this lesson, so far we have discussed the graphical presentation. Now
we shall take up the study of diagrams. There are many variety of diagrams but
here we are concerned with the following types only :
(i) Bar diagrams
Bar Diagram:-
A bar diagram may be simple or component or multiple. A simple bar diagram is
used to represent only one variable. Length of the bars is proportional to the
magnitude to be represented. But when we are interested in showing various parts
&
School of Law
of a whole, then we construct component or composite bar diagrams. Whenever

comparisons of more than one variable is to be made at the same time, then
multiple bar chart, which groups two or more bar charts together, is made use of.
We shall now illustrate these by examples.
Example 1 : The following table gives the average approximate yield of rice in
Ibs, per acre in various countries of the world in 2000–05.
-------------------------------------------------------
Country Yield in lbs. per acre
-------------------------------------------------------
India 728
Siam 943
U.S.A. 1469
Italy 2903
Egypt 2153
Japan 2276
-------------------------------------------------------
Indicate this by a suitable diagram
Solution :
In the above example, bars have been erected vertically. Also bars may be erected
horizontally.
What is Central Tendency

One of the important objectives of statistics is to find out various numerical values
which explains theinherent characteristics of a frequency distribution. The first of
such measures is averages. The averages arethe measures which condense a huge
unwieldy set of numerical data into single numerical values whichrepresent the
entire distribution. The inherent inability of the human mind to remember a large
body ofnumerical data compels us to few constants that will describe the data.
Averages provide us the gist and givea bird’s eye view of the huge mass of
unwieldy numerical data. Averages are the typical values around whichother items
of the distribution congregate. This value lie between the two extreme observations
of the distribution and give us an idea about the concentration of the values in the
central part of the distribution. They are called the measures of central tendency.
&
School of Law
Averages are also called measures of location since they enable us to locate the
position or place of the distribution in question. Averages are statistical constants
which enables us to comprehend in a single value the significance of the whole
group. According to Croxlon and Cowden, an average value is a single value
within the range of the data that is used to represent all the values in that series.
Since an average issomewhere within the range of data, it is sometimes called a
measure of central value. An average is the most typical representative item of the
group to which it belongs and which is capable of revealing all important
characteristics of that group or distribution.
What are the Objects of Central Tendency
The most important object of calculating an average or measuring central tendency
is to determine a single figure which may be used to represent a whole series
involving magnitudes of the same variable. Second object is that an average
represents the empire data, it facilitates comparison within one group or between
groups of data. Thus, the performance of the members of a group can be compared
with the average performance of different groups.
Third object is that an average helps in computing various other statistical
measures such as dispersion,skewness, kurtosis etc.
Essential of a Good Average
An average represents the statistical data and it is used for purposes of comparison,
it must possess the following properties.
1. It must be rigidly defined and not left to the mere estimation of the observer. If
the definition is rigid, the computed value of the average obtained by different
persons shall be similar.
2. The average must be based upon all values given in the distribution. If the item
is not based on all value it might not be representative of the entire group of data.
3. It should be easily understood. The average should possess simple and obvious
properties. It should be too abstract for the common people.
4. It should be capable of being calculated with reasonable care and rapidity.
5. It should be stable and unaffected by sampling fluctuations.
6. It should be capable of further algebraic manipulation.
Different methods of measuring “Central Tendency” provide us with different
kinds of averages. The following are the main types of averages that are commonly
&
School of Law
used:
1. Mean
(i) Arithmetic mean
(ii) Weighted mean
(iii) Geometric mean
(iv) Harmonic mean
2. Median
3. Mode
Arithmetic Mean: The arithmetic mean of a series is the quotient obtained by
dividing the sum of the values by the number of items. In algebraic language, if
X1, X2, X3 ....... Xn are the n values of a variate X.
Then the Arithmetic Mean is defined by the following formula:
=
=
Example : The following are the monthly salaries (Rs.) of ten employees in an
office. Calculate the meansalary of the employees: 250, 275, 265, 280, 400, 490,
670, 890, 1100, 1250.
Solution : =
= = Rs. 587
Short-cut Method: Direct method is suitable where the number of items is
moderate and the figures are small sizes and integers. But if the number of items is
large and/or the values of the variate are big, then the process of adding together all
the values may be a lengthy process. To overcome this difficulty ofcomputations, a
short-cut method may be used. Short cut method of computation is based on an
importantcharacteristic of the arithmetic mean, that is, the algebraic sum of the
deviations of a series of individualobservation from their mean is always equal to
zero. Thus deviations of the various values of the variate from an assumed mean
computed and the sum is divided by the number of items. The quotient obtained
is added to the assumed mean lo find the arithmetic mean.
Symbolically, = . where A is assumed mean and dx are deviations = (X – A).
We can solve the previous example by short-cut method.
Computation of Arithmetic Mean
----------------------------------------------------------------------------------
&
School of Law
Serial Salary (Rupees) Deviations from assumed mean

Number X where dx = (X – A), A = 400
----------------------------------------------------------------------------------
1. 250 –150
2. 275 –125
3. 265 –135
4. 280 –120
5. 400 0
6. 490 +90
7. 670 +270
8. 890 +490
9. 1100 + 700
10. 1250 + 850
----------------------------------------------------------------
N = 10 ∑dx = 1870
--------------------------------------------------------------
By substituting the values in the formula, we get
=
Computation of Arithmetic Mean in Discrete series. In discrete series, arithmetic
mean may be computed by both direct and short cut methods. The formula
according to direct method is:
=
where the variable values X1 X2, .......... Xn, have frequencies f1, f2, ................fn
and N = ∑f.
Example : The following table gives the distribution of 100 accidents during seven
days of the week in a given month. During a particular month there were 5 Fridays
and Saturdays and only four each of other days. Calculate the average number of
accidents per day.
Days : Sun. Mon. Tue. Wed. Thur. Fri. Sat. Total
Number of
accidents : 20 22 10 9 11 8 20 = 100
&
School of Law
Solution : Calculation of Number of Accidents per Day

------------------------------------------------------------
Day No. of No. of Days Total Accidents
Accidents in Month
X f fX
-------------------------------------------------------------
Sunday 20 4 80
Monday 22 4 88
Tuesday 10 4 40
Wednesday 9 4 36
Thursday 11 4 44
Friday 8 5 40
Saturday 20 5 100
--------------------------------------------------------------
100 N = 30 ∑fX = 428
--------------------------------------------------------------
= = 14.27 = 14 accidents per day
The formula for computation of arithmetic mean according to the short cut method
is = where A is assumed mean, dx = (X – A) and N = ∑f.
We can solve the previous example by short-cut method as given below :
Calculation of Average Accidents per Day
-------------------------------------------------------------------------
Day X dx = X – A f fdx
(where A = 10)
--------------------------------------------------------------------------
Sunday 20 + 10 4 + 40
Monday 22 + 12 4 + 48
Tuesday 10 +0 4 +0
Wednesday 9 –1 4 –4
Thursday 11 +1 4 +4
Friday 8 –2 5 - 10
Saturday 20 + 10 5 + 50
----------------------------------------------------------------------
&
School of Law
30 + 128
-----------------------------------------------------------------------
= = = 14 accidents per day
Calculation of arithmetic mean for Continuous Series: The arithmetic mean can
be computed both by direct and short-cut method. In addition, a coding method or
step deviation method is also applied for simplification of calculations. In any case,
it is necessary to find out the mid-values of the various classes in the frequency
distribution before arithmetic mean of the frequency distribution can be computed.
Once the mid-points of various classes are found out, then the process of the
calculation of arithmetic mean is same as in the case of discrete series. In case of
direct method, the formula to be used:
= , when m = mid points of various classes and N = total frequency In the short-cut
method, the following formula is applied:
= where dx = (m – A) and N = ∑f
The short-cut method can further be simplified in practice and is named coding
method. The deviations from the assumed mean are divided by a common factor to
reduce their size. The sum of the products of the deviations and frequencies is
multiplied by this common factor and then it is divided by the total frequency and
added to the assumed mean. Symbolically
= where and i = common factor
Geometric Mean :
In general, if we have n numbers (none of them being zero), then the GM. is
defined as
G.M. =
In case of a discrete series, if x1, x2,............. xn occur f1, f2, ............... fn times
respectively and N is
the
total frequency (i.e. N = f1 + f2...................fn ), then
G.M. =
For convenience, use of logarithms is made extensively to calculate the nth root. In
terms of logarithms
G.M. =
&
School of Law
= , where AL refers to antilog.

and in case of continuous series, G.M. =
Example : Calculate geometric mean of the following data :
x 5 6 7 8 9 10 11
f 2 4 7 10 9 6 2
Solution : Calculation of G.M.
-----------------------------------------------------------------------
x log x f f log x
----------------------------------------------------------------------
5 0.6990 2 1.3980
6 0.7782 4 3.1128
7 0.8451 7 5.9157
8 0.9031 10 9.0310
9 0.9542 9 8.5878
10 1.0000 6 6.0000
11 1.0414 2 2.0828
------------------------------------------------------------------------
N = 40 ∑f log x = 36.1281
--------------------------------------------------------------------------
Median
The median is that value of the variable which divides the group in two equal parts.
One part comprising the values greater than and the other all values less than
median. Median of a distribution may be defined as that value of the variable
which exceeds and is exceeded by the same number of observation. It is the value
such that the number of observations above it is equal to the number of
observations below it.
Thus we know that the arithmetic mean is based on all items of the distribution, the
median is positional average, that is, it depends upon the position occupied by a
value in the frequency distribution.
When the items of a series are arranged in ascending or descending order of
magnitude the value of the middle item in the series is known as median in the case
of individual observation. Symbolically.
Median = size of th item
&
School of Law
It the number of items is even, then there is no value exactly in the middle of the
series. In such a situation the median is arbitrarily taken to be halfway between the
two middle items. Symbolically.
Median =
Location of Median in Discrete series: In a discrete series, medium is computed
in the following manner:
(i) Arrange the given variable data in ascending or descending order,
(ii) Find cumulative frequencies.
(iii) Apply Med. = size of th item
(iv) Locate median according to the size i.e., variable corresponding to the size or
for next cumulative frequency.
Example: Following are the number of rooms in the houses of a particular locality.
Find median of the data:
No. of rooms: 3 4 5 6 7 8
No of houses: 38 654 311 42 12 2
Solution: Computation of Median
------------------------------------------------------------------------
No. of Rooms No. of Houses cumulative Frequency
X f Cf
-----------------------------------------------------------------------
3 38 38
4 654 692
5 311 1003
6 42 1045
7 12 1057
8 2 1059
------------------------------------------------------------------
Median = size of th item = size of th item = 530 th item.
Median lies in the cumulative frequency of 692 and the value corresponding to this
is 4
Therefore, Median = 4 rooms.
In a continuous series, median is computed in the following manner:
(i) Arrange the given variable data in ascending or descending order.
&
School of Law
(ii) If inclusive series is given, it must he converted into exclusive series to find
real class interval
(iii) Find cumulative frequencies.
(iv) Apply Median = size of th item to ascertain median class.
(v) Apply formula of interpolation to ascertain the value of median.
Median = l1 + or Median = l2 –
where, l1 refers to lower limit of median class,
l2 refers to higher limit of median class,
cfo refers cumulative frequency of previous to median class,
f refers to frequency of median class,
Example: The following table gives you the distribution of marks secured by some
students in an examination:
Marks No. of Students
0—20 42
21—30 38
31—40 120
41—50 84
51— 60 48
61—70 36
71—80 31
Find the median marks.
Solution: Calculation of Median Marks
---------------------------------------------------
Marks No. of Students cf
(x) (f)
--------------------------------------------------
0 – 20 42 42
21 – 30 38 80
31 – 40 120 200
41 – 50 84 284
51 – 60 48 332
61 – 70 36 368
71 – 80 31 399
&
School of Law
---------------------------------------------------
Median = size of th item = size of th item = 199.5 th item.
which lies in (31 – 40) group, therefore the median class is 30.5 – 40.5.
Applying the formula of interpolation.
Median = l1 +
= 30.5 +
Mode
Mode is that value of the variable which occurs or repeats itself maximum number
of item. The mode is most “ fashionable” size in the sense that it is the most
common and typical and is defined by Zizek as “the value occurring most
frequently in series of items and around which the other items are distributed most
densely.” In the words of Croxton and Cowden, the mode of a distribution is the
value at the point where the items tend to be most heavily concentrated. According
to A.M. Tuttle, Mode is the value which has the greater frequency density in its
immediate neighbourhood. In the case of individual observations, the mode is that
value which is repeated the maximum number of times in the series. The value of
mode can be denoted by the alphabet z also.
Example : Calculate mode from the following data:

Sr. Number : 1 2 3 4 5 6 7 8 9 10
Marks obtained : 10 27 24 12 27 27 20 18 15 30
Solution :
----------------------------------------
Marks No. of students
----------------------------------------
10 1
12 1
15 1
18 1
20 1
24 1
27 3 Mode is 27 marks
30 1
&
School of Law
----------------------------------------
Calculation of Mode in Discrete series. In discrete series, it is quite often
determined by inspection.We can understand with the help of an example:
X 1 2 3 4 5 6 7
f 4 5 13 6 12 8 6
By inspection, the modal size is 3 as it has the maximum frequency. But this test of
greatest frequency is not fool proof as it is not the frequency of a single class, but
also the frequencies of the neighbour classes that decide the mode. In such cases,
we shall be using the method of Grouping and Analysis table.
Size of shoe 1 2 3 4 5 6 7
Frequency 4 5 13 6 12 8 6
Solution : By inspection, the mode is 3, but the size of mode may be 5. This is so
because the neighboring frequencies of size 5 are greater than the neighbouring
frequencies of size 3. This effect of neighbouring frequencies is seen with the help
of grouping and analysis table technique.
Measures of dispersion
For the study of dispersion, we need some measures which show whether the
dispersion is small or large. There are two types of measure of dispersion, which
are:
(a) Absolute Measures of Dispersion

(b) Relative Measures of Dispersion
Absolute Measures of Dispersion
These measures give us an idea about the amount of dispersion in a set of
observations. They give the answers in the same units as the units of the original
observations. When the observations are in kilograms, the absolute measure is also
in kilograms. If we have two sets of observations, we cannot always use the
absolute measures to compare their dispersions. We shall explain later as to when
the absolute measures can be used for comparison of dispersion in two or more sets
of data. The absolute measures which are commonly used are:
1. The Range
&
School of Law
2. The Quartile Deviation

3. The Mean Deviation
4. The Standard Deviation and Variance
Relative Measures of Dispersion

These measures are calculated for the comparison of dispersion in two or more sets
of observations. These measures are free of the units in which the original data is
measured. If the original data is in dollars or kilometers, we do not use these units
with relative measures of dispersion. These measures are a sort of ratio and are
called coefficients. Each absolute measure of dispersion can be converted into its
relative measure. Thus the relative measures of dispersion are;
Coefficient of Range or Coefficient of Dispersion
Coefficient of Quartile Deviation or Quartile Coefficient of Dispersion
Coefficient of Mean Deviation or Mean Deviation of Dispersion
Coefficient of Standard Deviation or Standard Coefficient of Dispersion
Coefficient of Variation (a special case of Standard Coefficient of Dispersion)
Range
In statistics, range is defined simply as the difference between the maximum

and minimum observations. It is intuitively obvious why we define range in
statistics this way - range should suggest how diversely spread out the values
&
School of Law
are, and by computing the difference between the maximum and minimum
values, we can get an estimate of the spread of the data.
For example, suppose an experiment involves finding out the weight of lab rats and
the values in grams are 320, 367, 423, 471 and 480. In this case, the range is
simply computed as 480-320 = 160 grams.
Some Limitations of Range
Range is quite a useful indication of how spread out the data is, but it has some
serious limitations. This is because sometimes data can have outliers that are
widely off the other data points. In these cases, the range might not give a true
indication of the spread of data.
For example, in our previous case, consider a small baby rat added to the data set
that weighs only 50 grams. Now the range is computed as 480-50 = 430 grams,
which looks like a false indication of the dispersion of data.
This limitation of range is to be expected primarily because range is computed

taking only two data points into consideration. Thus it cannot give a very good
estimate of how the overall data behaves.
Mean deviation
Referred to as average deviation, it is defined as the sum of the

deviations(ignoring signs) from an average divided by the number of items in a
distribution The average can be mean, median or mode. Theoretically median is d
best average of choice because sum of deviations from median is minimum,
provided signs are ignored. However, practically speaking, arithmetic mean is the
most commonly used average for calculating mean deviation and is denoted by
the symbol MDMD.
&
School of Law
We're going to discuss methods to compute the Mean Deviation for three types of
series:
• Individual Data Series
• Discrete Data Series
• Continuous Data Series
Individual Data Series
When data is given on individual basis. Following is an example of individual
series:
Items 5 10 20 30 40 50 60 70
Discrete Data Series

When data is given alongwith their frequencies. Following is an example of
discrete series:
Items 5 10 20 30 40 50 60 70
Frequency 2 5 1 3 12 0 5 7
Continuous Data Series

When data is given based on ranges alongwith their frequencies. Following is an
example of continous series:
Items 0-5 5-10 10-20 20-30 30-40
Frequency 2 5 1 3 12
The mean difference (more correctly, 'difference in means') is a standard statistic
that measures the absolute difference between the mean value in two groups in a
clinical trial. It estimates the amount by which the experimental intervention
changes the outcome on average compared with the control.
&
School of Law
Formula
Mean Difference=∑x1n−∑x2nMean Difference=∑x1n−∑x2n
Where −
• x1x1 = Mean of group one
• x2x2 = Mean of group two
• nn = Sample size
Example
Problem Statement:
There are 2 dance groups whose data is listed below. Find the mean difference
between these dance groups.
Group 1 3 9 5 7
Group 2 5 3 4 4
Solution:
∑x1=3+9+5+7=24∑x2=5+3+4+4=16M1=∑x1n=244=6M2=∑x2n=164=4MeanDi
fference=6−4=2
Standard Deviation
The Standard Deviation is a measure of how spread out numbers are.
You might like to read this simpler page on Standard Deviation first.
But here we explain the formulas.
The symbol for Standard Deviation is σ (the Greek letter sigma).

&
School of Law
This is the formula for Standard Deviation:
Say what? Please explain!
OK. Let us explain it step by step.
Say we have a bunch of numbers like 9, 2, 5, 4, 12, 7, 8, 11.
To calculate the standard deviation of those numbers:
• 1. Work out the Mean (the simple average of the numbers)

• 2. Then for each number: subtract the Mean and square the result
• 3. Then work out the mean of those squared differences.
• 4. Take the square root of that and we are done!
The formula actually says all of that, and I will show you how.
The Formula Explained
First, let us have some example values to work on:

&
School of Law
Example: Sam has 20 Rose Bushes.
The number of flowers on each bush is
9, 2, 5, 4, 12, 7, 8, 11, 9, 3, 7, 4, 12, 5, 4, 10, 9, 6, 9, 4
Work out the Standard Deviation.
Step 1. Work out the mean
In the formula above μ (the greek letter "mu") is the mean of all our values ...
Example: 9, 2, 5, 4, 12, 7, 8, 11, 9, 3, 7, 4, 12, 5, 4, 10, 9, 6, 9, 4
The mean is:
9+2+5+4+12+7+8+11+9+3+7+4+12+5+4+10+9+6+9+420
= 14020 = 7
So:
μ=7
Step 2. Then for each number: subtract the Mean and square the result
This is the part of the formula that says:

&
School of Law
So what is xi ? They are the individual x values 9, 2, 5, 4, 12, 7, etc...
In other words x1 = 9, x2 = 2, x3 = 5, etc.
So it says "for each value, subtract the mean and square the result", like this
Example (continued):
(9 - 7)2 = (2)2 = 4
(2 - 7)2 = (-5)2 = 25
(5 - 7)2 = (-2)2 = 4
(4 - 7)2 = (-3)2 = 9
(12 - 7)2 = (5)2 = 25
(7 - 7)2 = (0)2 = 0
(8 - 7)2 = (1)2 = 1
... etc ...
And we get these results:
4, 25, 4, 9, 25, 0, 1, 16, 4, 16, 0, 9, 25, 4, 9, 9, 4, 1, 4, 9
Step 3. Then work out the mean of those squared differences.
To work out the mean, add up all the values then divide by how many.
First add up all the values from the previous step.

&
School of Law
But how do we say "add them all up" in mathematics? We use "Sigma": Σ
The handy Sigma Notation says to sum up as many terms as we want:
Sigma Notation
We want to add up all the values from 1 to N, where N=20 in our case because
there are 20 values:
Which means: Sum all values from (x1-7)2 to (xN-7)2
We already calculated (x1-7)2=4 etc. in the previous step, so just sum them up:
= 4+25+4+9+25+0+1+16+4+16+0+9+25+4+9+9+4+1+4+9 = 178
But that isn't the mean yet, we need to divide by how many, which is done
by multiplying by 1/N (the same as dividing by N):
Mean of squared differences = (1/20) × 178 = 8.9

&
School of Law
(Note: this value is called the "Variance")
Step 4. Take the square root of that:

Example (concluded):
σ = √(8.9) = 2.983...
DONE!
Sample Standard Deviation
But wait, there is more ...
... sometimes our data is only a sample of the whole population.
Example: Sam has 20 rose bushes, but only counted the flowers on 6 of them!
The "population" is all 20 rose bushes,

&
School of Law
and the "sample" is the 6 bushes that Sam counted the flowers of.
Let us say Sam's flower counts are:
9, 2, 5, 4, 12, 7
We can still estimate the Standard Deviation.
But when we use the sample as an estimate of the whole population, the Standard
Deviation formula changes to this:
The formula for Sample Standard Deviation:
The important change is "N-1" instead of "N" (which is called "Bessel's

correction").
The symbols also change to reflect that we are working on a sample instead of the
whole population:
• The mean is now x (for sample mean) instead of μ (the population mean),
• And the answer is s (for Sample Standard Deviation) instead of σ.
But that does not affect the calculations. Only N-1 instead of N changes the
calculations.
OK, let us now calculate the Sample Standard Deviation:

&
School of Law
Step 1. Work out the mean

Example 2: Using sampled values 9, 2, 5, 4, 12, 7
The mean is (9+2+5+4+12+7) / 6 = 39/6 = 6.5
So:
x = 6.5
Step 2. Then for each number: subtract the Mean and square the result
Example 2 (continued):
(9 - 6.5)2 = (2.5)2 = 6.25
(2 - 6.5)2 = (-4.5)2 = 20.25
(5 - 6.5)2 = (-1.5)2 = 2.25
(4 - 6.5)2 = (-2.5)2 = 6.25
(12 - 6.5)2 = (5.5)2 = 30.25
(7 - 6.5)2 = (0.5)2 = 0.25
Step 3. Then work out the mean of those squared differences.
To work out the mean, add up all the values then divide by how many.
But hang on ... we are calculating the Sample Standard Deviation, so instead of
dividing by how many (N), we will divide by N-1
&
School of Law
Example 2 (continued):
Sum = 6.25 + 20.25 + 2.25 + 6.25 + 30.25 + 0.25 = 65.5
Divide by N-1: (1/5) × 65.5 = 13.1
(This value is called the "Sample Variance")
Step 4. Take the square root of that:

Example 2 (concluded):
s = √(13.1) = 3.619...
DONE!
Coefficient of variation
How to Find a Coefficient of Variation

How to Find a Coefficient of Variation: Contents:
1. What is the Coefficient of Variation?
2. How to Find the Coefficient of Variation
What is the Coefficient of Variation?
&
School of Law
The coefficient of variation (CV) is a measure of relative variability. It is

the ratio of the standard deviation to the mean (average). For example, the
expression “The standard deviation is 15% of the mean” is a CV.
The CV is particularly useful when you want to compare results from two different
surveys or tests that have different measures or values. For example, if you are
comparing the results from two tests that have different scoring mechanisms. If
sample A has a CV of 12% and sample B has a CV of 25%, you would say that
sample B has more variation, relative to its mean.
Formula
The formula for the coefficient of variation is:
Coefficient of Variation = (Standard Deviation / Mean) * 100.
In symbols: CV = (SD/ ) * 100.
Multiplying the coefficient by 100 is an optional step to get a percentage, as
opposed to a decimal.
Coefficient of Variation Example

A researcher is comparing two multiple-choice tests with different conditions. In
the first test, a typical multiple-choice test is administered. In the second test,
alternative choices (i.e. incorrect answers) are randomly assigned to test takers.
The results from the two tests are:
Regular Test Randomized Answers
Mean 59.9 44.8

&
School of Law
SD 10.2 12.7
Trying to compare the two test results is challenging. Comparing standard

deviations doesn’t really work, because the means are also different. Calculation
using the formula CV=(SD/Mean)*100 helps to make sense of the data:
Regular Test Randomized Answers
Mean 59.9 44.8
SD 10.2 12.7
CV 17.03 28.35
Looking at the standard deviations of 10.2 and 12.7, you might think that the tests
have similar results. However, when you adjust for the difference in the means, the
results have more significance:
Regular test: CV = 17.03
Randomized answers: CV = 28.35
The coefficient of variation can also be used to compare variability between
different measures. For example, you can compare IQ scores to scores on the
Woodcock-Johnson III Tests of Cognitive Abilities.
Why Sample?
• Pool of possible cases is too large (e.g., 260 million Americans) -- would
cost too much and take too long
• Don't want to use up the cases: e.g., when testing light bulbs to see how long
they last, you take a bulb and leave it on until it burns out. You can't test all
the bulbs this way, because their whole objective is to sell the bulbs, not
burn them out.
&
School of Law
• It's not necessary to survey all cases: for most purposes, taking a sample
yields a estimates that are accurate enough.
• The trade-off is that sampling does introduce some error. You didn't
interview everybody, so certain opinions or combinations of opinions won't
be represented in your data. When the population is very diverse, your
sample can't include all the possible combinations of attributes that are
found in the population, such blacks and whites, men and women, cardiac
patients non-patients, black women, white men, white women with heart
trouble who like Oprah and don't like Ally McBeal, etc.
Populations, Sampling Frames, and Elements
• Population is the universe of cases. It is the group that you ultimately want
to say something about. For example, if you want to report 'what Americans
think about Clinton', then the population is all Americans.
• Elements are the individual cases in the population (usually, persons)
• Sampling ratio is size of sample divided by size of population. Contrary to
popular belief, a large sampling ratio is not crucial.
• Sampling frame is a specific list of names from which sample elements will
be chosen. The Literary Digest poll in 1936 used a sample of 10 million,
drawn from government lists of automobile and telephone owners. Predicted
Alf Landon would beat Franklin Roosevelt by a wide margin. But instead
Roosevelt won by a landslide. The reason was that the sampling frame did
not match the population. Only the rich owned automobiles and telephones,
and they were the ones who favored Landon.
• Replacement. Sampling with replacement means that after you draw a name
out of the hat and record it, you put the name back and it can be chosen
again. Sampling without replacement means that once you draw the name
out, it is not available to be chosen again.
• Bias. Systematic errors produced by your sampling procedure. For example,
if you sample people and ask them whether they watch Ally McBeal, but the
percentage always comes out too high (maybe because you are interviewing
your friends and your whole group really likes Ally McBeal)
&
School of Law
Non-Probability Sampling
Haphazard/Convenience
• Whoever happens to walk by your office; who's on the street when the
camera crews come out
• If you have a choice, don't use this method. Often produces really wrong
answers, because certain attributes tend to cluster with certain geographic
and temporal variables. For example, at 8am in NYC, most of the people on
the street are workers heading for their jobs. At 10am, there are many more
people who don't work, and the proportion of women is much higher. At
midnight, there are young people and muggers.
Quota
• Haphazard sampling within categories (e.g., first 5 males to come by)

• Is an improvement, but still has problems. How do you know which
categories are key? How many do you get of each category?
Purposive/Judgement
• Expert judgement picks useful cases for study

• Good for exploratory, qualitative work, and for pre-testing a questionnaire.
Snowball
• Recruiting people based on recommendation of people you have just

interviewed
• Useful for studying invisible/illegal populations, such as drug addicts
Probability Sampling
Probability sampling is any sampling scheme in which the probability of choosing

each individual is the same (or at least known, so it can be readjusted
&
School of Law
mathematically). These are also called random sampling. They require more work,
but are much more accurate. They also allow the researcher to calculate the amount
of error she can expect, and this is really important.
Simple Random
• Develop a sampling frame, then randomly select elements (place all names
on cards, then randomly draw cards from hat; in Excel, there is a function
for attaching a random number to each cell, then sort and take N largest)
• Typically use sampling without replacement, but with replacement can be
done (and is easier mathematically)
• Any one sample is likely to yield statistics (such as the average income or
the percentage of respondents that watch Ally McBeal) that are different
from the population parameters
• The average statistic from many random samples should equal the
population parameter. In other words, if you took 150 different samples of
Americans, each of 300 people, and calculated the percentage that like Ally
McBeal in each of the samples, then averaged all those percentages together,
that should equal the "real" percentage of all Americans that like Ally
McBeal
• It is the Central Limit Theory that guarantees that as the number of random
samples increases, the average of those samples converges on the population
parameter
• Because of these mathematical guarantees, we can estimate how far off a
sample might be from the population, giving rise to confidence intervals
• Random samples are unbiased and, on average, representative of the
population.
Example. A company of 680 employees wants to know whether to bother with

instituting a program to deal with employee drug-taking. To find out, they will test
a sample of employees on an anonymous basis: if a person tests positive, the
company will not know who it is and will not try to find out. The objective is
solely to estimate what percentage of the company might be doing drugs. If the
&
School of Law
percentage is high enough, the company will consider instituting a mandatory drug
testing program. Given this objective, a simple random sampling design is perfect:
the results will generalize to the whole company.
Stratified Sampling
• Better than random sampling in terms of efficiency, but sometimes not

possible
• Procedure is this: Divide the population into strata (mutually exclusive
classes), such as men and women. Then randomly sample within strata.
• Suppose a population is 51% male and 49% female. To get a sample of 100
people, we randomly choose 51 males (from the population of all males)
and, separately, choose 49 females. Our sample is then guaranteed to have
exactly the correct proportion of sexes.
• This avoids problem of random sampling that the proportions could be 50-
50, 48-52, etc.
• Especially important when one group is so small (say, 3% of the population)
that a random sample might miss them entirely.
Example. The VP for Human Resources of a large manufacturing is considering

creating a stress-management program for employees. To get an idea of what kinds
of needs the program would have to fill, she will interview a sample of 50
employees first. If she does a simple random sample, it's possible that her sample
will not include any representatives of some of the smaller departments, just by
chance. Since she knows that different kinds of jobs within the company produce
different kinds of stress, she wants to get separate samples from the workmen (who
handle dangerous chemicals), the foremen (who balance the interests of the
workmen with management), and the managers (who are responsible to
shareholders). So she uses a stratified random sample.
Cluster Sampling
• Used when (a) sampling frame not available or too expensive, and (b) cost
of reaching an individual element is too high
&
School of Law
• E.g., there is no list of automobile mechanics in the US. Even if I could

construct it, it would cost too much money to reach randomly selected
mechanics across the entire US: would have to have unbelievable travel
budget
• In cluster sampling, first define large clusters of people. These clusters
should have a lot heterogeneity within, but be fairly similar to other clusters.
For example, cities make good clusters.
• Then sample among the clusters. Then once you have chosen the clusters,
randomly sample within the clusters.
• Clusters might be cities. Once you've chosen the cities, might be able to get
a reasonably accurate list of all the mechanics in each of those cities. Is also
much less expensive to fly to just 10 cities instead of 2000 cities.
• Cluster sampling is less expensive than other methods, but less accurate.
o each stage introduces its own sampling error.
• Suppose you want to sample college students. You start by sampling 300
colleges. Then choose 10 students from each college. Problem is, if the
colleges are of different size, the probability of a person being chosen if they
are from a big college is smaller than for a small college. So need to choose
a proportion of students, not a fixed number. Or don't choose colleges with
equal probability (let the big schools be more likely to be in the sample).
This is called PSS, Proportionate to Size Sampling
Example. Once a quarter, a large retail chain sends auditors to randomly chosen
stores to check that proper procedures are being carried out. They look at the
physical layout, the interactions between staff and customers, backroom
procedures, and so on. A simple random sample could have an auditor visiting a
California store one day, a New York the next, then another California store, and
so on. Using cluster sampling, the auditor might first select a random sample of
states, then visit a random sampling of stores with each state, thus reducing travel
time.
Sample Size
&
School of Law
• The bigger the better, up to 2500. Beyond 2500, it doesn't really matter
(accuracy increases very slowly after this point)
• The smaller the population, the bigger the sampling ratio that is needed.
• For populations under 1000, you need sampling ratio of 30% (300 elements)
to be really accurate.
• For populations of about 10,000 need sampling ratio of about 10%
• This lesson will show the difference between sampling and nonsampling
errors. Using a sample in order to get information about a population is often
better than conducting a census for many reasons.
• Sampling is less costly and it can be done more quickly than a census which
requires data for the entire population.
• However, as already stated, different samples selected from the same

population will give different results because these samples contain different
elements. Because of this discrepancy, we say that there is a sampling error.
• Sampling error definition

• Sampling error is the difference between the value of a sample statistics and
the value of the population parameter.
• Suppose, we need to find the sampling error for the mean. Suppose also
there is no nonsampling error which we define below.
• Let x̄ be the mean for a sample
• Let μ be the mean of the population.
• Sampling error = x̄ – μ
• For example, in the lesson about sampling distribution, the 5 scores below
are for the entire population and μ = 86.4
• 80 85 85 90 92
&
School of Law
• Suppose we choose a random sample of three scores from this population.

Assume that the scores are 85, 90, and 92
• x̄ = (85 + 90 + 92)/3 = 267 / 3 = 89
• Sampling error = x̄ – μ = 89 – 86.4 = 2.6
• The mean score estimated from the sample is 2.6 higher than the mean score
from the population.
• Nonsampling errors definition

• Any collection errors, recording errors, and/or tabulation errors are called
nonsampling errors.
• Notice that sampling errors occur because of chance. However, nonsampling

errors are the result of human mistakes.
• Now suppose when collecting the sample above, we mistakenly record 92 as

91.
• As a result, the sample mean is x̄ = (85 + 90 + 91)/3 = 266 / 3 = 88.66
• Consequently, the sampling error is now x̄ – μ = 89 – 88.66 = 0.34
• 0.34 does not really represent the sampling error since we already calculated
it as 2.6.
• The difference between 2.6 and 0.34 or 1.26 – 0.26 or 2.26 is the
nonsampling error because the value of 2.26 occured as a result of human
mistake.
• For the population and sample in this lessons,
• sampling error = 2.6
• nonsampling error = 2.26

&
School of Law
Reliability of samples
Sampling is an important aspect of social survey. It is the selection of relevant

units of inquiry for the collection of data and should be done in a scientific manner.
Therefore it is important to know how closely the measures based on sample
represent the parameters and how much variation one may expect if other samples
are analysed.The measures of reliability are concerned only with fluctuations due
to random sampling and they have nothing to do with observational and
computational errors. Whenever a measure of reliability is computed it is
understood that the sample is adequate and has been selected according to a
rigorously scientific procedure.
According to large sample theory the reliability of a measure such as the arithmetic
mean depends upon the number of cases in the sample and the variability of the
values in the sample. The reliability of a measure is related to the size of the
sample. The degree of variability of cases in a sample also has an important
influence on the reliability of the measures computed from the sample. If the cases
in the sample show a pronounced scatter a greater chance of fluctuation in the
measures would naturally be expected.
Brief explanation of the Central limit theorem
Central Limit Theorem is “The central limit theorem (CLT) is a statistical
theory that states that given a sufficiently large sample size from a population with
a finite level of variance, the mean of all samples from the same population will be
approximately equal to the mean of the population.
Unit-2
Probability Theory and Distributions

&
School of Law
In probability theory and statistics, a probability distribution is a mathematical
function that provides the probabilities of occurrence of different possible
outcomes in an experiment. ... The normal distribution is a commonly
encountered continuous probability distribution.
Multiplication Theorem on Probability

• SEPTEMBER 26, 2012
The probability of happening an event can easily be found using the definition of
probability. But just the definition cannot be used to find the probability of
happening of both the given events. A theorem known as “Multiplication
theorem” solves these types of problems. The statement and proof of
“Multiplication theorem” and its usage in various cases is as follows.
Multiplication theorem on probability:
If A and B are any two events of a sample space such that P(A) ≠0 and P(B)≠0,
then
P(A∩B) = P(A) * P(B|A) = P(B) *P(A|B).
Example: If P(A) = 1/5 P(B|A) = 1/3 then what is P(A∩B)?
Solution: P(A∩B) = P(A) * P(B|A) = 1/5 * 1/3 = 1/15

&
School of Law
INDEPENDENT EVENTS:
Two events A and B are said to be independent if there is no change in the
happening of an event with the happening of the other event.
i.e. Two events A and B are said to be independent if
P(A|B) = P(A) where P(B)≠0.
P(B|A) = P(B) where P(A)≠0.
i.e. Two events A and B are said to be independent if

P(A∩B) = P(A) * P(B).
Example:
While laying the pack of cards, let A be the event of drawing a diamond and B
be the event of drawing an ace.
Then P(A) = 13/52 = 1/4 and P(B) = 4/52=1/13
Now, A∩B = drawing a king card from hearts.
Then P(A∩B) = 1/52
Now, P(A/B) = P(A∩B)/P(B) = (1/52)/(1/13) = 1/4 = P(A).
So, A and B are independent.
[Here, P(A∩B) = = = P(A) * P(B)]

&
School of Law
Note:
(1) If 3 events A,B and C are independent the
P(A∩B∩C) = P(A)*P(B)*P(C).
(2) If A and B are any two events, then P(AUB) = 1-P(A’)P(B’).
Addition Theorem on Probability

• SEPTEMBER 19, 2012
The probability of happening an event can easily be found using the definition of
probability. But just the definition cannot be used to find the probability of
happening at least one of the given events. A theorem known as “Addition
theorem” solves these types of problems. The statement and proof of “Addition
theorem” and its usage in various cases is as follows.
Mutually exclusive events:

Two or more events are said to be mutually exclusive if they don’t have any
element in common. i.e. if, the occurrence of one of the events prevents the
occurrence of the others then those events are said to be mutually exclusive.
Example:
The event of getting 2 heads, A and the event of getting 2 tails, B when two
coins are tossed are mutually exclusive.
Because A = {HH}; B = {TT}.
Mutually exhaustive events:

&
School of Law
Two events are said to be mutually exhaustive if there is a certainty of occurring

at least one of those two events. i.e. one of those events will definitely happen.
If A and B are two mutually exhaustive then the probability of their union is 1.
i.e. P(AUB)=1.
Example:
The event of getting a head and the event of getting a tail when a coin is tossed
are mutually exhaustive.
Addition theorem on probability:

If A and B are any two events then the probability of happening of at least one of
the events is defined as P(AUB) = P(A) + P(B)- P(A∩B).
Proof:
Since events are nothing but sets,
From set theory, we have
n(AUB) = n(A) + n(B)- n(A∩B).
Dividing the above equation by n(S), (where S is the sample space)

n(AUB)/ n(S) = n(A)/ n(S) + n(B)/ n(S)- n(A∩B)/ n(S)
Then by the definition of probability,
P(AUB) = P(A) + P(B)- P(A∩B).

&
School of Law
Example:
If the probability of solving a problem by two students George and James are 1/2
and 1/3 respectively then what is the probability of the problem to be solved.
Solution:
Let A and B be the probabilities of solving the problem by George and James
respectively.
Then P(A)=1/2 and P(B)=1/3.
The problem will be solved if it is solved at least by one of them also.
So, we need to find P(AUB).
By addition theorem on probability, we have
P(AUB) = P(A) + P(B)- P(A∩B).
P(AUB) = 1/2 +.1/3 – 1/2 * 1/3 = 1/2 +1/3-1/6 = (3+2-1)/6 = 4/6 = 2/3
Note:
If A and B are any two mutually exclusive events then P(A∩B)=0.
Then P(AUB) = P(A)+P(B).
Conditional probability & independent events

&
School of Law
Conditional Probability
The conditional probability of an event B is the probability that the event will
occur given the knowledge that an event A has already occurred. This probability is
written P(B|A), notation for the probability of B given A. In the case where
events A and B are independent (where event A has no effect on the probability of
event B), the conditional probability of event B given event A is simply the
probability of event B, that is P(B).
If events A and B are not independent, then the probability of the intersection
of A and B (the probability that both events occur) is defined by
P(A and B) = P(A)P(B|A).
From this definition, the conditional probability P(B|A) is easily obtained by

dividing by P(A):
Note: This expression is only valid when P(A) is greater than 0.
Examples
In a card game, suppose a player needs to draw two cards of the same suit in order
to win. Of the 52 cards, there are 13 cards in each suit. Suppose first the player
draws a heart. Now the player wishes to draw a second heart. Since one heart has
already been chosen, there are now 12 hearts remaining in a deck of 51 cards. So
the conditional probability P(Draw second heart|First card a heart) = 12/51.
&
School of Law
Suppose an individual applying to a college determines that he has an 80% chance

of being accepted, and he knows that dormitory housing will only be provided for
60% of all of the accepted students. The chance of the student being
accepted and receiving dormitory housing is defined by
P(Accepted and Dormitory Housing) = P(Dormitory
Housing|Accepted)P(Accepted) = (0.60)*(0.80) = 0.48.
To calculate the probability of the intersection of more than two events, the
conditional probabilities of all of the preceding events must be considered. In
the case of three events, A, B, and C, the probability of the intersection P(A
and B and C) = P(A)P(B|A)P(C|A and B).
Consider the college applicant who has determined that he has 0.80 probability of
acceptance and that only 60% of the accepted students will receive dormitory
housing. Of the accepted students who receive dormitory housing, 80% will have
at least one roommate. The probability of being accepted and receiving dormitory
housing and having no roommates is calculated by:
P(Accepted and Dormitory Housing and No Roommates) =
P(Accepted)P(Dormitory Housing|Accepted)P(No Roomates|Dormitory Housing
and Accepted) = (0.80)*(0.60)*(0.20) = 0.096. The student has about a 10% chance
of receiving a single room at the college.
Another important method for calculating conditional probabilities is given

by Bayes's formula. The formula is based on the expression P(B) = P(B|A)P(A) +
P(B|Ac)P(Ac), which simply states that the probability of event B is the sum of the
conditional probabilities of event B given that event A has or has not occurred. For
independent events A and B, this is equal to P(B)P(A) + P(B)P(Ac) = P(B)(P(A) +
P(Ac)) = P(B)(1) = P(B), since the probability of an event and its complement
must always sum to 1. Bayes's formula is defined as follows:
&
School of Law
Example
Suppose a voter poll is taken in three states. In state A, 50% of voters support the
liberal candidate, in state B, 60% of the voters support the liberal candidate, and in
state C, 35% of the voters support the liberal candidate. Of the total population of
the three states, 40% live in state A, 25% live in state B, and 35% live in state C.
Given that a voter supports the liberal candidate, what is the probability that she
lives in state B?
By Bayes's formula,
P(Voter lives in state B|Voter supports liberal candidate) =

P(Voter supports liberal candidate|Voter lives in state B)P(Voter lives in state B)/
(P(Voter supports lib. cand.|Voter lives in state A)P(Voter lives in state A)
+
P(Voter supports lib. cand.|Voter lives in state B)P(Voter lives in state B)
+
P(Voter supports lib. cand.|Voter lives in state C)P(Voter lives in state C))
= (0.60)*(0.25)/((0.50)*(0.40) + (0.60)*(0.25) + (0.35)*(0.35))
= (0.15)/(0.20 + 0.15 + 0.1225) = 0.15/0.4725 = 0.3175.
The probability that the voter lives in state B is approximately 0.32.
Independent Events
&
School of Law
LO 6.7: Determine whether two events are independent or dependent and justify
your conclusion.
We begin with a verbal definition of independent events (later we will use

probability notation to define this more precisely).
Independent Events:
• Two events A and B are said to be independent if the fact that one event has
occurred does not affect the probability that the other event will occur.
• If whether or not one event occurs does affect the probability that the other
event will occur, then the two events are said to be dependent.
Here are a few examples:
EXAMPLE:
A woman’s pocket contains two quarters and two nickels.
She randomly extracts one of the coins and, after looking at it, replaces it before
picking a second coin.
Let Q1 be the event that the first coin is a quarter and Q2 be the event that the
second coin is a quarter.
Are Q1 and Q2 independent events?

• Why?
Since the first coin that was selected is replaced, whether or not Q1 occurred (i.e.,
whether the first coin was a quarter) has no effect on the probability that the second
coin will be a quarter, P(Q2).
In either case (whether Q1 occurred or not), when she is selecting the second coin,
she has in her pocket:
&
School of Law
and therefore the P(Q2) = 2/4 = 1/2 regardless of whether Q1 occurred.

EXAMPLE:
A woman’s pocket contains two quarters and two nickels.
She randomly extracts one of the coins, and without placing it back into her
pocket, she picks a second coin.
As before, let Q1 be the event that the first coin is a quarter, and Q2 be the event
that the second coin is a quarter.
Are Q1 and Q2 independent events?

• Q1 and Q2 are not independent. They are dependent. Why?
Since the first coin that was selected is not replaced, whether Q1 occurred (i.e.,
whether the first coin was a quarter) does affect the probability that the second
coin is a quarter, P(Q2).
If Q1 occurred (i.e., the first coin was a quarter), then when the woman is
selecting the second coin, she has in her pocket:
• In this case, P(Q2) = 1/3.

However, if Q1 has not occurred (i.e., the first coin was not a quarter, but a
nickel), then when the woman is selecting the second coin, she has in her pocket:
• In this case, P(Q2) = 2/3.

&
School of Law
In these last two examples, we could actually have done some calculation in order
to check whether or not the two events are independent or not.
Sometimes we can just use common sense to guide us as to whether two events are
independent. Here is an example.
EXAMPLE:
Two people are selected simultaneously and at random from all people in the
United States.
Let B1 be the event that one of the people has blue eyes and B2 be the event that
the other person has blue eyes.
In this case, since they were chosen at random, whether one of them has blue eyes
has no effect on the likelihood that the other one has blue eyes, and therefore B1
and B2 are independent.
On the other hand …
EXAMPLE:
A family has 4 children, two of whom are selected at random.
Let B1 be the event that one child has blue eyes, and B2 be the event that the other
chosen child has blue eyes.
In this case, B1 and B2 are not independent, since we know that eye color is
hereditary.
Thus, whether or not one child is blue-eyed will increase or decrease the chances
that the other child has blue eyes, respectively.
Comments:
• It is quite common for students to initially get confused about the distinction
between the idea of disjoint events and the idea of independent events. The
&
School of Law
purpose of this comment (and the activity that follows it) is to help students
develop more understanding about these very different ideas.
The idea of disjoint events is about whether or not it is possible for the events to
occur at the same time (see the examples on the page for Basic Probability Rules).
The idea of independent events is about whether or not the events affect each
other in the sense that the occurrence of one event affects the probability of the
occurrence of the other (see the examples above).
The following activity deals with the distinction between these concepts.
The purpose of this activity is to help you strengthen your understanding about the
concepts of disjoint events and independent events, and the distinction between
them.
Learn by Doing: Independent Events
Let’s summarize the three parts of the activity:
• In Example 1: A and B are not disjoint and independent

• In Example 2: A and B are not disjoint and not independent
• In Example 3: A and B are disjoint and not independent.
Why did we leave out the case when the events are disjoint and independent?
The reason is that this case DOES NOT EXIST!
A and B A and B Not

Independent Independent
A and B Disjoint DOES NOT EXIST Example 3
A and B Not Example 1 Example 2

Disjoint
If events are disjoint then they must be not independent, i.e. they must be
dependent events.
&
School of Law
Why is that?
• Recall: If A and B are disjoint then they cannot happen together.
• In other words, A and B being disjoint events implies that if event A occurs
then B does not occur and vice versa.
• Well… if that’s the case, knowing that event A has occurred dramatically
changes the likelihood that event B occurs – that likelihood is zero.
• This implies that A and B are not independent.
Now that we understand the idea of independent events, we can finally get to rules
for finding P(A and B) in the special case in which the events A and B are
independent.
Later we will present a more general version for use when the events are not
necessarily independent.
Multiplication Rule for Independent Events (Rule Six)

LO 6.8: Apply the multiplication rule for independent events to calculate P(A and
B) for independent events.
We now turn to rules for calculating
• P(A and B) = P(both event A occurs and event B occurs)

beginning with the multiplication rule for independent events.
Using a Venn diagram, we can visualize “A and B,” which is represented by the
overlap between events A and B:
&
School of Law
Probability Rule Six (The Multiplication Rule for Independent Events):

• If A and B are two INDEPENDENT events, then P(A and B) = P(A) * P(B).
Comment:
• When dealing with probability rules, the word “and” will always be associated
with the operation of multiplication; hence the name of this rule, “The
Multiplication Rule.”
•
EXAMPLE:
Recall the blood type example:
Two people are selected simultaneously and at random from all people in the
United States.
What is the probability that both have blood type O?

• Let O1= “person 1 has blood type O” and
• O2= “person 2 has blood type O”
&
School of Law
We need to find P(O1 and O2)

Since they were chosen simultaneously and at random, the blood type of one has
no effect on the blood type of the other. Therefore, O1 and O2 are independent,
and we may apply Rule 6:
• P(O1 and O2) = P(O1) * P(O2) = 0.44 * 0.44 = 0.1936.

Did I Get This?: Probability Rule Six
Comments:
• We now have an Addition Rule that says
P(A or B) = P(A) + P(B) for disjoint events,
and a Multiplication Rule that says
P(A and B) = P(A) * P(B) for independent events.

The purpose of this comment is to point out the magnitude of P(A or B) and of P(A
and B) relative to either one of the individual probabilities.
Since probabilities are never negative, the probability of one event or another is
always at least as large as either of the individual probabilities.
Since probabilities are never more than 1, the probability of one event and another
generally involves multiplying numbers that are less than 1, therefore can never be
more than either of the individual probabilities.
Here is an example:
EXAMPLE:
Consider the event A that a randomly chosen person has blood type A.
Modify it to a more general event — that a randomly chosen person has blood type
A or B — and the probability increases.
Modify it to a more specific (or restrictive) event — that not just one randomly
chosen person has blood type A, but that out of two simultaneously randomly
&
School of Law
chosen people, person 1 will have type A and person 2 will have type B — and the
probability decreases.
It is important to mention this in order to root out a common misconception.
• The word “and” is associated in our minds with “adding more stuff.” Therefore,
some students incorrectlythink that P(A and B) should be larger than either one
of the individual probabilities, while it is actually smaller, since it is a more
specific (restrictive) event.
• Also, the word “or” is associated in our minds with “having to choose between”
or “losing something,” and therefore some students incorrectly think that P(A
or B) should be smaller than either one of the individual probabilities, while it is
actually larger, since it is a more general event.
Practically, you can use this comment to check yourself when solving problems.
For example, if you solve a problem that involves “or,” and the resulting
probability is smaller than either one of the individual probabilities, then you know
you have made a mistake somewhere.
Did I Get This?: Comparing P(A and B) to P(A or B)
Comment:
• Probability rule six can be used as a test to see if two events are independent or
not.
• If you can easily find P(A), P(B), and P(A and B) using logic or are provided
these values, then we can test for independent events using the multiplication
rule for independent events:
IF P(A)*P(B) = P(A and B) THEN A and B are independent events,
otherwise, they are dependent events.
As you’ve seen, the last three rules that we’ve introduced (the Complement Rule,
the Addition Rules, and the Multiplication Rule for Independent Events) are
frequently used in solving problems.
&
School of Law
Before we move on to our next rule, here are two comments that will help you use
these rules in broader types of problems and more effectively.
Comment:
• As we mentioned before, the Addition Rule for Disjoint events (rule four) can
be extended to more than two disjoint events.
• Likewise, the Multiplication Rule for independent events (rule six) can be
extended to more than two independent events.
• So if A, B and C are three independent events, for example, then P(A and B and
C) = P(A) * P(B) * P(C).
• These extensions are quite straightforward, as long as you remember that “or”
requires us to add, while “and” requires us to multiply.
EXAMPLE:
Three people are chosen simultaneously and at random.
What is the probability that all three have blood type B?
We’ll use the usual notation of B1, B2 and B3 for the events that persons 1, 2 and
3 have blood type B, respectively.
We need to find P(B1 and B2 and B3). Let’s solve this one together:
Learn by Doing: Extending Probability Rule Six
Here is another example that might be quite surprising.
EXAMPLE:
A fair coin is tossed 10 times. Which of the following two outcomes is more
likely?
(a) HHHHHHHHHH
&
School of Law
(b) HTTHHTHTTH
Learn by Doing: A Surprising Result using Probability Rule Six?
In fact, they are equally likely. The 10 tosses are independent, so we’ll use the
Multiplication Rule for Independent Events:
• P(HHHHHHHHHH) = P(H) * P(H) * … *P(H) = 1/2 * 1/2 *… * 1/2 =

(1/2)10
• P(HTTHHTHTTH) = P(H) * P(T) * … * P(H) = 1/2 * 1/2 *… * 1/2 = (1/2)10
Here is the idea:
Our random experiment here is tossing a coin 10 times.
• You can imagine how huge the sample space is.

• There are actually 1,024 possible outcomes to this experiment, all of which are
equally likely.
Therefore,
• while it is true that it is more likely to get an outcome that has 5 heads and 5
tails than an outcome that has only heads
since there is only one possible outcome which gives all heads
and many possible outcomes which give 5 heads and 5 tails
• if we are comparing 2 specific outcomesas we do here, they are equally likely.
IMPORTANT Comments:
• Only use the multiplication rule for independent events, rule six, which says
P(A and B) = P(A)P(B) if you are certain the two events are independent.
o Probability rule six is ONLY true for independent events.
• When finding P(A or B) using the general addition rule: P(A) + P(B) – P(A
and B),
o do NOT use the multiplication rule for independent events to calculate P(A
and B), use only logic and counting.
&
School of Law
Bayes’ theorem
What is Bayes’ Theorem?

Bayes’ theorem is a way to figure out conditional probability. Conditional
probability is the probability of an event happening, given that it has some
relationship to one or more other events. For example, your probability of getting a
parking space is connected to the time of day you park, where you park, and what
conventions are going on at any time. Bayes’ theorem is slightly more nuanced. In
a nutshell, it gives you the actual probability of an event given information
about tests.
• “Events” Are different from “tests.” For example, there is a test for liver
disease, but that’s separate from the event of actually having liver disease.
• Tests are flawed: just because you have a positive test does not mean you
actually have the disease. Many tests have a high false positive rate. Rare
events tend to have higher false positive rates than more common events.
We’re not just talking about medical tests here. For example, spam filtering can
have high false positive rates. Bayes’ theorem takes the test results and
calculates your real probability that the test has identified the event.
Bayes’ Theorem (also known as Bayes’ rule) is a deceptively simple formula used
to calculate conditional probability. The Theorem was named after English
mathematician Thomas Bayes (1701-1761). The formal definition for the rule is:
In most cases, you can’t just plug numbers into an equation; You have to figure out
what your “tests” and “events” are first. For two events, A and B, Bayes’ theorem
allows you to figure out p(A|B) (the probability that event A happened, given that
test B was positive) from p(B|A) (the probability that test B happened, given that
event A happened). It can be a little tricky to wrap your head around as technically
you’re working backwards; you may have to switch your tests and events around,
&
School of Law
which can get confusing. An example should clarify what I mean by “switch the
tests and events around.”
Bayes’ Theorem Example #1
You might be interested in finding out a patient’s probability of having liver
disease if they are an alcoholic. “Being an alcoholic” is the test (kind of like a
litmus test) for liver disease.
• A could mean the event “Patient has liver disease.” Past data tells you that 10%
of patients entering your clinic have liver disease. P(A) = 0.10.
• B could mean the litmus test that “Patient is an alcoholic.” Five percent of the
clinic’s patients are alcoholics. P(B) = 0.05.
• You might also know that among those patients diagnosed with liver disease,
7% are alcoholics. This is your B|A: the probability that a patient is alcoholic,
given that they have liver disease, is 7%.
Bayes’ theorem tells you:
P(A|B) = (0.07 * 0.1)/0.05 = 0.14
In other words, if the patient is an alcoholic, their chances of having liver disease is
0.14 (14%). This is a large increase from the 10% suggested by past data. But it’s
still unlikely that any particular patient has liver disease.
More Bayes’ Theorem Examples
Bayes’ Theorem Problems Example #2
Another way to look at the theorem is to say that one event follows another. Above
I said “tests” and “events”, but it’s also legitimate to think of it as the “first event”
that leads to the “second event.” There’s no one right way to do this: use the
terminology that makes most sense to you.
In a particular pain clinic, 10% of patients are prescribed narcotic pain killers.
Overall, five percent of the clinic’s patients are addicted to narcotics (including
pain killers and illegal substances). Out of all the people prescribed pain pills, 8%
are addicts. If a patient is an addict, what is the probability that they will be
prescribed pain pills?
Step 1: Figure out what your event “A” is from the question. That information
is in the italicized part of this particular question. The event that happens first (A)
is being prescribed pain pills. That’s given as 10%.
&
School of Law
Step 2: Figure out what your event “B” is from the question. That information
is also in the italicized part of this particular question. Event B is being an addict.
That’s given as 5%.
Step 3: Figure out what the probability of event B (Step 2) given event A (Step
1). In other words, find what (B|A) is. We want to know “Given that people are
prescribed pain pills, what’s the probability they are an addict?” That is given in
the question as 8%, or .8.
Step 4: Insert your answers from Steps 1, 2 and 3 into the formula and solve.
P(A|B) = P(B|A) * P(A) / P(B) = (0.08 * 0.1)/0.05 = 0.16
The probability of an addict being prescribed pain pills is 0.16 (16%).
Expected Values
The expected value is also known as the expectation, mathematical expectation,
EV, average, mean value, mean, or first moment. More practically, the expected
value of a discrete random variable is the probability-weighted average of all
possible values.
Binomial Distribution: Formula, What it is and How to use it

Contents:
1. What is a Binomial Distribution?
2. The Bernoulli Distribution
3. The Binomial Distribution Formula
4. Worked Examples
What is a Binomial Distribution?
A binomial distribution can be thought of as simply the probability of a
SUCCESS or FAILURE outcome in an experiment or survey that is repeated
multiple times. The binomial is a type of distribution that has two possible
outcomes (the prefix “bi” means two, or twice). For example, a coin toss has only
&
School of Law
two possible outcomes: heads or tails and taking a test could have two possible
outcomes: pass or fail.
A Binomial Distribution shows either (S)uccess or (F)ailure.
The first variable in the binomial formula, n, stands for the number of times the
experiment runs. The second variable, p, represents the probability of one specific
outcome. For example, let’s suppose you wanted to know the probability of getting
a 1 on a die roll. if you were to roll a die 20 times, the probability of rolling a one
on any throw is 1/6. Roll twenty times and you have a binomial distribution of
(n=20, p=1/6). SUCCESS would be “roll a one” and FAILURE would be “roll
anything else.” If the outcome in question was the probability of the die landing on
an even number, the binomial distribution would then become (n=20, p=1/2).
That’s because your probability of throwing an even number is one half.
Criteria
Binomial distributions must also meet the following three criteria:
1. The number of observations or trials is fixed. In other words, you can only
figure out the probability of something happening if you do it a certain number
of times. This is common sense—if you toss a coin once, your probability of
&
School of Law
getting a tails is 50%. If you toss a coin a 20 times, your probability of getting a
tails is very, very close to 100%.
2. Each observation or trial is independent. In other words, none of your trials
have an effect on the probability of the next trial.
3. The probability of success (tails, heads, fail or pass) is exactly the same from
one trial to another.
The binomial distribution formula is:

b(x; n, P) = nCx * Px * (1 – P)n – x
Where:
b = binomial probability
x = total number of “successes” (pass or fail, heads or tails etc.)
P = probability of a success on an individual trial
n = number of trials
Note: The binomial distribution formula can also be written in a slightly

different way, because nCx = n!/x!(n-x)! (this binomial distribution formula
uses factorials (What is a factorial?). “q” in this formula is just the probability
of failure (subtract your probability of success from 1).
Using the First Binomial Distribution Formula

The binomial distribution formula can calculate the probability of success for
binomial distributions. Often you’ll be told to “plug in” the numbers to
the formula and calculate. This is easy to say, but not so easy to do—unless you
are very careful with order of operations, you won’t get the right answer. If you
have a Ti-83 or Ti-89, the calculator can do much of the work for you. If not,
here’s how to break down the problem into simple steps so you get the answer
right—every time.
Example 1
Q. A coin is tossed 10 times. What is the probability of getting exactly 6
heads?
&
School of Law
I’m going to use this formula: b(x; n, P) – nCx * Px * (1 – P)n – x

The number of trials (n) is 10
The odds of success (“tossing a heads”) is 0.5 (So 1-p = 0.5)
x=6
P(x=6) = 10C6 * 0.5^6 * 0.5^4 = 210 * 0.015625 * 0.0625 = 0.205078125
Tip: You can use the combinations calculator to figure out the value for nCx.
How to Work a Binomial Distribution Formula: Example 2
80% of people who purchase pet insurance are women. If 9 pet insurance
owners are randomly selected, find the probability that exactly 6 are women.
Step 1: Identify ‘n’ from the problem. Using our sample question, n (the number
of randomly selected items) is 9.
Step 2: Identify ‘X’ from the problem. X (the number you are asked to find the
probability for) is 6.
Step 3: Work the first part of the formula. The first part of the formula is
n! / (n – X)! X!
Substitute your variables:
9! / ((9 – 6)! × 6!)
Which equals 84. Set this number aside for a moment.
Step 4: Find p and q. p is the probability of success and q is the probability of
failure. We are given p = 80%, or .8. So the probability of failure is 1 – .8 = .2
(20%).
Step 5: Work the second part of the formula.
pX
= .86
= .262144
Set this number aside for a moment.
Step 6: Work the third part of the formula.
&
School of Law
q(n – X)
= .2(9-6)
= .23
= .008
Step 7: Multiply your answer from step 3, 5, and 6 together.
84 × .262144 × .008 = 0.176.
Example 3
60% of people who purchase sports cars are men. If 10 sports car owners are
randomly selected, find the probability that exactly 7 are men.
Step 1:: Identify ‘n’ and ‘X’ from the problem. Using our sample question, n (the
number of randomly selected items—in this case, sports car owners are randomly
selected) is 10, and X (the number you are asked to “find the probability” for) is
7.
Step 2: Figure out the first part of the formula, which is:
n! / (n – X)! X!
Substituting the variables:
10! / ((10 – 7)! × 7!)
Which equals 120. Set this number aside for a moment.
Step 3: Find “p” the probability of success and “q” the probability of failure. We
are given p = 60%, or .6. therefore, the probability of failure is 1 – .6 = .4 (40%).
Step 4: Work the next part of the formula.
pX
= .67
= .0.0279936
Set this number aside while you work the third part of the formula.
Step 5: Work the third part of the formula.
q(.4 – 7)
= .4(10-7)
&
School of Law
= .43
= .0.064
Step 6: Multiply the three answers from steps 2, 4 and 5 together.
120 × 0.0279936 × 0.064 = 0.215.
Poisson Distribution
A Poisson distribution is the probability of the number of events that occur in a

given interval when the expected number of events is known and the events occur
independently of one another. The symbol μ denotes the expected number of
events that occur during the interval. The probability that there are
exactly x occurrences in the interval can be determined by the
formula , where e is a constant equal to approximately
2.71828 (the base of the natural logarithm system), μ is the expected number of
events that occur in the interval, x is the actual number of events that occur in the
interval, and x! is the factorial of x.
Applications of Normal Distribution
Reasoning based on normal distributions is an important skill that goes throughout

the rest of the course. In this lecture, we will look at a few problems that illustrate
what you can do with normal distributions. One of the variables that we know do
follow normal distributions is the height of people. For all these problems, we’re
going to assume that women’s heights are normally distributed with a mean of 65
inches and a standard deviation of 3 inches. In the textbook’s notation, we can also
state .
Finding Probabilities from Given Values of Random Variable

&
School of Law
1) What is the probability that a woman is between 64 inches and 69 inches

tall (5’4” to 5’9”)? Put another way, what fraction of women’s heights are in
this range? Using the notation of random variables, we would write this as
P(64 < X < 69).
First, draw a horizontal axis and label it x, write the units (inches) below it, and
draw a normal pdf centered over the mean of 65 inches. Then mark and label 65 on
the axis, mark and label 64 to the left of it and 69 to the right of it, draw vertical
lines from the 64 and the 69 to the curve and shade the part between them, above
the x-axis, and under the curve:
If you are using GeoGebra, then you will immediately see that the software tells
you P(64 < X <69) =0.5393. If you are using the calculator, then you need to find
the normalcdf (not normalpdf) function. Enter the number on the left where the
shading begins, the number on the right where it ends, the mean of the distribution,
and its standard deviation, all separated by commas, normalcdf (64, 69, 65, 3), and
you will get 0.539347. Round this to the nearest ten-thousandth (four places after
the decimal point), or equivalently to the nearest hundredth of a percent, and you
come up with the correct answer: 0.5393, or 53.93%.
In the last lecture, we mentioned that in the old days, everyone has to learn how to
look up a Z-table, the table the shows the relationship between area and Z-score for
the standard normal. Then how does GeoGebra and normalcdf do it? Well, it’s no
&
School of Law
magic. The software simply converts any normal distribution to a standard normal,
using the familiar relationship of Z-score:
So our example above will be converted to:
, which gives you exactly

the same area, just under a different scale:
It’s not necessary that you always convert all normal distributions to Z, but it’s
useful to recognize how it is handled by the software, since we will be doing the
same later in inferential statistics.
2) What is the probability that a woman is taller than 5 feet, 10 inches, or 70
inches? Put another way, what fraction of women are taller than 70 inches?
This would be written as P(X > 70).
Start the same way as in Problem 1, but you have to mark and label only one
number besides the mean, the 70. Then shade to the right of the 70, because that’s
where the taller heights are:
&
School of Law
GeoGebra is fairly self-explanatory here. With the calculator, the only

complication using normalcdf is that there is no number on the right where the
shading ends, so put in a big one, and if you’re not sure if it’s big enough put in a
bigger one and see if it changes your answer, at least to the nearest ten-thousandth.
normalcdf ( 70, 1000, 65, 3)=0.04779, so the rounded answer is 0.0478, or 4.78%.
Find Cut-Off Values of the Random Variable from Probability
In the problems above, we found the probability that the random variable falls
within a certain range. Now we’re going to reverse the process. We’ll start with the
probability of a certain range, and then we’ll have to find the values of the random
variable that determine that range. I’ll call these values cut-offs. Sometimes they
are also called “inverse probability” problems.
In these three problems, we’ll use the same situation as above: Women’s heights
are normally distributed with a mean of 65 inches and a standard deviation of 3
inches.
1) How short does a woman have to be to be in the shortest 10% of women?
If we call this cut-off c, this could be written as finding c such that P(X < c)
= 0.10.
&
School of Law
We’ll do the same kind of diagram as before, but this time we’ll label the known
probability, 10%, and we do this above the shaded area, definitely not on the x-
axis, because it’s an area, not a height. The hardest part of the diagram is deciding
which side of the mean to put the c on and which side of the c to shade.
You really have to think about it. In this case, since by definition 50% of women
are shorter than the mean, the cut-off for 10% has to be less than the mean.
The picture here shows that how GeoGebra can be used to find the cut-off values:
instead of entering the cut-off values, you can enter 0.10 as the probability, and
GeoGebra will solve for the cut-off value (61.1553).
Using the calculator, you will need to resort to the invNorm function, followed by
the percent of data under the normal curve to the left of (always to the left of, no
matter which side of c the shading is on) the cut-off, then the mean and standard
deviation, separated by commas.
&
School of Law
So in our example, we will do invNorm (0.10, 65, 3), or, to the nearest inch, like
the mean and standard deviation, 61 inches. So about 10% of women are shorter
than 61 inches. You can check this using normalcdf, and you might as well use
more of the cut-off than we rounded to, for greater assurance that your check
shows you got the right answer. You get normalcdf (0, 61.1553, 65, 3), which
come to 0.0999997, or 10%.
2) How tall does a woman have to be to be in the tallest fourth of women?
(What is the cut-off for the tallest 25% of women?) If we call this height c,
we want to find the value of c such that P(X > c) = 0.25. Here’s the diagram:
In GeoGebra it’s quite simple: you will just have to switch the left to the right tail.
In the calculator, when we use invNorm we must put in 0.75, because the
calculator finds cut-offs for areas to the left only: invNorm (0.75, 65, 3). Here 0.75
comes from the fact that the total area must be equal to 1. When we subtract the
area to the right, we are getting the area to the left of the cut-off.
&
School of Law
Again, either GeoGebra or invNorm rely on the standard normal Z table to

compute these values. To see how this is done, you will first need to first the cut-
off value for the 25% area to the right:
$P(Z > 0.67) = 0.25$
Then using the relationship between the Z score and X, we can solve for x as the
unknown:
Using the algebra you have learned, you will find x = 3*0.67 + 65 = 67.0, which is
how the software arrived at the answer. You won’t have to do it this way every
time, but it’s helpful to keep in mind, since this relation is used later on in finding
the margin of error for confidence intervals.
3) What if we’re interested in finding cut-offs for a middle group of
women’s heights, say the middle 40%? Obviously, we’re looking for two
numbers here, one on either side of the mean, with the same distance to the
mean. Call them and . Then we are looking for these values so
that
&
School of Law
You probably noticed that the normal calculator in GeoGebra can’t really find two
cut-offs at once in fact, the figure above was drawn using a different tool. But
and are not two independent values, since they are equally far from 65, the
mean. To use the normal calculator, we must find out how much area is under the
curve to the left of . Well, if 100% of area is under the entire curve, then what’s
left over after taking away the middle 40% is 1-0.40=0.60, and since that 60% is
split evenly between the two tails (the parts at the sides), that gives 30% for each
tail. So is the number such that .
&
School of Law
So , the cut-off value on the left, is 63.4 inches.
How much area is there under the curve to the left of ? Either subtract the 30% to
the right from 100%, or add up the 30% in the left tail and the 40% in the middle,
and you’ll get 70% either way. So is the number such that , and
you will find that inches. So to the first decimal, the middle 40% of
heights go from 63.4 to 66.6 inches. If you use invNorm on a calculator, the
process will be similar.
Summary
Here are a few tips that may help you solve problems related to the normal
distribution:
1) First identify the distribution: is it continuous? Is it Normal?
&
School of Law
2) Draw a graph of the normal PDF with the mean and standard deviation
3) Examine the question to see whether you are looking for a probability, or
cut-off values.
4) Shade the approximate areas under the normal PDF.
5) Use the software/calculator to solve the unknown, and compare the output
with your graph.
Unit-3
Hypothesis Testing & Analysis of Variance
Gather sample data and calculate a test statistic where the sample statistic is
compared to the parameter value. The test statistic is calculated under the
assumption the null hypothesis is true, and incorporates a measure of standard
error and assumptions (conditions) related to the sampling distribution.
Level of Significance
What do significance levels and P values mean in hypothesis tests?

What is statistical significance anyway? In this post, I’ll continue to focus on
concepts and graphs to help you gain a more intuitive understanding of how
hypothesis tests work in statistics.
To bring it to life, I’ll add the significance level and P value to the graph in my
previous post in order to perform a graphical version of the 1 sample t-test. It’s
easier to understand when you can see what statistical significance truly means!
&
School of Law
Here’s where we left off in my last post. We want to determine whether our sample
mean (330.6) indicates that this year's average energy cost is significantly different
from last year’s average energy cost of $260.
The probability distribution plot above shows the distribution of sample

means we’d obtain under the assumption that the null hypothesis is true
(population mean = 260) and we repeatedly drew a large number of random
samples.
&
School of Law
I left you with a question: where do we draw the line for statistical significance on
the graph? Now we'll add in the significance level and the P value, which are the
decision-making tools we'll need.
We'll use these tools to test the following hypotheses:
• Null hypothesis: The population mean equals the hypothesized mean (260).
• Alternative hypothesis: The population mean differs from the hypothesized
mean (260).
What Is the Significance Level (Alpha)?
The significance level, also denoted as alpha or α, is the probability of rejecting the
null hypothesis when it is true. For example, a significance level of 0.05 indicates a
5% risk of concluding that a difference exists when there is no actual difference.
These types of definitions can be hard to understand because of their technical

nature. A picture makes the concepts much easier to comprehend!
The significance level determines how far out from the null hypothesis value we'll
draw that line on the graph. To graph a significance level of 0.05, we need to shade
the 5% of the distribution that is furthest away from the null hypothesis.
&
School of Law
In the graph above, the two shaded areas are equidistant from the null hypothesis
value and each area has a probability of 0.025, for a total of 0.05. In statistics, we
call these shaded areas the critical region for a two-tailed test. If the population
mean is 260, we’d expect to obtain a sample mean that falls in the critical region
5% of the time. The critical region defines how far away our sample statistic must
be from the null hypothesis value before we can say it is unusual enough to reject
the null hypothesis.
Our sample mean (330.6) falls within the critical region, which indicates it is
statistically significant at the 0.05 level.
&
School of Law
We can also see if it is statistically significant using the other common significance
level of 0.01.
The two shaded areas each have a probability of 0.005, which adds up to a total
probability of 0.01. This time our sample mean does not fall within the critical
region and we fail to reject the null hypothesis. This comparison shows why you
need to choose your significance level before you begin your study. It protects you
from choosing a significance level because it conveniently gives you significant
results!
Thanks to the graph, we were able to determine that our results are statistically
significant at the 0.05 level without using a P value. However, when you use the
&
School of Law
numeric output produced by statistical software, you’ll need to compare the P

value to your significance level to make this determination.
What Are P values?
P-values are the probability of obtaining an effect at least as extreme as the one in
your sample data, assuming the truth of the null hypothesis.
This definition of P values, while technically correct, is a bit convoluted. It’s easier
to understand with a graph!
To graph the P value for our example data set, we need to determine the distance
between the sample mean and the null hypothesis value (330.6 - 260 = 70.6). Next,
we can graph the probability of obtaining a sample mean that is at least as extreme
in both tails of the distribution (260 +/- 70.6).
&
School of Law
In the graph above, the two shaded areas each have a probability of 0.01556, for a
total probability 0.03112. This probability represents the likelihood of obtaining a
sample mean that is at least as extreme as our sample mean in both tails of the
distribution if the population mean is 260. That’s our P value!
When a P value is less than or equal to the significance level, you reject the null
hypothesis. If we take the P value for our example and compare it to the common
significance levels, it matches the previous graphical results. The P value of
0.03112 is statistically significant at an alpha level of 0.05, but not at the 0.01
level.
&
School of Law
If we stick to a significance level of 0.05, we can conclude that the average energy
cost for the population is greater than 260.
A common mistake is to interpret the P-value as the probability that the null
hypothesis is true. To understand why this interpretation is incorrect, please read
my blog post How to Correctly Interpret P Values.
Discussion about Statistically Significant Results
A hypothesis test evaluates two mutually exclusive statements about a population

to determine which statement is best supported by the sample data. A test result is
statistically significant when the sample statistic is unusual enough relative to the
null hypothesis that we can reject the null hypothesis for the entire population.
“Unusual enough” in a hypothesis test is defined by:
• The assumption that the null hypothesis is true—the graphs are centered on
the null hypothesis value.
• The significance level—how far out do we draw the line for the critical
region?
• Our sample statistic—does it fall in the critical region?
Keep in mind that there is no magic significance level that distinguishes between
the studies that have a true effect and those that don’t with 100% accuracy. The
common alpha values of 0.05 and 0.01 are simply based on tradition. For a
significance level of 0.05, expect to obtain sample means in the critical region 5%
of the time when the null hypothesis is true. In these cases, you won’t know that
the null hypothesis is true but you’ll reject it because the sample mean falls in the
critical region. That’s why the significance level is also referred to as an error rate!
This type of error doesn’t imply that the experimenter did anything wrong or
require any other unusual explanation. The graphs show that when the null
hypothesis is true, it is possible to obtain these unusual sample means for no reason
other than random sampling error. It’s just luck of the draw.
&
School of Law
Significance levels and P values are important tools that help you quantify and
control this type of error in a hypothesis test. Using these tools to decide when to
reject the null hypothesis increases your chance of making the correct decision.
Process of testing in hypothesis testing
We will cover the seven steps one by one.
Step 1: State the Null Hypothesis

The null hypothesis can be thought of as the opposite of the "guess" the research
made (in this example the biologist thinks the plant height will be different for the
fertilizers). So the null would be that there will be no difference among the groups
of plants. Specifically in more statistical language the null for an ANOVA is that
the means are the same. We state the Null hypothesis as:
[Math Processing Error]

for k levels of an experimental treatment.
Note: Why do we do this? Why not simply test the working hypothesis directly?
The answer lies in the Popperian Principle of Falsification. Karl Popper (a
philosopher) discovered that we can’t conclusively confirm a hypothesis, but we
can conclusively negate one. So we set up a Null hypothesis which is effectively
the opposite of the working hypothesis. The hope is that based on the strength of
the data we will be able to negate or Reject the Null hypothesis and accept an
alternative hypothesis. In other words, we usually see the working hypothesis in
HA.
Step 2: State the Alternative Hypothesis
[Math Processing Error]
The reason we state the alternative hypothesis this way is that if the Null is
rejected, there are many possibilities.
For example, [Math Processing Error] is one possibility, as is [Math Processing
Error]. Many people make the mistake of stating the Alternative Hypothesis
as: [Math Processing Error] which says that every mean differs from every other
&
School of Law
mean. This is a possibility, but only one of many possibilities. To cover all
alternative outcomes, we resort to a verbal statement of ‘not all equal’ and then
follow up with mean comparisons to find out where differences among means
exist. In our example, this means that fertilizer 1 may result in plants that are
really tall, but fertilizers 2, 3 and the plants with no fertilizers don't differ from one
another. A simpler way of thinking about this is that at least one mean is different
from all others.
Step 3: Set [Math Processing Error]
If we look at what can happen in a hypothesis test, we can construct the following
contingency table:
In Reality
Decision H0 is TRUE H0 is FALSE
Type II Error
Accept
OK β = probability of Type
H0
II Error
Type I Error
Reject H0 α = probability of Type I OK
Error
You should be familiar with type I and type II errors from your introductory
course. It is important to note that we want to set [Math Processing Error] before
the experiment (a-priori) because the Type I error is the more ‘grevious’ error to
make. The typical value of [Math Processing Error] is 0.05, establishing a 95%
confidence level. For this course we will assume [Math Processing Error] =0.05.
Step 4: Collect Data
Remember the importance of recognizing whether data is collected through an
experimental design or observational.
&
School of Law
Step 5: Calculate a test statistic

For categorical treatment level means, we use an F statistic, named after R.A.
Fisher. We will explore the mechanics of computing the Fstatistic beginning in
Lesson 2. The F value we get from the data is labeled Fcalculated.
Step 6: Construct Acceptance / Rejection regions

As with all other test statistics, a threshold (critical) value of F is established.
This F value can be obtained from statistical tables, and is referred to
as Fcritical or [Math Processing Error]. As a reminder, this critical value is the
minimum value for the test statistic (in this case the F test) for us to be able to
reject the null.
The F distribution, [Math Processing Error], and the location of Acceptance /
Rejection regions are shown in the graph below:
Step 7: Based on steps 5 and 6, draw a conclusion about H0

If the Fcalculated from the data is larger than the Fα, then you are in the Rejection
region and you can reject the Null Hypothesis with (1-α) level of confidence.
Note that modern statistical software condenses step 6 and 7 by providing a p-
value. The p-value here is the probability of getting an Fcalculated even greater than
what you observe. If by chance, the Fcalculated = [Math Processing Error], then
the p-value would exactly equal to α. With larger Fcalculated values, we move further
&
School of Law
into the rejection region and the p-value becomes less than α. So the decision rule
is as follows:
If the p-value obtained from the ANOVA is less than α, then Reject H0 and Accept
HA.
Z-test Vs T-test
Sometimes, measuring every single piece of item is just not practical. That is why
we developed and use statistical methods to solve problems. The most practical
way to do it is to measure just a sample of the population. Some methods test
hypotheses by comparison. The two of the more known statistical hypothesis test
are the T-test and the Z-test. Let us try to breakdown the two.
A T-test is a statistical hypothesis test. In such test, the test statistic follows a
Student’s T-distribution if the null hypothesis is true. The T-statistic was
introduced by W.S. Gossett under the pen name “Student”. The T-test is also
referred as the “Student T-test”. It is very likely that the T-test is most commonly
used Statistical Data Analysis procedure for hypothesis testing since it is
straightforward and easy to use. Additionally, it is flexible and adaptable to a broad
range of circumstances.
There are various T-tests and two most commonly applied tests are the one-sample
and paired-sample T-tests. One-sample T-tests are used to compare a sample mean
with the known population mean. Two-sample T-tests, the other hand, are used to
compare either independent samples or dependent samples.
T-test is best applied, at least in theory, if you have a limited sample size (n < 30)
as long as the variables are approximately normally distributed and the variation of
scores in the two groups is not reliably different. It is also great if you do not know
the populations’ standard deviation. If the standard deviation is known, then, it
&
School of Law
would be best to use another type of statistical test, the Z-test. The Z-test is also
applied to compare sample and population means to know if there’s a significant
difference between them. Z-tests always use normal distribution and also ideally
applied if the standard deviation is known. Z-tests are often applied if the certain
conditions are met; otherwise, other statistical tests like T-tests are applied in
substitute. Z-tests are often applied in large samples (n > 30). When T-test is used
in large samples, the t-test becomes very similar to the Z-test. There are
fluctuations that may occur in T-tests sample variances that do not exist in Z-tests.
Because of this, there are differences in both test results.
Summary:
1. Z-test is a statistical hypothesis test that follows a normal distribution while T-

test follows a Student’s T-distribution.
2. A T-test is appropriate when you are handling small samples (n < 30) while a Z-
test is appropriate when you are handling moderate to large samples (n > 30).
3. T-test is more adaptable than Z-test since Z-test will often require certain
conditions to be reliable. Additionally, T-test has many methods that will suit any
need.
4. T-tests are more commonly used than Z-tests.
5. Z-tests are preferred than T-tests when standard deviations are known.
Chi-square Test
Chi-square test is one of the important nonparametric tests that is used to compare
more than two variables for a randomly selected data. The expected frequencies are
calculated based on the conditions of null hypothesis. The rejection of null
hypothesis is based on the differences of actual value and expected value.
&
School of Law
The data can be examined by using the two types of Chi-square test, which is given
below:
1. Chi-square goodness of fit test
It is used to observe that the closeness of a sample matches a population. The Chi-
square test statistic is,
with k-1 degrees of freedom.

where Oi is the observed count, k is categories, and Ei is the expected counts.
2. Chi-square test for independence of two variables
It is used to check whether the variables are independent of each other or not. The
Chi-square test statistic is,
with degrees of freedom.

where Oi is the observed count, r is number of rows, c is the number of columns,
and Ei is the expected counts.
The F-distribution is formed by the ratio of two independent chi-square variables

divided by their respective degrees of freedom.
Since F is formed by chi-square, many of the chi-square properties carry over to

the F distribution.
• The F-values are all non-negative

• The distribution is non-symmetric
• The mean is approximately 1
• There are two independent degrees of freedom, one for the numerator, and
one for the denominator.
• There are many different F distributions, one for each pair of degrees of
freedom.
&
School of Law
F-Test
The F-test is designed to test if two population variances are equal. It does this by
comparing the ratio of two variances. So, if the variances are equal, the ratio of the
variances will be 1.
All hypothesis testing is done under the assumption the null hypothesis is true
If the null hypothesis is true, then the F test-statistic given above can be
simplified (dramatically). This ratio of sample variances will be test
statistic used. If the null hypothesis is false, then we will reject the null
hypothesis that the ratio was equal to 1 and our assumption that they were equal.
There are several different F-tables. Each one has a different level of significance.
So, find the correct level of significance first, and then look up the numerator
degrees of freedom and the denominator degrees of freedom to find the critical
value.
You will notice that all of the tables only give level of significance for right tail
tests. Because the F distribution is not symmetric, and there are no negative values,
you may not simply take the opposite of the right critical value to find the left
critical value. The way to find a left critical value is to reverse the degrees of
freedom, look up the right critical value, and then take the reciprocal of this value.
For example, the critical value with 0.05 on the left with 12 numerator and 15
denominator degrees of freedom is found of taking the reciprocal of the critical
value with 0.05 on the right with 15 numerator and 12 denominator degrees of
freedom.
Avoiding Left Critical Values
Since the left critical values are a pain to calculate, they are often avoided
altogether. This is the procedure followed in the textbook. You can force the F test
into a right tail test by placing the sample with the large variance in the numerator
&
School of Law
and the smaller variance in the denominator. It does not matter which sample has
the larger sample size, only which sample has the larger variance.
The numerator degrees of freedom will be the degrees of freedom for whichever
sample has the larger variance (since it is in the numerator) and the denominator
degrees of freedom will be the degrees of freedom for whichever sample has the
smaller variance (since it is in the denominator).
If a two-tail test is being conducted, you still have to divide alpha by 2, but you
only look up and compare the right critical value.
Assumptions / Notes
• The larger variance should always be placed in the numerator

• The test statistic is F = s1^2 / s2^2 where s1^2 > s2^2
• Divide alpha by 2 for a two tail test and then find the right critical value
• If standard deviations are given instead of variances, they must be squared
• When the degrees of freedom aren't given in the table, go with the value
with the larger critical value (this happens to be the smaller degrees of
freedom). This is so that you are less likely to reject in error (type I error)
• The populations from which the samples were obtained must be normal.
• The samples must be independent
Brief description of non-parametric tests
Non Parametric Test
Non parametric tests are tests that do not required that the underlying population be
Normal or indeed that they have any single mathematical form and some even
apply to non numerical data. Non-parametric methods are also known as
distribution free methods since they do not have any underlying population.
&
School of Law
Definition
Back to Top
Non parametric tests are defined as the mathematical methods used in statistical
hypothesis testing which, unlike parametric tests, do not make assumptions about
the frequency distribution of variables that are to be assessed. Non parametric test
used when there are skewed data, and it covers techniques that do not rely on data
belonging to any particular distribution.
The word non-parametric does not exactly mean that these models do not have
have any parameters. Actually, the fact is that the nature and number of parameters
is quite flexible and not predefined. Therefore, non-parametric models are known
as distribution-free models.
Uses of Non parametric tests

The situations in which non-parametric tests are used are listed below.
(i) In case of parametric tests are not satisfied.
(ii) When testing hypothesis does not have any distribution.
(iii) In case of requirement of a quick data analysis.
(iv) In case of unscaled data.
&
School of Law
Advantages of Non Parametric Test

(i) Easy to understand.
(ii) No lengthy calculations.
(iii) No requirement of assumption of distribution.
(iv) Applicable to all kinds of data.
Disadvantages of Non Parametric Test

(i) They are less efficient in comparison to parametric tests.
(ii) The results may or may not provide actual answer because they are distribution
free.
Sign Test Statistics
Back to Top
Sign test is conducted under following circumstances.
1. When we have to compare paired data

2. The paired data are obtained from similar conditions
3. No assumptions are made regarding the original population.
Sign Test is merely based on the signs (+ or -) of the deviations x-y and not on
their magnitudes. This test is applicable when ties or zero differences between the
paired observations cannot occur. If tie or zero differences are occurred, then they
must be excluded from the analysis and the number of paired observations counted
is also reduced. This method can be used to analyze individual data also
Sign Test
Let (x1x1,y1y1), (x2x2,y2y2),… (xnxn, ynyn) be the paired observations.

Let didi = xixi - yiyi. di can be positive or negative and all di’s are independent.
We take null hypothesis as H0H0 : p = 1212 and Alternate Hypothesis H1H1:
p≠12≠12.
&
School of Law
Now we count the number of positive signs in di let it be m.
Then, p = mnmn and q = n - (mn)(mn)
Case 1 (n < 30):
If at least one of np or nq < 5,
we can use binomial approximation.
We find P = p(a≤m)p(a≤m) = (12)n∑mx−0(ncx)(12)n∑x−0m(ncx)
where m is the number of positive deviations.
If P > αα we accept the null hypothesis. Other wise we reject it.
Case 2 (n < 30):
If both of np or nq > 5,
we use normal distribution for approximation.
σpσp = pqn−−√pqn.
The limits of acceptance region is given by (p−zpσp,p+zpσp)(p−zpσp,p+zpσp)
Where zp is the value obtained from the standard normal table with αα level of
significance. If αα not given , αα = 0.05.
&
School of Law
If p lies in(p−zpσp,p+zpσp)(p−zpσp,p+zpσp), we accept the hypothesis, otherwise

we reject the hypothesis.
Case 3 (n ≥ 30)
If n ≥ 30, we find Mean = np and standard deviation = npq
Then a = number of negative observations
To find a−meanstandard deviationa−meanstandard deviation
Find the normal table value for the given αα. If |z| ≤≤ table value, we accept the
null hypothesis of sign test, otherwise we reject the hypothesis.
Kruskal-Wallis H-test
Back to Top
Kruskal Wallis H test is used in case of testing two or more populations are
identical. In this test, the null hypothesis is H0:μ1=μ2=γ30:μ1=μ2=γ3 (when there
are three populations). And alternative hypothesis is H1:μ1≠μ2≠γ3H1:μ1≠μ2≠γ3
In Kruskal-Wallis test, we first calculate ranks of the observations items in the
samples and then determine the rank sums for each sample.
H = 12n(n+1)(∑mi−lRiNi)−3(n+1)12n(n+1)(∑i−lmRiNi)−3(n+1)
where,
n is the total number of observations in all samples,
m is the number of samples,
nini represents the number of observations in ithith sample,
&
School of Law
RiRi denotes the rank sum of ithith sample.
Here, we use the x2x2 distribution with m - 1 degrees of freedom and α level of
significance to calculate the critical value. If the calculated value is less than x2x2,
then null hypothesis is accepted, otherwise rejected.
Non Parametric T-Test

Back to Top
Whenever few assumption in the given population are uncertain, we use non-
parametric tests which can be referred as parametric counterparts. When data are
not normally distributed or when they are on an ordinal level of measurement, non-
parametric tests should be used. The basic rule is to use a parametric t test for data
normally distributed and a nonparametric test for skewed data.
Non Parametric Paired t Test

Paired sample t test is used to compare two means scores and these scores come
from the same group. Pair samples t test used when variables are independent and
has two levels and those levels are repeated measures.
Examples
Back to Top
Given below are some of the examples on Non Parametric Test.
Solved Examples
Question 1: Use Kruskal Wallis test to test for differences in mean among 3
samples for α = 0.05
Sample 1 : 100, 65, 102, 86, 80, 89, 98, 96, 91, 101
Sample 2: 84, 103, 126, 62, 92, 97, 95, 90, 94, 76
&
School of Law
Sample 3: 90, 99, 57, 106, 88, 91, 88, 102, 77, 90.
Solution:
The null hypothesis is H0H0: μ1=μ2=μ3μ1=μ2=μ3 and alternative hypothesis is

H1: μ1≠μ2≠μ3μ1≠μ2≠μ3
We first find the rank of the items in the samples (considering whole group as one)
and then find the rank sums of each sample.
Sample1 Rank Sample2 Rank Sample3 Rank
100 24 83 7 90 13
65 3 103 28 99 23
102 26.5 12662 30 57 1
86 8 62 2 106 29
80 6 92 17 88 9.5
89 11 97 21 91 15.5
98 22 95 19 88 9.5
96 20 90 13 102 26.5
91 15.5 94 18 77 5
101 25 76 4 90 13
&
School of Law
R1 161 159 145
Here
nn = 30
nini = 10 for all i
m=3
Degrees of freedom = m - 1 = 3 - 1 = 2.
Test statisic,
H = 12n(n+1)(∑mi−lRiNi)−3(n+1)12n(n+1)(∑i−lmRiNi)−3(n+1)
= 1230(30+1)(161210+159210+145210)−3(30+1)1230(30+1)(161210+159210+14
5210)−3(30+1)
= 0.196
From the x2x2 distribution with m - 1 degrees of freedom and α level of
significance we get critical value = 5.991. Since H < 5.991, we accept the null
hypothesis and we conclude that there is no difference in the mean among 3
samples.
Question 2: The following data show the employee’s rate of defective work before
and after a change in the wage incentive plan. Compare the following two sets of
data to see whether the change has lowered the defective units produced. Use sign
test with αα= 0.01
Before: 9, 8, 7, 10, 8, 11, 9, 7, 6, 9, 11, 9
After: 7, 6, 9, 7, 10, 9, 10, 8, 6, 7, 10, 9.
Solution:
&
School of Law
Null hypothesis as H0: p = 1212

Alternate Hypothesis H1: p < 1212
Here we use one tailed test since we have to check whether the change is lowered.
αα = 0.01.
k = 4, n = 10 (number of signs omitting the zeros)
p = knkn = 410410, q = n - (knkn) = 610610
np = (410410) ×× 10 = 4 < 5
so we use binomial approximation:
P = P(a ≤≤ K) = (12)n∑kx=0(12)n∑x=0k (nCx)
P = P(a ≤≤ 4) = (12)10∑4x=0(12)10∑x=04 (10Cx)
= (12)10(12)10 (10C0 + 10C1 + 10C2 + 10C3 + 10C4)
= (12)10(12)10 (1 + 10 + 45 + 120 + 210)
= 0.000976 ×× 386 = 0.3767
Since P > 0.01, we accept the null hypothesis of the sign test.
&
School of Law
Analysis of Variance: Introduction
Introduction to Analysis of Variance. Analysis of Variance (ANOVA) is a
statistical method used to test differences between two or more means.
... ANOVA is used to test general rather than specific differences among means.
Analysis of variance/Assumptions
< Analysis of variance
Jump to navigationJump to search
Assumptions
ANOVA models are parametric, relying on assumptions about the distribution of
the dependent variables (DVs) for each level of the independent variable(s) (IVs).
Initially the array of assumptions for various types of ANOVA may seem
bewildering. In practice, the first two assumptions here are the main ones to check.
Note that the larger the sample size, the more robust ANOVA is to violation of the
first two assumptions: normality and homoscedasticity (homogeneity of variance).
1. Normality of the DV distribution: The data in each cell should be

approximately normally distributed. Check via
histograms, skewness and kurtosis overall and for each cell (i.e. for each
group for each DV)
2. Homogeneity of variance: The variance in each cell should be similar.
Check via Levene's test or other homogeneity of variance tests which are
generally produced as part of the ANOVA statistical output.
3. Sample size: per cell > 20 is preferred; aids robustness to violation of the
first two assumptions, and a larger sample size increases power
&
School of Law
4. Independent observations: scores on one variable or for one group should

not be dependent on another variable or group (usually guaranteed by the
design of the study)
These assumptions apply to independent sample t-tests (see also t-test
assumptions), one-way ANOVAs and factorial ANOVAs.
For ANOVA models involving repeated measures, there is also the assumptions
of:
1. Sphericity: the difference scores between each within-subject variable have

similar variances
2. Homogeneity of covariance matrices of the depending variables: tests the
null hypothesis that the observed covariance matrices of the dependent
variables are equal across groups (see Box's M)
The one-way ANOVA model and assumptions
A model The mathematical model that describes the

that relationship between the response and treatment for
describes the one-way ANOVA is given by
the Yij=μ+τi+ϵij,
relationship
between the where Yij represents the j-th observation
response (j=1,2,…,ni)on the i-th treatment
and the (i=1,2,…,k levels). So, Y23represents the third
treatment observation using level 2 of the factor. μis the
(between common effect for the whole
the experiment, τi represents the i-th treatment effect,
dependent and ϵij represents the random error present in the j-
and th observation on the i-th treatment.
independent
variables)
&
School of Law
Fixed The errors ϵij are assumed to be normally and

effects independently (NID) distributed, with mean zero
model and variance σ2ϵ. μ is always a fixed parameter,
and τ1,τ2,…,τk are considered to be fixed
parameters if the levels of the treatment are
fixedand not a random sample from a population of
possible levels. It is also assumed that μ is chosen
so that
∑τi=0,i=1,…,k
holds. This is the fixed effects model.
Random If the k levels of treatment are chosen at random,

effects the model equation remains the same. However,
model now the τi values are random variables assumed to
be NID(0, στ) This is the random effects model.
Whether the levels are fixed or random depends on

how these levels are chosen in a given experiment.
Two-Way ANOVA
The two-way analysis of variance is an extension to the one-way analysis of

variance. There are two independent variables (hence the name two-way).
Assumptions
• The populations from which the samples were obtained must be normally or
approximately normally distributed.
• The samples must be independent.
• The variances of the populations must be equal.
&
School of Law
• The groups must have the same sample size.
Hypotheses
There are three sets of hypothesis with the two-way ANOVA.
The null hypotheses for each of the sets are given below.
1. The population means of the first factor are equal. This is like the one-way
ANOVA for the row factor.
2. The population means of the second factor are equal. This is like the one-
way ANOVA for the column factor.
3. There is no interaction between the two factors. This is similar to performing
a test for independence with contingency tables.
Factors
The two independent variables in a two-way ANOVA are called factors. The idea
is that there are two variables, factors, which affect the dependent variable. Each
factor will have two or more levels within it, and the degrees of freedom for each
factor is one less than the number of levels.
Treatment Groups
Treatement Groups are formed by making all possible combinations of the two
factors. For example, if the first factor has 3 levels and the second factor has 2
levels, then there will be 3x2=6 different treatment groups.
As an example, let's assume we're planting corn. The type of seed and type of
fertilizer are the two factors we're considering in this example. This example has
15 treatment groups. There are 3-1=2 degrees of freedom for the type of seed, and
5-1=4 degrees of freedom for the type of fertilizer. There are 2*4 = 8 degrees of
freedom for the interaction between the type of seed and type of fertilizer.
&
School of Law
The data that actually appears in the table are samples. In this case, 2 samples from
each treatment group were taken.
Fert I Fert II Fert III Fert IV Fert V
Seed A-402 106, 110 95, 100 94, 107 103, 104 100, 102
Seed B-894 110, 112 98, 99 100, 101 108, 112 105, 107
Seed C-952 94, 97 86, 87 98, 99 99, 101 94, 98
Main Effect
The main effect involves the independent variables one at a time. The interaction is
ignored for this part. Just the rows or just the columns are used, not mixed. This is
the part which is similar to the one-way analysis of variance. Each of the variances
calculated to analyze the main effects are like the between variances
Interaction Effect
The interaction effect is the effect that one factor has on the other factor. The
degrees of freedom here is the product of the two degrees of freedom for each
factor.
Within Variation
The Within variation is the sum of squares within each treatment group. You have
one less than the sample size (remember all treatment groups must have the same
sample size for a two-way ANOVA) for each treatment group. The total number of
treatment groups is the product of the number of levels for each factor. The within
variance is the within variation divided by its degrees of freedom.
&
School of Law
The within group is also called the error.
F-Tests
There is an F-test for each of the hypotheses, and the F-test is the mean square for
each main effect and the interaction effect divided by the within variance. The
numerator degrees of freedom come from each effect, and the denominator degrees
of freedom is the degrees of freedom for the within variance in each case.
Two-Way ANOVA Table
It is assumed that main effect A has a levels (and A = a-1 df), main effect B has b
levels (and B = b-1 df), n is the sample size of each treatment, and N = abn is the
total sample size. Notice the overall degrees of freedom is once again one less than
the total sample size.
Source SS df MS F
Main Effect A given A, SS / df MS(A) / MS(W)

a-1
Main Effect B given B, SS / df MS(B) / MS(W)

b-1
Interaction given A*B, SS / df MS(A*B) / MS(W)

Effect (a-1)(b-1)
Within given N - ab, SS / df

ab(n-1)
&
School of Law
Total sum of others N - 1,

abn - 1
Statistical Inference
Statistical inference is the process of using data analysis to deduce properties of

an underlying probability distribution.[1] Inferential statistical analysis infers
properties of a population, for example by testing hypotheses and deriving
estimates. It is assumed that the observed data set is sampled from a larger
population.
Inferential statistics can be contrasted with descriptive statistics. Descriptive
statistics is solely concerned with properties of the observed data, and it does not
rest on the assumption that the data come from a larger population.
1 Theory of estimation 1.1 Basic concepts True parameter of the population ϑ
Estimator θb = g(X1, . . . , Xn) Estimated value ϑb = g(x1, . . . , xn) Unbiased
estimator E[θb] = E[g(X1, . . . , Xn)] = ϑ Unbiasedness means, that given a large
number of samples the average over all estimations lies near the true parameter.
Mean squared error (MSE) MSE(θb) = E[(θb− ϑ) 2 ] = E[(θb− E[θb])2 ] | {z }
=Var(θb) + (E[θb] − ϑ) 2 | {z } Bias2 Efficient estimator Assume unbiased
estimators θb1 and θb2. • Estimator θb1 is relatively efficient compared to θb2, if
Var(θb1) ≤ Var(θb2). • Estimator θb1 is absolutely efficient for parameter ϑ, if it
has the smallest variance among all unbiased estimators of ϑ. 1.2 Estimation
&
School of Law
methods Maximum likelihood method Likelihood function L(ϑ) = L(ϑ|x1, . . . , xn)
= Qn i=1 f(xi |ϑ) → maximize Log-likelihood function log{L(ϑ)} = Pn i=1 log{f(xi
|ϑ)} → maximize Least squares method Quadratic form Q(ϑ) = Pn i=1 (xi − E[Xi
])2 = Pn i=1 (xi − gi(ϑ))2 → minimize
Point estimation (Properties of good estimators)
Introduction to Estimation
To estimate means to esteem (to give value to). An estimator is any quantity
calculated from the sample data which is used to give information about an
unknown quantity in the population. For example, the sample mean is an estimator
of the population mean m.
Results of estimation can be expressed as a single value; known as a point

estimate, or a range of values, referred to as a confidence interval. Whenever we
use point estimation, we calculate the margin of error associated with that point
estimation.
Estimators of population parameters are sometimes distinguished from the true

value by using the symbol 'hat'. For example, true population standard deviation s
is estimated from a sample population standard deviation.
Again, the usual estimator of the population mean is = Sxi / n, where n is the size
of the sample and x1, x2, x3,.......,xn are the values of the sample. If the value of the
estimator in a particular sample is found to be 5, then 5 is the estimate of the
population mean µ.
&
School of Law
Qualities of a Good Estimator
A "Good" estimator is the one which provides an estimate with the following
qualities:
Unbiasedness: An estimate is said to be an unbiased estimate of a given parameter

when the expected value of that estimator can be shown to be equal to the
parameter being estimated. For example, the mean of a sample is an unbiased
estimate of the mean of the population from which the sample was drawn.
Unbiasedness is a good quality for an estimate, since, in such a case, using
weighted average of several estimates provides a better estimate than each one of
those estimates. Therefore, unbiasedness allows us to upgrade our estimates. For
example, if your estimates of the population mean µ are say, 10, and 11.2 from two
independent samples of sizes 20, and 30 respectively, then a better estimate of the
population mean µ based on both samples is [20 (10) + 30 (11.2)] (20 + 30) =
10.75.
Consistency: The standard deviation of an estimate is called the standard error of

that estimate. The larger the standard error the more error in your estimate. The
standard deviation of an estimate is a commonly used index of the error entailed in
estimating a population parameter based on the information in a random sample of
size n from the entire population.
An estimator is said to be "consistent" if increasing the sample size produces an

estimate with smaller standard error. Therefore, your estimate is "consistent" with
the sample size. That is, spending more money to obtain a larger sample produces
a better estimate.
Efficiency: An efficient estimate is one which has the smallest standard error
among all unbiased estimators.
&
School of Law
The "best" estimator is the one which is the closest to the population parameter
being estimated.
The Concept of Distance for an Estimator
The above figure illustrates the concept of closeness by means of aiming at the
center for unbiased with minimum variance. Each dart board has several samples:
The first one has all its shots clustered tightly together, but none of them hit the
center. The second one has a large spread, but around the center. The third one is
worse than the first two. Only the last one has a tight cluster around the center,
therefore has good efficiency.
If an estimator is unbiased, then its variability will determine its reliability. If an

estimator is extremely variable, then the estimates it produces may not on average
be as close to the population parameter as a biased estimator with small variance.
The following chart depicts the quality of a few popular estimators for the
population mean µ:
The widely used estimator of the population mean µ is = Sxi/n, where n is the
size of the sample and x1, x2, x3,......., xn are the values of the sample that have all
of the above good properties. Therefore, it is a "good" estimator.
If you want an estimate of central tendency as a parameter for a test or for

comparison, then small sample sizes are unlikely to yield any stable estimate. The
mean is sensible in a symmetrical distribution as a measure of central tendency;
but, e.g., with ten cases, you will not be able to judge whether you have a
symmetrical distribution. However, the mean estimate is useful if you are trying to
estimate the population sum, or some other function of the expected value of the
distribution. Would the median be a better measure? In some distributions (e.g.,
shirt size) the mode may be better. BoxPlot will indicate outliers in the data set. If
there are outliers, the median is better than the mean as a measure of central
tendency.
&
School of Law
You might like to use Descriptive Statistics Applet for obtaining "good" estimates.
Statistics with Confidence
In practice, a confidence interval is used to express the uncertainty in a quantity

being estimated. There is uncertainty because inferences are based on a random
sample of finite size from the entire population or process of interest. To judge the
statistical procedure we can ask what would happen if we were to repeat the same
study, over and over, getting different data (and thus different confidence intervals)
each time.
In most studies, investigators are usually interested in determining the size of

difference of a measured outcome between groups, rather than a simple indication
of whether or not it is statistically significant. Confidence intervals present a range
of values, on the basis of the sample data, in which the value of such a difference
may lie.
Know that a confidence interval computed from one sample will be different from
a confidence interval computed from another sample.
Understand the relationship between sample size and width of confidence interval,
moreover, know that sometimes the computed confidence interval does not contain
the true value.
Let's say you compute a 95% confidence interval for a mean m . The way to
interpret this is to imagine an infinite number of samples from the same
population, 95% of the computed intervals will contain the population mean m ,
and at most 5% will not. However, it is wrong to state, "I am 95% confident that
the population mean m falls within the interval."
Again, the usual definition of a 95% confidence interval is an interval constructed

by a process such that the interval will contain the true value 95% of the time. This
means that "95%" is a property of the process, not the interval.
&
School of Law
Is the probability of occurrence of the population mean greater in the confidence

interval (CI) center and lowest at the boundaries? Does the probability of
occurrence of the population mean in a confidence interval vary in a measurable
way from the center to the boundaries? In a general sense, normality condition is
assumed, and then the interval between CI limits is represented by a bell shaped t
distribution. The expectation (E) of another value is highest at the calculated mean
value, and decreases as the values approach the CI limits.
Tolerance Interval and CI: A good approximation for the single measurement
tolerance interval is n½ times confidence interval of the mean.
Statistics with Confidence
You need to use Sample Size Determination JavaScript at the design stage of your
statistical investigation in decision making with specific subjective requirements.
A Note on Multiple Comparison via Individual Intervals: Notice that, if the

confidence intervals from two samples do not overlap, there is a statistically
significant difference, say at 5%. However, the other way is not true; two
confidence intervals can overlap even when there is a significant difference
between them.
As a numerical example, consider the means of two independent samples. Suppose

their values are 10 and 22 with equal standard error of 4. The 95% confidence
interval for the two statistics (using the critical value of 1.96) are: [2.2, 17.8] and
[14.2, 29.8], respectively. As you see they display considerable overlap. However,
the z-statistic for the two-population mean is: |22 -10|/(16 + 16)½ = 2.12 which is
clearly significant under the same conditions as applied for constructing the
confidence intervals.
One should examine the confidence interval for the difference explicitly. Even if
the confidence intervals are overlapping, it is hard to find the exact overall
&
School of Law
confidence level. However, the sum of individual confidence levels can serve as an
upper limit. This is evident from the fact that: P(A and B) £ P(A) + P(B).
The Confidence Interval JavaScript demonstrates the precision vs confidence.
Interval estimation
In statistics, interval estimation is the use of sample data to calculate

an interval of possible (or probable) values of an unknown population parameter,
in contrast to point estimation, which is a single number.
Test of hypothesis
A statistical hypothesis, sometimes called confirmatory data analysis, is

a hypothesis that is testable on the basis of observing a process that is modeled via
a set of random variables.[1] A statistical hypothesis test is a method of statistical
inference. Commonly, two statistical data sets are compared, or a data set obtained
by sampling is compared against a synthetic data set from an idealized model. A
hypothesis is proposed for the statistical relationship between the two data sets, and
this is compared as an alternative to an idealized null hypothesis that proposes no
relationship between two data sets. The comparison is deemed statistically
significant if the relationship between the data sets would be an unlikely realization
of the null hypothesis according to a threshold probability—the significance level.
Hypothesis tests are used in determining what outcomes of a study would lead to a
rejection of the null hypothesis for a pre-specified level of significance. The
process of distinguishing between the null hypothesis and the alternative
hypothesis is aided by identifying two conceptual types of errors, type 1 and type
2, and by specifying parametric limits on e.g. how much type 1 error will be
permitted.
An alternative framework for statistical hypothesis testing is to specify a set
of statistical models, one for each candidate hypothesis, and then use model
selection techniques to choose the most appropriate model.[2] The most common
&
School of Law
selection techniques are based on either Akaike information criterion or Bayes

factor.
Confirmatory data analysis can be contrasted with exploratory data analysis, which
may not have pre-specified hypotheses.
The testing process[edit]

In the statistics literature, statistical hypothesis testing plays a fundamental
role.[5] The usual line of reasoning is as follows:
1. There is an initial research hypothesis of which the truth is unknown.

2. The first step is to state the relevant null and alternative hypotheses. This is
important, as mis-stating the hypotheses will muddy the rest of the process.
3. The second step is to consider the statistical assumptions being made about
the sample in doing the test; for example, assumptions about the statistical
independence or about the form of the distributions of the observations. This
is equally important as invalid assumptions will mean that the results of the
test are invalid.
4. Decide which test is appropriate, and state the relevant test statistic T.
5. Derive the distribution of the test statistic under the null hypothesis from the
assumptions. In standard cases this will be a well-known result. For
example, the test statistic might follow a Student's t distribution or a normal
distribution.
6. Select a significance level (α), a probability threshold below which the null
hypothesis will be rejected. Common values are 5% and 1%.
7. The distribution of the test statistic under the null hypothesis partitions the
possible values of T into those for which the null hypothesis is rejected—the
so-called critical region—and those for which it is not. The probability of
the critical region is α.
8. Compute from the observations the observed value tobs of the test statistic T.
9. Decide to either reject the null hypothesis in favor of the alternative or not
reject it. The decision rule is to reject the null hypothesis H0 if the observed
&
School of Law
value tobs is in the critical region, and to accept or "fail to reject" the
hypothesis otherwise.
An alternative process is commonly used:
1. Compute from the observations the observed value tobs of the test statistic T.
2. Calculate the p-value. This is the probability, under the null hypothesis, of
sampling a test statistic at least as extreme as that which was observed.
3. Reject the null hypothesis, in favor of the alternative hypothesis, if and only
if the p-value is less than the significance level (the selected probability)
threshold.
The two processes are equivalent.[6] The former process was advantageous in the
past when only tables of test statistics at common probability thresholds were
available. It allowed a decision to be made without the calculation of a probability.
It was adequate for classwork and for operational use, but it was deficient for
reporting results.
The latter process relied on extensive tables or on computational support not
always available. The explicit calculation of a probability is useful for reporting.
The calculations are now trivially performed with appropriate software.
Test of hypothesis concerning Mean
Assume that a biological population is sampled and you wish to estimate the mean
value of some variable within that population. In chapter 3, we saw that the Central
Limit Theorem indicates that, when the population distribution is normal, the
sampling distribution of the mean also will be normal. In addition, we saw that,
when using the sample standard deviation, s, to estimate σσ, the tdistribution can
be used to represent the sampling distribution of the mean. Thus, the t distribution
can be used to test hypotheses about the population mean, μμ. This is referred to as
the "one sample t test."
&
School of Law
The t test evaluates the hypothesis that the parametric mean, μμ, is equal to a
particular value. That is, it tests $H_{o}: \mu = \mu {o},where,where\mu
_{o}isthespecificvalueofinterest.Ifisthespecificvalueofinterest.IfH{o}$ is true, then
the value,
t=y¯−μosn√,t=y¯−μosn,
will follow a t distribution with v=n−1v=n−1 degrees of freedom. If μo=0μo=0,

then the formula simplifies to
t=y¯sn√,t=y¯sn,
which is really just a re-arrangement of the terms in equation 3.1. Because we

know that, theoretically, this test statistics will follow a t distribution, we have a
way of calculating a p-value that can be used to evaluate HoHo within the context
of the approach laid out by Neyman and Pearson.
Going back to the example in chapter 3, a random sample of 20 items was taken
from a population in which we knew that the variable of interest, yy, followed a
normal distribution. In the sample, y¯=8.48y¯=8.48, and s=5s=5. Now, let’s
assume that you wish to test Ho:μ=6Ho:μ=6 vs. Ha:μ≠6Ha:μ≠6. The value of our
observed test statistic can be calculated.
We then compare this observed value to the theoretical sampling distribution we

would expect, assuming that HoHo is true. Specifically, we will calculate the two-
tailed p-value.
The approach is illustrated in Fig.6.2. Keep in mind that this p-
value, P(|t|≥tobs)P(|t|≥tobs), is valid for Ha:μ≠6Ha:μ≠6. Had the alternative
hypothesis been Ha:μ<6Ha:μ<6, the appropriate p-value would have
been P(t≤tobs)P(t≤tobs). If the alternative hypothesis had been Ha:μ>6Ha:μ>6, the
appropriate p-value would have been P(t≥tobs)P(t≥tobs).
Under a strict Neyman-Pearson interpretation, because p=p=0.039 is less
than α=0.05α=0.05, we would reject HoHo in favor of HaHa and conclude
that μ≠6μ≠6.
Hypothesis Testing for a Proportion
Printer-friendly version
&
School of Law
Ultimately we will measure statistics (e.g. sample proportions and sample means)
and use them to draw conclusions about unknown parameters (e.g. population
proportion and population mean). This process, using statistics to make judgments
or decisions regarding population parameters is called statistical inference.
Example 2 above produced a sample proportion of 47% heads and is written:
^pp^ [read p-hat] = 47/100 = 0.47

P-hat is called the sample proportion and remember it is a statistic (soon we will
look at sample means, ¯xx¯.) But how can p-hat be an accurate measure of p, the
population parameter, when another sample of 100 coin flips could produce 53
heads? And for that matter we only did 100 coin flips out of an uncountable
possible total!
The fact that these samples will vary in repeated random sampling taken at the
same time is referred to as sampling variability. The reason sampling variability is
acceptable is that if we took many samples of 100 coin flips an calculated the
proportion of heads in each sample then constructed a histogram or boxplot of the
sample proportions, the resulting shape would look normal (i.e. bell-shaped) with a
mean of 50%.
[The reason we selected a simple coin flip as an example is that the concepts just
discussed can be difficult to grasp, especially since earlier we mentioned that rarely
is the population parameter value known. But most people accept that a coin will
produce an equal number of heads as tails when flipped many times.]
A statistical hypothesis test is a procedure for deciding between two possible
statements about a population. The phrase significance testmeans the same thing
as the phrase "hypothesis test."
The two competing statements about a population are called the null hypothesis
and the alternative hypothesis.
▪ A typical null hypothesis is a statement that two variables are not related. Other
examples are statements that there is no difference between two groups (or
treatments) or that there is no difference from an existing standard value.
&
School of Law
▪ An alternative hypothesis is a statement that there is a relationship between two

variables or there is a difference between two groups or there is a difference from a
previous or existing standard.
NOTATION: The notation Ho represents a null hypothesis and Ha represents an
alternative hypothesis and po is read as p-not or p-zero and represents the null
hypothesized value. Shortly, we will substitute μo for when discussing a test of
means.
Ho: p = po
Ha: p ≠ po or Ha: p > po or Ha: p < po [Remember, only select one Ha]
The first Ha is called a two-sided test since "not equal" implies that the true value
could be either greater than or less than the test value, po. The other two Ha are
referred to as one-sided tests since they are restricting the conclusion to a specific
side of po.
Example 3 – This is a test of a proportion:
A Tufts University study finds that 40% of 12th grade females feel they are
overweight. Is this percent lower for college age females? Let p = proportion of
college age females who feel they are overweight. Competing hypothesis are:
Ho: p = .40 (or greater) That is, no difference from Tufts study finding.
Ha: p < .40 (proportion feeling they are overweight is less for college age females.
Example 4 – This is a test of a mean:
Is there a difference between the mean amount that men and women study per
week? Competing hypotheses are:
Null hypothesis: There is no difference between mean weekly hours of study for
men and women, writing in statistical language as μ1 = μ2
Alternative hypothesis: There is a difference between mean weekly hours of study
for men and women, writing in statistical language as μ1≠ μ2
This notation is used since the study would consider two independent samples: one
from Women and another from Men.
Test Statistic and p-value
▪ A test statistic is a summary of a sample that is in some way sensitive to
differences between the null and alternative hypothesis.
&
School of Law
▪ A p-value is the probability that the test statistic would "lean" as much (or more)
toward the alternative hypothesis as it does if the real truth is the null hypothesis.
That is, the p-value is the probability that the sample statistic would occur under
the presumption that the null hypothesis is true.
A small p-value favors the alternative hypothesis. A small p-value means the
observed data would not be very likely to occur if we believe the null hypothesis is
true. So we believe in our data and disbelieve the null hypothesis. An easy
(hopefully!) way to grasp this is to consider the situation where a professor states
that you are just a 70% student. You doubt this statement and want to show that
you are better that a 70% student. If you took a random sample of 10 of your
previous exams and calculated the mean percentage of these 10 tests, which mean
would be less likely to occur if in fact you were a 70% student (the null
hypothesis): a sample mean of 72% or one of 90%? Obviously the 90% would be
less likely and therefore would have a small probability (i.e. p-value).
Using the p-value to Decide between the Hypotheses
▪ The significance level of a test is the border used for deciding between the null and
alternative hypotheses.
▪ Decision Rule: We decide in favor of the alternative hypothesis when a p-value is
less than or equal to the significance level. The most commonly used significance
level is 0.05.
In general, the smaller the p-value the stronger the evidence is in favor of the
alternative hypothesis.
Example 3 Continued:
In a recent elementary statistics survey, the sample proportion (of women) saying
they felt overweight was 37 /129 = .287. Note that this leans toward the alternative
hypothesis that the "true" proportion is less than .40. [Recall that the Tufts
University study finds that 40% of 12th grade females feel they are overweight. Is
this percent lower for college age females?]
Step 1: Let p = proportion of college age females who feel they are overweight.
Ho: p = .40 (or greater) That is, no difference from Tufts study finding.
Ha: p < .40 (proportion feeling they are overweight is less for college age females.
Step 2:
&
School of Law
If npo ≥ 10 and n(1 – po) ≥ 10 then we can use the following Z-test statistic: Since
both (129) × (0.4) and (129) × (0.6) > 10 [or consider that the number of successes
and failures, 37 and 92 respectively, are at least 10] we calculate the test statistic
by:
z=^p−p0√p0(1−p0)nz=p^−p0p0(1−p0)n
Note: In computing the Z-test statistic for a proportion we use the hypothesized
value po here not the sample proportion p-hat in calculating the standard error! We
do this because we "believe" the null hypothesis to be true until evidence says
otherwise.
z=0.287−0.40√ 0.40(1−0.40)129 =−2.62z=0.287−0.400.40(1−0.40)129=−2.62
Step 3: The p-value can be found from Standard Normal Table
Calculating p-value:
The method for finding the p-value is based on the alternative hypothesis:
2 × P(Z ≥ | z | ) for Ha : p ≠ po where |z| is the absolute value of z
P(Z ≥ z ) for Ha : p > po
P(Z ≤ z) for Ha : p < po
In our example we are using Ha : p < .40 so our p-value will be found from P(Z ≤
z) = P(Z ≤ -2.62) and from Standard Normal Table this is equal to 0.0044.
Step 4: We compare the p-value to alpha, which we will let alpha be 0.05. Since
0.0044 is less than 0.05 we will reject the null hypothesis and decide in favor of the
alternative, Ha.
Step 5: We’d conclude that the percentage of college age females who felt they
were overweight is less than 40%. [Note: we are assuming that our sample, since
not random, is representative of all college age females.]
The p-value= .004 indicates that we should decide in favor of the alternative
hypothesis. Thus we decide that less than 40% of college women think they are
overweight.
The "Z-value" (-2.62) is the test statistic. It is a standardized score for the
difference between the sample p and the null hypothesis value p = .40. The p-
value is the probability that the z-score would lean toward the alternative
hypothesis as much as it does if the true population really was p = .40.
&
School of Law
Hypothesis Tests for One or Two Variances or Standard Deviations
Chi-Square-tests and F-tests for variance or standard deviation both require that the
original population be normally distributed.
Testing a Claim about a Variance or Standard Deviation
To test a claim about the value of the variance or the standard deviation of a
population, then the test statistic will follow a chi-square distribution
with n−1n−1 dgrees of freedom, and is given by the following formula.
χ2=(n−1)s2σ20χ2=(n−1)s2σ02
The television habits of 30 children were observed. The sample mean was found to
be 48.2 hours per week, with a standard deviation of 12.4 hours per week. Test the
claim that the standard deviation was at least 16 hours per week.
• The hypotheses are:

H0:σ=16H0:σ=16
Ha:σ<16Ha:σ<16
• We shall choose α=0.05α=0.05.
• The test statistic is
χ2=(n−1)s2σ20=(30−1)12.42162=17.418χ2=(n−1)s2σ02=(30−1)12.42162=
17.418.
• The p-value is p=χ2cdf(0,17.418,29)=0.0447p=χ2cdf(0,17.418,29)=0.0447.
• Since p<αp<α, we reject H0H0.
• The variation in television watching was less than 16 hours per week.
Testing a the Difference of Two Variances or Two Standard Deviations
Two equal variances would satisfy the equation σ21=σ22σ12=σ22, which is

equivalent to σ21σ22=1σ12σ22=1. Since sample variances are related to chi-
square distributions, and the ratio of chi-square distributions is an F-distribution,
&
School of Law
we can use the F-distribution to test against a null hypothesis of equal variances.
Note that this approach does not allow us to test for a particular magnitude of
difference between variances or standard deviations.
Given sample sizes of n1n1 and n2n2, the test statistic will have n1−1n1−1 and
n2−1n2−1 degrees of freedom, and is given by the following formula.
F=s21s22F=s12s22
If the larger variance (or standard deviation) is present in the first sample, then the
test is right-tailed. Otherwise, the test is left-tailed. Most tables of the F-
distribution assume right-tailed tests, but that requirement may not be necessary
when using technology.
Samples from two makers of ball bearings are collected, and their diameters (in
inches) are measured, with the following results:
• Acme: n1=80n1=80, s1=0.0395s1=0.0395

• Bigelow: n2=120n2=120, s2=0.0428s2=0.0428
Assuming that the diameters of the bearings from both companies are normally
distributed, test the claim that there is no difference in the variation of the
diameters between the two companies.
• The hypotheses are:

H0:σ1=σ2H0:σ1=σ2
Ha:σ1≠σ2Ha:σ1≠σ2
• We shall choose α=0.05α=0.05.
• The test statistic is
F=s21s22=0.039520.04282=0.8517F=s12s22=0.039520.04282=0.8517.
• Since the first sample had the smaller standard deviation, this is a left-tailed
test. The p-value is
p=Fcdf(0,0.8517,79,119)=0.2232p=Fcdf⁡(0,0.8517,79,119)=0.2232.
&
School of Law
• Since p>αp>α, we fail to reject H0H0.

• There is insufficient evidence to conclude that the diameters of the ball
bearings in the two companies have different standard deviations.
If the two samples had been reversed in our computations, we would have obtained
the test statistic F=1.1741F=1.1741, and performing a right-tailed test, found the
p-value p=Fcdf(1.1741,∞,119,79)=0.2232p=Fcdf⁡(1.1741,∞,119,79)=0.2232.
Of course, the answer is the same.
UNIT-4
Correlation Analysis:
Correlation is a statistical tool that helps to measure and analyze the degree of
relationship between two variables. „ Correlation analysis deals with the
association between two or more variables.
Types of Correlation Type I :-1) Positive Correlation. 2)Negative Correlation
Positive Correlation: The correlation is said to be positive correlation if the
values of two variables changing with same direction. Ex. Pub. Exp. & sales,
Height & weight.
Negative Correlation: The correlation is said to be negative correlation when the
values of variables change with opposite direction. Ex. Price & qty. demanded.
Direction of the Correlation:-
&
School of Law
Positive relationship – Variables change in the same direction. „ As X is

increasing, Y is increasing „ As X is decreasing, Y is decreasing „ E.g., As height
increases, so does weight. „
Negative relationship – Variables change in opposite directions. „ As X is
increasing, Y is decreasing „ As X is decreasing, Y is increasing „ E.g., As TV
time increases, grades decrease
Types of Correlation Type II:- 1) Correlation Simple. 2)Multiple Correlation

3)Partial Correlation. 4)Total Correlation
Simple correlation: -Under simple correlation problem there are only two
variables are studied. „
Multiple Correlation:- Under Multiple Correlation three or more than three
variables are studied. Ex. Q d = f ( P,P C, P S, t, y ) „
Partial correlation:- analysis recognizes more than two variables but considers
only two variables keeping the other constant. „
Total correlation:- is based on all the relevant variables, which is normally not
feasible.
Types of Correlation Type III:-
„ Linear correlation:- Correlation is said to be linear when the amount of change
in one variable tends to bear a constant ratio to the amount of change in the other.
The graph of the variables having a linear relationship will form a straight line. Ex
X = 1, 2, 3, 4, 5, 6, 7, 8,Y = 5, 7, 9, 11, 13, 15, 17, 19, Y = 3 + 2x „
Non Linear correlation:- The correlation would be non linear if the amount of
change in one variable does not bear a constant ratio to the amount of change in the
other variable.
Methods of Studying Correlation:-

2) Karl Pearson’s Coefficient of Correlation „
&
School of Law
3) Method of Least Squares
Karl Pearson's Coefficient of Correlation:-

• arson’s ‘r’ is the most common correlation coefficient.
• Karl Pearson’s Coefficient of Correlation denoted by- ‘r’ The coefficient of
correlation ‘r’ measure the degree of linear relationship between two
variables say x & y
• Karl Pearson’s Coefficient of Correlation denoted by- r -1 ≤ r ≥ +1 „
• Degree of Correlation is expressed by a value of Coefficient „ Direction of
change is Indicated by sign ( - ve) or ( + ve)
• When deviation taken from actual mean: r(x, y)= Σxy / √ Σx² Σy²
• When deviation taken from an assumed mean: r = N Σdxdy - Σdx Σdy/ √N

Σdx²-( Σdx)² √N Σdy²-( Σdy)²
Properties of Correlation coefficient:-

• The correlation coefficient lies between -1 & +1 symbolically ( - 1 ≤ r ≥ 1 )
„
• The correlation coefficient is independent of the change of origin & scale. „
• The coefficient of correlation is the geometric mean of two regression
coefficient.
r = √ bxy * byx
• The one regression coefficient is (+ve) other regression coefficient is also
(+ve) correlation coefficient is (+ve)
&
School of Law
&
School of Law
Spearman’s Rank Coefficient of Correlation :-

• „ When statistical series in which the variables under study are not capable
of quantitative measurement but can be arranged in serial order, in such
situation pearson’s correlation coefficient can not be used in such case
Spearman Rank correlation can be used.
• R = 1- (6 ∑ D2 ) / N (N 2 – 1) „
• R = Rank correlation coefficient „
• D = Difference of rank between paired item in two series. „
• N = Total number of observation.
• Equal Ranks or tie in Ranks: In such cases average ranks should be assigned
to each individual.
• R = 1- (6 ∑ D2 ) + AF / N (N 2 – 1) AF = 1/12(m 1 3 – m 1) + 1/12(m 2 3 –
m 2) +…. 1/12(m 2 3 – m 2 )
m = The number of time an item is repeated
Example:-The scores for nine students in physics and math are as follows:
Physics: 35, 23, 47, 17, 10, 43, 9, 6, 28
Mathematics: 30, 33, 45, 23, 8, 49, 12, 4, 31
Compute the student’s ranks in the two subjects and compute the Spearman rank
correlation.
Step 1: Find the ranks for each individual subject. I used the Excel rank function to
find the ranks. If you want to rank by hand, order the scores from greatest to
smallest; assign the rank 1 to the highest score, 2 to the next highest and so on:
&
School of Law
Step 2: Add a third column, d, to your data. The d is the difference between ranks.
For example, the first student’s physics rank is 3 and math rank is 5, so the
difference is 3 points. In a fourth column, square your d values.
Step 3: Sum (add up) all of your d-squared values.

4 + 4 + 1 + 0 + 1 + 1 + 1 + 0 + 0 = 12. You’ll need this for the formula (the Σ d 2 is
just “the sum of d-squared values”).
Step 4: Insert the values into the formula. These ranks are not tied, so use the first
formula:
= 1 – (6*12)/(9(81-1))
= 1 – 72/720
= 1-0.1
= 0.9
The Spearman Rank Correlation for this set of data is 0.9.
Regression Analysis
In statistical modeling, regression analysis is a set of statistical processes
for estimating the relationships among variables. It includes many techniques for
modeling and analyzing several variables, when the focus is on the relationship
between a dependent variable and one or more independent variables (or
'predictors'). More specifically, regression analysis helps one understand how the
&
School of Law
typical value of the dependent variable (or 'criterion variable') changes when any
one of the independent variables is varied, while the other independent variables
are held fixed.
Most commonly, regression analysis estimates the conditional expectation of the
dependent variable given the independent variables – that is, the average value of
the dependent variable when the independent variables are fixed. Less commonly,
the focus is on a quantile, or other location parameter of the conditional
distribution of the dependent variable given the independent variables. In all cases,
a function of the independent variables called the regression function is to be
estimated. In regression analysis, it is also of interest to characterize the variation
of the dependent variable around the prediction of the regression function using
a probability distribution. A related but distinct approach is Necessary Condition
Analysis[1] (NCA), which estimates the maximum (rather than average) value of
the dependent variable for a given value of the independent variable (ceiling line
rather than central line) in order to identify what value of the independent variable
is necessary but not sufficient for a given value of the dependent variable.
Regression analysis is widely used for prediction and forecasting, where its use has
substantial overlap with the field of machine learning. Regression analysis is also
used to understand which among the independent variables are related to the
dependent variable, and to explore the forms of these relationships. In restricted
circumstances, regression analysis can be used to infer causal
relationships between the independent and dependent variables. However this can
lead to illusions or false relationships, so caution is advisable;[2] for
example, correlation does not prove causation.
Many techniques for carrying out regression analysis have been developed.
Familiar methods such as linear regression and ordinary least squares regression
are parametric, in that the regression function is defined in terms of a finite number
of unknown parametersthat are estimated from the data. Nonparametric
regression refers to techniques that allow the regression function to lie in a
specified set of functions, which may be infinite-dimensional.
The performance of regression analysis methods in practice depends on the form of
the data generating process, and how it relates to the regression approach being
&
School of Law
used. Since the true form of the data-generating process is generally not known,
regression analysis often depends to some extent on making assumptions about this
process. These assumptions are sometimes testable if a sufficient quantity of data is
available. Regression models for prediction are often useful even when the
assumptions are moderately violated, although they may not perform optimally.
However, in many applications, especially with small effects or questions
of causality based on observational data, regression methods can give misleading
results.[3][4]
In a narrower sense, regression may refer specifically to the estimation of
continuous response (dependent) variables, as opposed to the discrete response
variables used in classification.[5] The case of a continuous dependent variable may
be more specifically referred to as metric regression to distinguish it from related
problems.[6]
Definition: The Regression Line is the line that best fits the data, such that the
overall distance from the line to the points (variable values) plotted on a graph is
the smallest. In other words, a line used to minimize the squared deviations of
predictions is called as the regression line.
There are as many numbers of regression lines as variables. Suppose we take two
variables, say X and Y, then there will be two regression lines:
▪ Regression line of Y on X: This gives the most probable values of Y from the
given values of X.
▪ Regression line of X on Y: This gives the most probable values of X from the
given values of Y.
The algebraic expression of these regression lines is called as Regression
Equations. There will be two regression equations for the two regression lines.
&
School of Law
The correlation between the variables depend on the distance between these two
regression lines, such as the nearer the regression lines to each other the higher is
the degree of correlation, and the farther the regression lines to each other the
lesser is the degree of correlation.
The correlation is said to be either perfect positive or perfect negative when the
two regression lines coincide, i.e. only one line exists. In case, the variables are
independent; then the correlation will be zero, and the lines of regression will be at
right angles, i.e. parallel to the X axis and Y axis.
Note: The regression lines cut each other at the point of average of X and Y. This
means, from the point where the lines intersect each other the perpendicular is
drawn on the X axis we will get the mean value of X. Similarly, if the horizontal
line is drawn on the Y axis we will get the mean value of Y.
Question: Find the equation of the two lines of regression and hence find
correlation coefficient from the following data.
0
Let a = 68, b = 69, c = 1

Here n = 8
&
School of Law
X¯¯¯¯=a+cx¯¯¯=a+c. ∑xn=68+1.08=68X¯=a+cx¯=a+c. ∑xn=68+1.08=68

Y¯¯¯¯=b+cy¯¯¯=a+c. ∑yn=69+1.−28=68.75Y¯=b+cy¯=a+c. ∑yn=69+1.−28=68.
75
bxy= n∑xy− ∑x∑yn∑y2−(∑y)2=8(26)−(0)(−2)8(52)−(−2)2=0.5049bxy= n∑xy−
∑x∑yn∑y2−(∑y)2=8(26)−(0)(−2)8(52)−(−2)2=0.5049
byx= n∑xy− ∑x∑yn∑x2−(∑x)2=8(26)−(0)(−2)8(36)−(0)2=0.7222byx= n∑xy− ∑x
∑yn∑x2−(∑x)2=8(26)−(0)(−2)8(36)−(0)2=0.7222
∴∴ Regression equation of Y on X is
Y=Y¯¯¯¯=byx(X− X¯¯¯¯)Y=Y¯=byx(X− X¯)
∴∴ Y - 68.75 = 0.7222(X-68)
∴∴ Y = 0.7222 X + 19.6389
∴∴ Regression equation of X on Y is Y = X¯¯¯¯=bxy(Y− Y¯¯¯¯)X¯=bxy(Y− Y¯)
∴∴ X - 68 = 0.5049(Y - 68.75)
∴∴ X = 0.5409 Y + 33.2913
Now , r = ±± byx∗bxy−−−−−−−√byx∗bxy
= ±± 0.5049∗0.7222−−−−−−−−−−−−√0.5049∗0.7222
= ±± 0.6039
Since , byx and bxy are both positive , r is positive
∴∴ r = 0.6039
&
School of Law
Ans : Lines of regression are Y = 0.7222 X + 19.6389 and X = 0.5049 Y + 33.2913

coefficient of correlation (r) = 0.6039
Bivariate Frequency Distribution
Univariate distribution is also known as one variable frequency distribution (As

“uni” means “one”).
It is the simplest form of representing data. It doesn’t deal with relationships of
variable. It’s one of the most important requirement is for taking data,
summarizing and finds patterns in the data.
But now, What if number of students and their respective marks with respect to
subjects were given. Data in statistics needs to be classified according to how
many variables are in a particular study. Here if two variables are involved then
their frequency distribution is known as “Bivariate frequency distribution”.
Defintion
Back to Top
A Bivariate Frequency Distribution is the frequency distribution of two variables.

Let us discuss discuss the concept with the help of an example.
A backbone of Statistics is data collection and process and analysis over the data.
When small data is available for analysis then it is not of a problem:
For example: Marks achieved by a student out of 100 in all subjects. Find
percentage:
Mathematics =75
&
School of Law
Statistics = 90
English = 80
Physics = 75
Chemistry = 85
Data seems easy to operate, isn’t it?
But when large data is available,
For example:
Marks and number of students achieving respective marks in Mathematics from

class of 60 students: 40,50,80,90,56,60,40,77,78,92,95………….and so on.
Similarly, 60 values are available. Then it is easy to represent the data in tabular
format such as:
Marks (in
Frequency (Number of students achieved marks in that
interval)
interval)
0 - 10 0
11 - 20 1
&
School of Law
21 - 30 2
31 - 40 2
41 - 50 5
51 - 60 10
61 - 70 10
71 - 80 20
81 - 90 10
91 - 100 0
Total 60
This makes analysis much easier.
Following are types of bivariate analysis:
1. Scatter plot:
In scatter plots, it is possible to get idea about relationship between both variables
in a glance. In scatter plot, points are plotted on X and Y axis. The one which is
dependent variable is taken on Y axis and independent is taken on X axis. The
scatter plot looks as follows:
&
School of Law
2. Regression analysis:
Regression analysis allows to estimate future trends of data. It identifies data,
allows to fit that in one linear line and then by substituting values of independent
variables, future values of dependent variables can be easily found. It also gives
knowledge of slope and intercepts of line and hence can be tested for whole
population of that sample.
3. Correlation coefficients:
Correlation coefficient indicates how much two variables are related to each other.
Steps and calculations to be performed are shown below. Value for correlation is
always between -1 and 1. Basically -1 means there is perfect negative correlation
and 1 stands for perfect positive correlation. Where value of correlation coefficient
is zero indicates, no relationship between x and y at all. [Negative relationship is
when one variable increased, other has to decrease. And Positive indicates, when
one variable increases, other has to increase.]
Bivariate Frequency Distribution and correlation

Back to Top
If given data has numerical values on both sides, and it is required to recognize,
how much they are related to each other. It such cases, there is a way to find out if
there is correlation between 2 variables or if they are related to each other, if yes,
how much. Using “correlation coefficient(r)”
Consider given table,
XIYJXIYJ Y1Y1 Y2Y2 . . YkYk TOTAL
X1X1 A12A12 A12A12 . . A1kA1k T1T1

&
School of Law
X2X2 A21A21 A22A22 . . . .
. . . . . . .
. . . . . . .
XNXN AN1AN1 AN2AN2 . . ANKANK TNTN
TOTAL S1S1 S2S2 . . SKSK G
In above table, let X1X1, X2……..XNX2……..XN are NN values of XX.
Total of respective frequencies in rows be,
T1T1, T2……TNT2……TN are sum of respective rows.
Similarly, another variable is YY, Y1Y1, Y2….YkY2….Yk are kk values

of YY with respective frequency total, S1S1, S2….SkS2….Sk.
GG is Grand total of all frequencies.
Now, how to calculate correlation coefficient for these XX and YY values:
Formula used to calculate correlation coefficient,
rr = Covariance(X,Y)(σx × σy)Covariance(X,Y)(σx × σy)
Where,
Covariance (X, Y)(X, Y) = E[XY] – E[X]×E[Y]E[XY] – E[X]×E[Y]

&
School of Law
σxσx is standard deviation of xx.
And σyσy is standard deviation of yy.
Some intermediate calculations:
E[X]E[X] = x̅x̅ = 1∑Ti1∑Ti×∑Xi ×Ti×∑Xi ×Ti
E[Y]E[Y] = y̅y̅ = 1∑Si1∑Si×∑Yi×Si×∑Yi×Si
E[XY]E[XY] = 1G1G×∑∑Xi×Yi×Aij×∑∑Xi×Yi×Aij
σxσx = 1∑Ti×Ti(Xi−x̅)2−−−−−−−−−−−−−−−√1∑Ti×Ti(Xi−x̅)2
σyσy = 1∑Si×Ti(Yi−y̅)2−−−−−−−−−−−−−−−√1∑Si×Ti(Yi−y̅)2
Bivariate Frequency Distribution Table

Back to Top
Table which represents variable and their respective frequencies is known

as, Bivariate Frequency distribution table or joint frequency distribution table. This
is very much useful in real life. When we have two variables (X and Y) and they
&
School of Law
are related then we may perform bivariate analysis on them to find out their
relationship.
For example:
1. If a person needs to check height with weight.

2. Student’s grade with hours spent on studies
Standard Error of the Estimate (1 of 3)
The standard error of the estimate is a measure of the accuracy of predictions made
with a regression line. Consider the following data.
The second column (Y) is predicted by the first column (X). The slope and Y
intercept of the regression line are 3.2716 and 7.1526 respectively. The third
column, (Y'), contains the predictions and is computed according to the formula:
Y' = 3.2716X + 7.1526.
The fourth column (Y-Y') is the error of prediction. It is simply the difference
between what a subject's actual score was (Y) and what the predicted score is (Y').
&
School of Law
The sum of the errors of prediction is zero. The last column, (Y-Y')², contains the
squared errors of prediction.
The regression line seeks to minimize the sum of the squared errors of prediction.
The square root of the average squared error of prediction is used as a measure of
the accuracy of prediction. This measure is called the standard error of the estimate
and is designated as σest. The formula for the standard error of the estimate is:
where N is the number of pairs of (X,Y) points. For this example, the sum of the
squared errors of prediction (the numerator) is 70.77 and the number of pairs is 12.
The standard error of the estimate is therefore equal to:
An alternate formula for the standard error of the estimate is:
where is σy is the population standard deviation of Y and ρ is the

populationcorrelation between X and Y. For this example,
One typically does not know the population

parameters and therefore has to estimate from a sample. The symbol s est is used for
the estimate of σest. The relevant formulas are:
&
School of Law
and
where r is the sample correlation and Sy is the sample standard deviation of Y.

(Note that Sy has a capital rather than a small "S" so it is computed with N in the
denominator). The similarity between the standard error of the estimate and the
standard deviation should be noted: The standard deviation is the square root of the
average squared deviation from the mean; the standard error of the estimate is the
square root of the average squared deviation from the regression line. Both
statistics are measures of unexplained variation.
Time Series
A time series is a series of data points indexed (or listed or graphed) in time order.
Most commonly, a time series is a sequence taken at successive equally spaced
points in time. Thus it is a sequence of discrete-time data. Examples of time series
are heights of ocean tides, counts of sunspots, and the daily closing value of
the Dow Jones Industrial Average.
Time series are very frequently plotted via line charts. Time series are used
in statistics, signal processing, pattern recognition, econometrics, mathematical
finance, weather forecasting, earthquake
prediction, electroencephalography, control
engineering, astronomy, communications engineering, and largely in any domain
of applied science and engineering which involves temporalmeasurements.
Time series analysis comprises methods for analyzing time series data in order to
extract meaningful statistics and other characteristics of the data. Time
series forecasting is the use of a model to predict future values based on
&
School of Law
previously observed values. While regression analysis is often employed in such a

way as to test theories that the current values of one or more independent time
series affect the current value of another time series, this type of analysis of time
series is not called "time series analysis", which focuses on comparing values of a
single time series or multiple dependent time series at different points in
time.[1] Interrupted time series analysis is the analysis of interventions on a single
time series.
Time series data have a natural temporal ordering. This makes time series analysis
distinct from cross-sectional studies, in which there is no natural ordering of the
observations (e.g. explaining people's wages by reference to their respective
education levels, where the individuals' data could be entered in any order). Time
series analysis is also distinct from spatial data analysis where the observations
typically relate to geographical locations (e.g. accounting for house prices by the
location as well as the intrinsic characteristics of the houses). A stochastic model
for a time series will generally reflect the fact that observations close together in
time will be more closely related than observations further apart. In addition, time
series models will often make use of the natural one-way ordering of time so that
values for a given period will be expressed as deriving in some way from past
values, rather than from future values (see time reversibility.)
Time series analysis can be applied to real-valued, continuous
data, discrete numeric data, or discrete symbolic data (i.e. sequences of characters,
such as letters and words in the English language[2]).
Objectives of Time Series analysis
There are many objectives related to time series analysis, objectives of time series
analysis may be classified as
1. Description
2. Explanation
3. Prediction
4. Control
&
School of Law
The description of the objectives of time series analysis are as follows:

Description
The first step in the analysis is to plot the data and obtain simple descriptive
measures (such as plotting data, looking for trends, seasonal fluctuations and so
on) of the main properties of the series. In

above figure , there is a regular seasonal pattern of price change although this price
pattern is not consistent. Graph enables to look for “wild” observations or outlier
(not appear to be consistent with the rest of the data). Graphing the time
series make possible the presence of turning points where upward trend suddenly
changed to a downward trend. If there is turning point, different models may have
to be fitted to the two parts of the series.
Explanation
Observations taken on two or more variables, making possible to use the variation
in one time series to explain the variation in another series. This may lead to
deeper understanding. Multiple regression model may be helpful in this case.
Prediction
Given an observed time series, one may want to predict the future values of the
series. It is an important task in sales of forecasting and is the analysis of economic
and industrial time series. Prediction and forecasting used interchangeably.
Control
When time series generated to measure the quality of a manufacturing process (the
aim may be) to control the process. Control procedures are of several different
kinds. In quality control, the observations are plotted on control chart and the
&
School of Law
controller takes action as a result of studying the charts. A stochastic model is

fitted to the series. Future values of the series are predicted and then the input
process variables are adjusted so as to keep the process on target.
The factors that are responsible for bringing about changes in a time series,
also called the components of time series, are as follows:
1. Secular Trends (or General Trends)

2. Seasonal Movements
3. Cyclical Movements
4. Irregular Fluctuations
Secular Trends
The secular trend is the main component of a time series which results from long
term effects of socio-economic and political factors. This trend may show the
growth or decline in a time series over a long period. This is the type of tendency
which continues to persist for a very long period. Prices and export and import
data, for example, reflect obviously increasing tendencies over time.
Seasonal Trends
These are short term movements occurring in data due to seasonal factors. The
short term is generally considered as a period in which changes occur in a time
series with variations in weather or festivities. For example, it is commonly
observed that the consumption of ice-cream during summer is generally high and
hence an ice-cream dealer's sales would be higher in some months of the year
while relatively lower during winter months. Employment, output, exports, etc.,
are subject to change due to variations in weather. Similarly, the sale of garments,
umbrellas, greeting cards and fire-works are subject to large variations during
festivals like Valentine’s Day, Eid, Christmas, New Year's, etc. These types of
&
School of Law
variations in a time series are isolated only when the series is provided biannually,
quarterly or monthly.
Cyclic Movements
These are long term oscillations occurring in a time series. These oscillations are
mostly observed in economics data and the periods of such oscillations are
generally extended from five to twelve years or more. These oscillations are
associated with the well known business cycles. These cyclic movements can be
studied provided a long series of measurements, free from irregular fluctuations, is
available.
Irregular Fluctuations
These are sudden changes occurring in a time series which are unlikely to be
repeated. They are components of a time series which cannot be explained by
trends, seasonal or cyclic movements. These variations are sometimes called
residual or random components. These variations, though accidental in nature, can
cause a continual change in the trends, seasonal and cyclical oscillations during the
forthcoming period. Floods, fires, earthquakes, revolutions, epidemics, strikes etc.,
are the root causes of such irregularities.
Mathematical Statement of the Composition of Time Series

A time series may not be affected by all type of variations. Some of these type of
variations may affect a few time series, while the other series may be effected by
all of them. Hence, in analysing time series, these effects are isolated. In classical
time series analysis it is assumed that any given observation is made up of trend,
seasonal, cyclical and irregular movements and these four components have
multiplicative relationship.
Symbolically :
&
School of Law
O = T × S × C × I
where O refers to original data,
T refers to trend.
S refers to seasonal variations,
C refers to cyclical variations and
I refers lo irregular variations.
This is the most commonly used model in the decomposition of time series.
There is another model called Additive model in which a particular observation in
a time series is the sum of these four components.
O=T+S+C+I
To prevent confusion between the two models, it should be made clear that in
Multiplicative model S, C, and I are indices expressed as decimal percents whereas
in Additive model S, C and I are quantitative deviations about trend that can be
expressed as seasonal, cyclical and irregular in nature. If in a multiplicative model.
T = 500, S = 1.4, C = 1.20 and I = 0.7 then
O = T × S × C × I
By substituting the values we get
O = 500 × 1.4 × 1.20 × 0.7 = 608
In additive model, T = 500, S = 100, C = 25, I = –50
O = 500 + 100 + 25 – 50 = 575
The assumption underlying the two schemes of analysis is that whereas there is no
interaction among the different constituents or components under the additive
scheme, such interaction is very much present in the multiplicative scheme. Time
series analysis, generally, proceed on the assumption of multiplicative formulation.
Methods of Measuring Trend
Trend can be determined : (i) Free hand curve method ; (ii) moving averages
method ; (iii) semiaverages method; and (iv) least-squares method. Each of these
methods is described below :
(i) Freehand Curve Method : The term freehand is used to any non-mathematical
curve in statistical analysis even if it is drawn with the aid of drafting instruments.
This is the simplest method of studying trend of a time series. The procedure for
drawing free hand curve is an follows :
&
School of Law
(i) The original data are first plotted on a graph paper.

(ii) The direction of the plotted data is carefully observed.
(iii) A smooth line is drawn through the plotted points.
While fitting a trend line by the freehand method, an attempt should be made that
the fitted curve conforms to these conditions.
(i) The curve should be smooth either a straight line or a combination of long
gradual curves.
(ii) The trend line or curve should be drawn through the graph of the data in such a
way that the areas below and above the trend line are equal to each other.
(iii) The vertical deviations of the data above the trend line must equal to the
deviations below the line.
(iv) Sum of the squares of the vertical deviations of the observations from the trend
should be minimum.
Illustration : Draw a time series graph relating to the following data and fit the
trend by freehand method :
Year Production of Steel
(million tonnes)
1994 20
1995 22
1996 30
1997 28
1998 32
1999 25
2000 29
2001 35
2002 40
2003 32
The trend line drawn by the freehand method can be extended to project future
values. However, the freehand curve fitting is too subjective and should not be
used as a basis for prediction. Method of Moving Averages : The moving average
is a simple and flexible process of trend measurement which is quite accurate
under certain conditions. This method establishes a trend by means of a series of
&
School of Law
averages covering overlapping periods of the data.

The process of successively averaging, say, three years data, and establishing each
average as the moving-average value of the central year in the group, should be
carried throughout the entire series. For a five-item, seven-item or other moving
averages, the same procedure is followed : the average obtained each time being
considered as representative of the middle period of the group.
The choice of a 5-year, 7-year, 9-year, or other moving average is determined by
the length of period necessary to eliminate the effects of the business cycle and
erratic fluctuations. A good trend must be free from such movements, and if there
is any definite periodicity to the cycle, it is well to have the moving average to
cover one cycle period. Ordinarily, the necessary periods will range between three
and ten years for general business series but even longer periods are required for
certain industries.
In the preceding discussion, the moving averages of odd number of years were
representatives of the middle years. If the moving average covers an even number
of years, each average will still be representative of the midpoint of the period
covered, but this mid-point will fall halfway between the two middle years. In the
case of a four-year moving average, for instance each average represents a point
halfway between the second and third years . In such a case, a second moving
average may be used to ‘recentre’ the averages.
That is, if the first moving averages gives averages centering half-way between the
years, a further two-point moving average will recentre the data exactly on the
years.
This method, however, is valuable in approximating trends in a period of transition
when the mathematical lines or curves may be inadequate. This method provides a
basis for testing other types of trends, even though the data are not such as to
justify its use otherwise.
Illustration : Calculate 5-yearly moving average trend for the time series given
below.
Year : 1990 1991 1992 1993 1994 1995 1996 1997 1998
1999 2000
Quantity : 239 242 238 252 257 250 273 270 268
&
School of Law
288 284
Year : 2001 2002 2003 2004 2005 2006 2007 2008 2009
2010
Quantity : 282 300 303 298 313 317 309 329 333
327
Solution :
Year Quantity 5-yearly moving total 5-yearly moving average
1990 239
1991 242
1992 238 1228 245.6
1993 252 1239 247.8
1994 257 1270 254.0
1995 250 1302 260.4
1996 273 1318 263.6
1997 270 1349 269.8
1998 268 1383 276.6
1999 288 1392 278.4
1990 284 1422 284.4
2001 282 1457 291.4
2002 300 1467 293.4
2003 303 1496 299.2
2004 298 1531 306.2
2005 313 1540 308.0
2006 317 1566 313.2
2007 309 1601 320.2
2008 329 1615 323.0
2009 333
2010 327
To simplify calculation work: Obtain the total of first five years deta. Find out the
difference between the first and sixth term and add to the total to obtain the total of
second to sixth term. In this way the difference between the term to be omitted and
&
School of Law
the term to be included is added to the preceding total in order to obtain the next
successive total.
Illustration : Fit a trend line by the method of four-yearly moving average to the
following time series data.
Year : 1995 1996 1997 1998 1999 2000 2001 2002
Sugar production (lakh tons) : 5 6 7 7 6 8 9 10
Year : 2003 2004 2005 2006
Sugar production (lakh tons) : 9 10 11 11
Solution :
Remark : Observe carefully the placement of totals, averages between the lines.
Merits
1. This is a very simple method.
2. The element of flexibility is always present in this method as all the calculations
have not to be altered if same data is added. It only provides additional trend
values.
3. If there is a coincidence of the period of moving averages and the period of
cyclical fluctuations, the fluctuations automatically disappear.
4. The pattern of moving average is determined in the trend of data and remains
unaffected by the choice of method to be employed.
5. It can be put to utmost use in case of series having strikingly irregular trend.
Limitations
1. It is not possible to have a trend value for each and every year. As the period of
moving average increases, there is always an increase in the number of years for
which trend values cannot be calculated and known. For example, in a five yearly
moving average, trend value cannot be obtained for the first two years and last two
years, in a seven yearly moving average for the first three years and last three years
and so on. But usually values of the extreme years are of great interest.
2. There is no hard and fast rule for the selection of a period of moving average.
3. Forecasting is one of the leading objectives of trend analysis. But this objective
remains unfulfilled because moving average is not represented by a mathematical
&
School of Law
function.
4. Theoretically it is claimed that cyclical fluctuations are ironed out if period of
moving average coincide with period of cycle, but in practice cycles are not
perfectly periodic.
Trend by the Method of Semi-averages : This method can be used if a straight
line trend is to be obtained. Since the location of only two points is necessary to
obtain a straight line equation, it is obvious that we may select two representative
points and connect them by a straight line. Data are divided into two halves and an
average is obtained for each half. Each such average is shown against the mid-
point of the half period, we obtain two points on a graph paper. By joining these
points, a straight line trend is obtained.
The method is to be commended for its simplicity and used to some extent in
practical work. This method is also flexible, for it is permissible to select
representative periods to determine the two points. Unrepresentative years may be
ignored.
Method of Least Squares : If a straight line is fitted to the data it will serve as a
satisfactory trend, perhaps the most accurate method of fitting is that of least
squares. This method is designed to accomplish two results.
(i) The sum of the vertical deviations from the straight line must equal zero.
(ii) The sum of the squares of all deviations must be less than the sum of the
squares for any other conceivable straight line.
There will be many straight lines which can meet the first condition. Among all
different lines, only one line will satisfy the second condition. It is because of this
second condition that this method is known as the method of least squares. It may
be mentioned that a line fitted to satisfy the second condition, will automatically
satisfy the first condition.
The formula for a straight-line trend can most simply be expressed as
Yc = a + bX
where X represents time variable, Yc is the dependent variable for which trend
values are to be calculated and a and b are the constants of the straight tine to be
found by the method of least squares.
Constant is the Y-intercept. This is the difference between the point of the origin
(O) and the point of the trend line and Y-axis intersect. It shows the value of Y
&
School of Law
when X = 0, constant b indicates the slope which is the change in Y for each unit
change in X.
Let us assume that we are given observations of Y for n number of years. If we
wish to find the values of constants a and b in such a manner that the two
conditions laid down above are satisfied by the fitted equation.
Mathematical reasoning suggests that, to obtain the values of constants a and b
according to the Principle of Least Squares, we have to solve simultaneously the
following two equations.
∑Y = na + b∑Y ...(i)
∑XY = a∑X + b∑X2 ...(ii)
Solution of the two normal equations yield the following values for the constants a
and b :
b=
and a =
Least Squares Long Method : It makes use of the above mentioned two normal
equations without attempting to shift the time variable to convenient mid-year.
This method is illustrated by the following example.
Illustration : Fit a linear trend curve by the least-squares method to the following
data :
Year Production (Kg.)
2001 3
2002 5
2003 6
2004 6
2005 8
2006 10
2007 11
2008 12
2009 13
2010 15
&
School of Law
Solution : The first year 2001 is assumed to be 0, 2002 would become 1, 2003
would be 2 and so on. The various steps are outlined in the following table.
----------------------------------------------------
Year Production
Y X XY X2
1 2 3 4 5
----------------------------------------------------
2001 3 0 0 0
2002 5 1 5 1
2003 6 2 12 4
2004 6 3 18 9
2005 8 4 32 16
2006 10 5 50 25
2007 11 6 66 36
2008 12 7 84 49
2009 13 8 104 64
2010 15 9 135 11
Total 89 45 506 285
-----------------------------------------------------
The above table yields the following values for various terms mentioned below :
n = 10, ∑X = 45, ∑X2 = 285, ∑Y = 89, and ∑XY = 506
Substituting these values in the two normal equations, we obtain
89 = 10a + 45b ...(i)
506 = 45a + 285b ...(ii)
Multiplying equation (i) by 9 and equation (ii) by 2, we obtain
80l = 90a + 405b ...(iii)
1012 = 90a + 570b ...(iv)
Subtracting equation (iii) from equation (iv), we obtain
211 = 165b or b = 211/165 = 1.28
Substituting the value of b in equation (i), we obtain
89 = 10a + 45 × 1.28
&
School of Law
89 = 10a + 57.60
10a = 89 – 57.6
10a = 31.4
a = 31.4/10 = 3.14
Substituting these values of a and b in the linear equation, we obtain the following
trend line
Yc = 3. 14 + 1.28X
Inserting various values of X in this equation, we obtain the trend values as below :
-----------------------------------------------------------------
Year Observed Y bxX Yc (Col. 3 plus Col. 4)
1 2 3 4 5
-----------------------------------------------------------------
2001 3 3.14 1.28 × 0 3.14
2002 5 3.14 1.28 × 1 4.42
2003 6 3.14 1.28 × 2 5.70
2004 6 3.14 1.28 × 3 6.98
2005 8 3.14 1.28 × 4 8.26
2006 10 3.14 1.28 × 5 9.54
2007 11 3.14 1.28 × 6 10.82
2008 12 3.14 1.28 × 7 12.10
2009 13 3.14 1.28 × 8 13.38
2010 15 3.14 1.28 × 9 14.66
-------------------------------------------------------------------
Least Squares Method : We can take any other year as the origin, and for that
year X would be 0. Considerable saving of both time and effort is possible if the
origin is taken in the middle of the whole time span covered by the entire series.
The origin would than be located at the mean of the X values. Sum of the X values
would then equal 0. The two normal equations would then be simplified to
∑Y = Na ...(i)
or a =
&
School of Law
and ∑XY = b∑X2 or b = ...(ii)

Two cases of short cut method are given below. In the first case there are odd
number of years while in the second case the number of observations are even.
Illustration : Fit a straight line trend on the following data :
Year 1996 1997 1998 1999 2000 2001 2002 2003 2004
Y 4 7 7 8 9 11 13 14 17
Solution : Since we have 9 observations, therefore, the origin is taken at 2000 for
which X is assumed to be 0.
------------------------------
Year Y X XY X2
------------------------------
1996 4 – 4 – 16 16
1997 7 – 3 – 21 9
1998 7 – 2 – 14 4
1999 8 – 1 – 8 1
2000 9 0 0 0
2001 11 1 11 1
2002 13 2 26 4
2003 14 3 42 9
2004 17 4 68 16
-----------------------------
Total 90 0 88 60
------------------------------
Thus n = 9, SY = 90, SX = 0, SXY = 88, and SX2 = 60
Substituting these values in the two normal equations, we get
90 = 9a or a = 90/9 or a = 10
88 = 60 or b = 88/60 or b = 1.47
Trend equation is : Yc = 10 + 1.47 X
Inserting the various values of X, we obtain the trend values as below :
&
School of Law
Solution : Here there are two mid-years viz; 2006 and 2007. The mid-point of the
two years is assumed to be 0 and the time of six months is treated to be the unit.
On this basis the calculations are as shown below:
----------------------------------------------
Years Observed Y X XY X2
----------------------------------------------
2003 6.7 – 7 – 46.9 49
2004 5.3 – 5 – 26.5 25
2005 4.3 – 3 – 12.9 9
2006 6.1 – 1 – 6.1 1
2007 5.6 1 5.6 1
2008 7.9 3 23.7 9
2009 5.8 5 29.0 25
2010 6.1 7 42.7 49
&
School of Law
----------------------------------------------
Total 47.8 0 8.6 168
----------------------------------------------
From the above computations, we get the following values.
n = 8, ∑Y = 47.8, ∑X = 0, ∑XY = 8.6, ∑X2 = 168
Substituting these values in the two normal equations, we obtain
47.8 = 8a or a = 47.8/8 or a = 5.98 and 8.6 = 168 b or = 8.6/168 or b = 0.051
The equation for the trend line is : Yc = 5.98 + 0.051X
Trend values generated by this equation are below :
Second Degree Parabola

The simplest example of the non-linear trend is the second degree parabola, the
equation is written in the form :
2
Yc = a + bX + cX
When numerical values for a, b and c have been derived, the trend value for any
year may be
computed substituting in the equation the value of X for that year. The values of a,
b and c can be determined
by solving the following three normal equations simultaneously:
(i) ∑Y = Na + bSX + c∑X2
&
School of Law
(ii) ∑XY = a∑X + b∑X2 + c∑X3

(iii) ∑X2Y = a∑X2 + b∑X3 + c∑X4
Note that the first equation is merely the summation of the given function, the
second is the summation of X multiplied into the given function, and the third is
the summation of X2 multiplied into the given function.
When time origin is taken between two middle years SX would be zero. In that
case the equations are reduced to :
(i) ∑Y = Na + c∑X 2
(ii) ∑XY = b∑X2

(iii) ∑X2Y = a∑X2 + c∑X4
The value of b can now directly be obtained from equation (ii) and value of a and c
by solving equations (i) and (iii) simultaneously. Thus,
a=b=c=
Illustration : The price of a commodity during 2000 – 2005 is given below. Fit a
parabola Y = a + bX + cX2 to this data. Estimate the price of the commodity for the
year 2010 :
Year Price Year Price
2000 100 2003 140
2001 107 2004 181
2002 128 2005 192
Also plot the actual and trend values on graph.
Solution : To determine the value a, b and c, we solve the following normal
equations:
∑Y = Na + b∑X + c∑X2
∑XY = a∑X + b∑X +
2
c∑X3
∑X2Y = a∑X + b∑X + c∑X4
2 3
-----------------------------------------------------------------------------------
Year Y X X2 X3 X4 XY X2Y Yc
-----------------------------------------------------------------------------------
2000 100 – 2 4 – 8 16 – 200 400 97.744
&
School of Law
2001 107 – 1 1 – 1 1 – 107 107 110.426

2002 128 0 0 0 0 0 0 126.680
2003 140 +1 1 +1 1 +140 140 146.506
2004 181 +2 4 +8 16 + 362 724 169.904
2005 192 +3 9 +27 81 +576 1728 196.874
--------------------------------------------------------------------------------------
N = 6 ∑Y = 848 ∑X = 3 ∑X2 = 19 ∑X3 = 27 ∑X4 = 115 ∑XY = 771 ∑X2Y =
3099 ∑Yc = 848.134
--------------------------------------------------------------------------------------
848 = 6a + 3b + 19c ...(i)
771 = 3a +19b +27c ...(ii)
3,099 = 19a + 27b +115c ...(iii)
Solving Eqns. (i) and (ii), get
35b + 35c = 695 ...(iv)
Multiplying Eqn. (ii) by 19 and Eqn. (iii) by 3. Subtracting (iii) from (ii), we get
5352 = 280b + 168 c ...(v)
Solving Eqns. (iv) and (v), we get
c = 1.786
Substituting the value of c in Eqn. (iv), we get
b = 18.04 [35 b +(35 × 1.786) = 695]
Putting the value of b and c in Eqn. (i), we get
a = 126.68 [848 = 6a + (3 × 18.04) + (19 × 1.786))
Thus a = 126.68, b =18.04 and c = 1.786
Substituting the values in the equation
Yc = 126.68 + 18.04X + 1.786X2
When X = – 2, Y = 126.68 + 18.04(–2) + 1.786(– 2)2
= 126.68 – 36.08 + 7.144 = 97.744
When X = –1, Y = 126.68 + 18.04(–1) + 1.786(–1) 2
= 126.68 – 18.04 +1.786 = 110.426

When X
= 0, Y = 126.68
When X = l, Y = 126.68 + 18.04 + 1.786 = 146.506
When X = 2, Y = 126.68 + 18.04(2) + 1.786(2)2
&
School of Law
= 126.68 + 36.08 + 7.144 = 169.904

When X = 3, Y = 126.68 + 18.04(3) + 1.786(3)2
= 126.68 + 54.12 + 16.074 = 196.874
Price for 2010, Y = 126.68 + 18.04(8) + 1.786(8)2
When X = 8 = 126.68 + 144.32 + 114.304 = 385.304
Thus the likely price of the commodity for the year 2010 is Rs.385.304.
The graph of the actual trend values values is given below:
Conversion of Annual Trend Equation to Monthly Trend Equation
Fiting a trend line by least squares to monthly data may be excessively time
consuming. It is more convenient to compute the trend equation from annual data
and then convert this trend equation to a monthly trend equation.
There are two possible situations: (i) the Y units are annual totals, for example, the
total number of passenger cars sold; (ii) the Y units are monthly averages, for
example average monthly wholesale price Index.
Where Data are Annual Totals
A trend equation operative on an annual level is to be reduced to a monthly level.
Constant value, a, is expressed in terms of annual Y values. To express it in terms
of monthly values, we must divide it by 12. Similarly b is to be divided by 12 to
convert the annual change to a monthly change. But this division shows us only the
change for any month of two consecutive years, whereas we want change for two
consecutive months. Therefore b is to be divided by 12 once again. Consequently,
to convert annual trend equation to a monthly trend equation, when the annual data
are expressed as annual totals, we divide a by 12 and b by 144.
Where the Data are given as monthly averages per year
In this case, Y values are on a monthly level. Therefore, a value remains
unchanged in the conversion process. The b value in this case shows us the change
on a monthly level, but from a month in one year to the corresponding month in the
following year. Here, it is necessary only to convert b value to make it measure the
change between consecutive month by dividing it with 12 only.
Merits
(i) This method has no place for subjectivity since it is a mathematical method of
measuring trend,
&
School of Law
(ii) This method gives the line of best fit because from this line the sum of the
positive and negative deviations is zero and the total of the squares of these
deviations is minimum.
Limitations
The best practicable use of mathematical trends is for describing movements in
time series. It does not provide a clue to the causes of such movements. Therefore,
forecasting on this basis may be quite risky.
Forecasting will be valid if there is a functional relationship between the variable
under consideration and time for a particular trend. But if trend describes the past
behaviour, it hardly throws light on the causes which may influence the future
behaviour.
The other limitation is that if some items are added to the original data, a new
equation has to be obtained.
Curvilinear Trend
Sometimes, the time series may not be represented by a straight line trend. Such
trends are known as curvilinear trends. If the curvilinear trend is represented by a
straight line or semi-log paper, or by polynomials of second or higher degree or by
double logarithmic function, then the method of least squares is also applicable to
such cases.
MEASUREMENT OF SEASONAL VARIATIONS
Seasonal variations are those rhythmic changes in the time series data that occur
regularly each year. They have their origin in climatic or institutional factors that
affect either supply or demand or both. It is important that these variations be
measured accurately for three reasons. First, the investigator wants to eliminate
seasonal variations from the data he is studying. Second, a precise knowledge of
the seasonal pattern aid in planning future operations. Lastly, complete knowledge
of seasonal variations is of use to those who are trying to remove the cause of
seasonals or are attempting to mitigate the problem by diversification, off setting
opposing seasonal patterns, or some other means.
Since the number of calender days and working days vary from month to month,
therefore, it is essential to adjust the monthly figures if the same are based on daily
quantities, otherwise, there is no need for such adjustment when we deal with
either volume of inventories or of bank deposits because then the values are not
&
School of Law
influenced by the number of calender days or working days.

Methods of Measuring Seasonal Variations
1. Method of Simple Averages (Weekly, Monthly or Quarterly).
2. Ratio-to-Trend Method.
3. Ratio-to-Moving Average Method.
4. Link Relatives Method.
Methods of Simple Average
This is the simplest method of obtaining a seasonal index. The following steps are
necessary for calculating the index :
(i) Average the unadjusted date by years and months or quarters if quarterly data
are given.
(ii) Find totals of January, February etc.
(iii) Divide each total by the number of years for which data are given. For
example, if we are given monthly data for five years then we shall first obtain total
for each month for five years and divide each total by 5 to obtain an average.
(iv) Obtain an average of monthly averages by dividing the total of monthly
averages by 12.
(v) Taking the average of monthly average as 100, compute the percentage of
various monthly averages as follows:
Seasonal Index for January
=
If instead of the average of each month, the total of each month are obtained, we
will get the same result.
The following example shall illustrate the method.
Illustration : Consumption of monthly electric power in million of kilowat (k.w.)
hours for street lighting in India during 1999-2003 is given below:
&
School of Law
The above calculations are explained below:

(i) Column No. 7 gives the total for each month for five years.
(ii) In column No. 8 each total of column No. 7 has been divided by 5 to obtain an
average for each month.
(iii) The average of monthly averages is obtained by dividing the total of monthly
&
School of Law
averages by 12.
(iv) In column No. 9 each monthly average has been expressed as percentage of the
average of monthly averages. Thus, the percentage for January
=
Percentage for February =
If instead of monthly data, we are given weekly or quarterly data, we shall
compute weekly or quarterly averages by following the same procedure.
Ratio-to-moving average method : The method of monthly totals or monthly
averages does not give any consideration to the trend which may be present in the
data. The ratio-to-moving-average method is one of the simplest of the commonly
used devices for measuring seasonal variation which takes the trend into
consideration: The steps to compute seasonal variation are as follows :
(i) Arrange the unadjusted data by years and months.
(ii) Compute the trend values by the method of moving averages. For this purpose
take 12 month moving average followed by a two-month moving average to
recentre the trend values.
(iii) Express the data for each month as a percentage ratio of the corresponding
moving-average trend value.
(iv) Arrange these ratios by months and years.
(v) Aggregate the ratios for January, February etc.
(vi) Find the average ratio for each month.
(vii) Adjust the average monthly ratios found in step (vi) so that they will
themselves average 100 percent. These adjusted ratios will be the seasonal indices
for various months.
A seasonal index computed by the ratios-to-moving-average method ordinarily
does not fluctuate so much as the index based on straight-line trends. This is
because the 12-month moving average follows the cyclical course of the actual
data quite closely. Therefore the index ratios obtained by this method are often
more representative of the data from which they are obtained than is the case in the
ratio-to-trend method which will be discussed later on.
Illustration : Prepare a monthly seasonal index from the following data, using
moving averages method :
Monthly Sales of XYZ Products Co,. Ltd. (Rs.)
&
School of Law
Year
2000 2001 2002
January 3,639 3,913 4,393
February 3,591 3,856 4,530
March 3,326 3,714 4,287
April 3,469 3,820 4.405
May 3,321 3,647 4,024
June 3,320 3,498 3,992
July 3,205 3,476 3,795
August 3,205 3,354 3,492
September 3,255 3,594 3,571
October 3,550 3,830 3,923
November 3,771 4,183 3,984
December 3,772 4,482 3,880
&
School of Law
&
School of Law
Average of Monthly Averages 100.55

Putting average of monthly averages as 100, monthly averages have been admitted
to obtain seasonal index for each month.
&
School of Law
For example, Seasonal Index for January =

For February =
Merits
This method is more widely used in practice than other methods. The index
calculated by the ratioto- moving average method does not fluctuate very much.
The 12-month moving average follows the cyclical course of the actual data
closely. So index ratios are the true representative of the data from which they have
been obtained.
Limitations
All seasonal index numbers cannot be calculated for each month for which data is
available. When a four month average is taken, 2 months, in the beginning and 2
months in the end are left out for which we cannot calculate seasonal index
numbers.
The ratio-to-trend method : The ratio-to-trend method is similar to ratio-to-
moving-average method.
The only difference is the way of obtaining the trend values. Whereas in the ratio-
to-moving-average method, the trend values are obtained by the method of moving
averages, in the ratio-to-trend method, the corresponding trend is obtained by the
method of least sequares.
The steps in the calculation of seasonal variation are as follows :
(i) Arrange the unadjusted data by years and months.
(ii) Compute the trend values for each month with the help of least squares
equation.
(iii) Express the data for each month as a percentage ratio of the corresponding
trend value.
(iv) Aggregate the January’s ratios, February’s ratios, etc., computed previously
(v) Find the average ratio for each month.
(vi) Adjust the average ratios found in step (v) so that they will themselves average
100 per cent.
The last step gives us the seasonal index for each month.
Sometimes the median is used in place of the arithmetic average of the ratios-to-
trend. The choice depends upon circumstances but there is a preference for the
median if several erratic ratios are found. In fact, if a fairly large number of years,
&
School of Law
say, 20 or 15, are used in the computation, it is not uncommon to omit extremely
erratic ratios from the computation of average of monthly ratios. Only the
arithmetic average should be used for small number of years.
This method has the advantage of simplicity and case of interpretation. Although it
makes allowance for the trend, it may be influenced by errors in the calculation of
the trend. The method may also be influenced by cyclical and erratic influences.
This source of possible error is eliminated by the selection of a period of time in
which depression is offset by prosperity.
Illustration : Find seasonal variations by the ratio-to-trend method from the
following data :
Year 1st Quarter 2nd Quarter 3rd Quarter 4th Quarter
2000 30 40 36 34
2001 34 52 40 44
2002 40 58 54 48
2003 54 76 68 62
2004 80 92 86 82
Solution : For finding out seasonal variations by ratio-to-trend method, first the
trend for yearly data will be obtained and convert them into quarterly data.
Average 92.78 118.28 102.92 89.12
The average of quarterly average of trend figures :
Quarterly seasonal Index for 1st Quarter :
Quarterly seasonal Index for 2rd Quarter :
Quarterly seasonal Index for 3rd Quarter :
Quarterly seasonal Index for 4th Quarter :
The total of seasonal indices should be equal to 400 and that for monthly indices
should be 1200.
Merits
(i) This method is based on a logical procedure for measuring seasonal variations.
This procedure has an advantage over the moving average method for it has a ratio
to trend value for each month for which data is available. So this method avoids
loss of data which is inherent in the case of moving averages. If the period of time
series is very short then the advantage becomes more prominent.
&
School of Law
(ii) It is a simple method.

(iii) It is easy to understand.
Limitations :
If the cyclical changes are very wide in the time series, the trend can never follow
the actual data, as closely as a 12-month moving average will follow, under the
ratio-to-trend method. There will be more bias in a seasonal index computed by
ratio to trend method.
4. Link Relatives Method
Among all the methods of measuring seasonal variation, link relatives method is
the most difficult one. When this method is adopted the following steps are taken
to calculate the seasonal variation indices :
(i) Calculate the link relatives of the seasonal figures. Link relatives are calculated
by dividing the figure of each season* by the figure of immediately preceding
season and multiplying it by 100.
These percentages are called link relatives since they link each month (or quarter
or other time period) to the preceding one.
(ii) Calculating the average of the link relatives for each season. While calculating
average we might take arithmetic average but median is probably better. The
arithmetic average would give undue weight to extreme cases which were not
primarily due to seasonal influences.
(iii) Convert these averages into chain relatives on the base of the first season.
(iv) Calculate the chain relatives of the first season on the basis of the last season.
There will be some difference between the chain relative of the first season and the
chain relative calculated by the previous method. This difference will be due to
long-term changes. It is therefore necessary to correct these chain relatives.
(v) For correction, the chain relative of the first season calculated by first method is
deducted from the chain relative (of the first season) calculated by the second
method. The difference is divided by the number of seasons. The resulting figure
multiplied by 1,2,3 (and so on) is deducted respectively from the chain relatives of
the 2nd, 3rd, 4th (and so on) seasons. These are correct chain relatives.
(vi) Express the corrected chain relatives as percentage of their averages. These
provide the required seasonal indices by the method of link relatives. The
following example will illustrate the process.
&
School of Law
The correction factor is calculated as follows :

Chain relative of the first quarter (on the basis of first quarter) = 100
Chain relative of the first quarter (on the basis of first quarter) =
Difference between these chain relatives = 106.7 – 100 = 6.7
Difference per quarter =
Adjusted chain relatives are obtained by subtracting 1 × 1.675, 2 × 1.675, 3 ×
1.675 from the chain relatives of 2nd , 3rd and 4th quarters, respectively.
Seasonal variation indices are calculated as below :
Seasonal variation index =
Meaning of “Normal” in Business Statistics
Business is often said to be “above normal” or “below normal”. When so used the
term “normal” is generally recognized to mean a level of activity which is
characterized by the presence of basic trend and seasonal variation. This implies
that the influence of business cycles and erratic fluctuations on the level of activity
is assumed to be insignificant. Therefore, the product of trend value for any period
when adjusted by the seasonal index for that period gives us an estimate of the
normal activity during that period.
Measuring Cycle as the residual
Business cyclical variations are measured either as the difference between the
observed value and the “normal”. Whatever remains after elimination of secular
trend and seasonal variations from the time series, is said to be composed of
cyclical variations and Irregular movements.
Second degree Parabola
The simplest form of the non-linear trend is the second degree parabola. It is used
to find long term trend. We use the following equation for finding second degree
trend –
Yc = a + bX + cX2
To know the value of a, b and c we use the following three normal equations –
∑Y = Na + b∑X + c∑X2
∑XY = a∑X + b∑X2 + c∑X3
∑X2Y = a∑X2 + b∑X3 + c∑X4
&
School of Law
A second degree trend equation is apporpriate for the secular trend component of a
time series when the data do not fall in a straight line.
Illustration: Fit a parabola (Yc = a + bX + cX2) from the following
Years 1 2 3 4 5 6 7
Values 35 38 40 42 36 39 45
&
School of Law
– 84c = – 4
c = 4/84 = 0.05
By substituting the value of c in equation (i) we get the value of a
&
School of Law
7a + 28 × 4/48 = 275
7a = 275 – 1.33
a = 273.67/7 = 39.09
We may get the value of b with the help of equation (ii)
28b = 28
b = 1
The required equation would be:
Yc = 39.09 + 1X + 0.05 X2
= 39.09 + X + 0.05 X2
With the help of above equation we can estimate the value for year 8 where x = 4
Yc = 39.09 + 4 + 0.05 (4)2
= 39.09 + 4 + 0.8 = 43.89
Exponential Trend
The equation for exponential trend is of the form: y = abx
Taking log of both sides we get log y = log a + x log b
To get the value of a and b we have normal equation
∑logy = Nlog a + logb ∑X
∑(x. log y) = log a∑x + log b∑X2
When we slove these equations we get –
log a = and log b =
Illustration : The production of certain raw material by a company in lakh tons for
the years 1996 to 2002 are given below:
Year : 1996 1997 1998 1999 2000 2001 2002
Production : 32 47 65 92 132 190 275
Estimate Production figure for the year 2003 using an equation of the form y = ab1
where x = years and y = production
Solution :
&
School of Law
log y = 1.9704 + .154 x

for 2003, x would be 4 and log y will be
log y = 1.9704 + .154(4) = 2.5864
y = AL 2.5864 = 385.9
Thus estimated production for 2003 would be 385.9 lakh tons.
Method of Least Squares

In Correlation we study the linear correlation between two random variables x and
y. We now look at the line in the xy plane that best fits the data (x1, y1), …, (xn, yn).
Recall that the equation for a straight line is y = bx + a, where
b= the slope of the line
a = y-intercept, i.e. the value of y where the line intersects with the y-axis
For our purposes we write the equation of the best fit line as
and so the y-intercept is
For each i, we define ŷi as the y-value of xi on this line, and so

&
School of Law
The best fit line is the line for which the sum of the distances between each of
the n data points and the line is as small as possible. A mathematically useful
approach is therefore to find the line with the property that the sum of the
following squares is minimum.
Theorem 1: The best fit line for the points (x1, y1), …, (xn, yn) is given by
where
Click here for the proof of Theorem 1. Two proofs are given, one of which does
not use calculus.
Definition 1: The best fit line is called the regression line.
Observation: The theorem shows that the regression line passes through the point
(x̄, ȳ) and has equation
where the slope is
and the y-intercept is
Note too that b = cov(x,y)/var(x). Since the terms involving n cancel out, this can
be viewed as either the population covariance and variance or the sample
covariance and variance. Thus a and b can be calculated in Excel as follows where
R1 = the array of y values and R2 = the array of x values:
b = SLOPE(R1, R2) = COVAR(R1, R2) / VARP(R2)
a = INTERCEPT(R1, R2) = AVERAGE(R1) – b * AVERAGE(R2)
Property 1:
Proof: By Definition 2 of Correlation,
and so by the above observation we have

&
School of Law
Excel Functions: Excel provides the following functions for forecasting the value
of y for any x based on the regression line. Here R1 = the array of y data values
and R2 = the array of x data values:
SLOPE(R1, R2) = slope of the regression line as described above
INTERCEPT(R1, R2) = y-intercept of the regression line as described above
FORECAST(x, R1, R2) calculates the predicted value y for the given value of x.
Thus FORECAST(x, R1, R2) = a + b * x where a = INTERCEPT(R1, R2) and b =
SLOPE(R1, R2).
TREND(R1, R2) = array function which produces an array of predicted y values
corresponding to x values stored in array R2, based on the regression line
calculated from x values stored in array R2 and y values stored in array R1.
TREND(R1, R2, R3) = array function which predicts the y values corresponding
to the x values in R3 based on the regression line based on the x values stored in
array R2 and y values stored in array R1.
To use TREND(R1, R2), highlight the range where you want to store the predicted
values of y. Then enter TREND and a left parenthesis. Next highlight the array of
observed values for y (array R1), enter a comma and highlight the array of
observed values for x (array R2) followed by a right parenthesis. Finally
press Crtl-Shft-Enter.
To use TREND(R1, R2, R3), highlight the range where you want to store the
predicted values of y. Then enter TREND and a left parenthesis. Next highlight the
array of observed values for y (array R1), enter a comma and highlight the array of
observed values for x(array R2) followed by another comma and highlight the
array R3 containing the values for x for which you want to predict y values based
on the regression line. Now enter a right parenthesis and press Crtl-Shft-Enter.
Excel 2016 Function: Excel 2016 introduces a new
function FORECAST.LINEAR, which is equivalent to FORECAST.
Example 1: Calculate the regression line for the data in Example 1 of One Sample
Hypothesis Testing for Correlation and plot the results.
&
School of Law
Figure 1 – Fitting a regression line to the data in Example 1

Using Theorem 1 and the observation following it, we can calculate the slope b and
y-intercept a of the regression line that best fits the data as in Figure 1 above.
Using Excel’s charting capabilities we can plot the scatter diagram for the data in
columns A and B above and then select Layout > Analysis|Trendline and choose
a Linear Trendline from the list of options. This will display the regression line
given by the equation y = bx + a(see Figure 1).

Bbacam 106 Notes PDF

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Bbacam 106 Notes PDF

Uploaded by

Copyright:

Available Formats

Chanderprabhu Jain College of Higher Studies

Paper Code : 106

Subject : Business statistics and research methodology

such phenomena as are capable of being quantitatively measured and numerically

2. Bivariate Frequency Distribution.

are important. However, a histogram can be misleading if the distribution

Example : Draw a histogram from the following data :

(ii) more than method

of a whole, then we construct component or composite bar diagrams. Whenever

What is Central Tendency

Serial Salary (Rupees) Deviations from assumed mean

Solution : Calculation of Number of Accidents per Day

= , where AL refers to antilog.

Example : Calculate mode from the following data:

(a) Absolute Measures of Dispersion

2. The Quartile Deviation

Relative Measures of Dispersion

Coefficient of Range or Coefficient of Dispersion

Coefficient of Quartile Deviation or Quartile Coefficient of Dispersion

Coefficient of Mean Deviation or Mean Deviation of Dispersion

Coefficient of Standard Deviation or Standard Coefficient of Dispersion

Coefficient of Variation (a special case of Standard Coefficient of Dispersion)

In statistics, range is defined simply as the difference between the maximum

simply computed as 480-320 = 160 grams.

Some Limitations of Range

This limitation of range is to be expected primarily because range is computed

Referred to as average deviation, it is defined as the sum of the

Discrete Data Series

Continuous Data Series

Items 0-5 5-10 10-20 20-30 30-40

The Standard Deviation is a measure of how spread out numbers are.

But here we explain the formulas.

The symbol for Standard Deviation is σ (the Greek letter sigma).

This is the formula for Standard Deviation:

Say what? Please explain!

OK. Let us explain it step by step.

Say we have a bunch of numbers like 9, 2, 5, 4, 12, 7, 8, 11.

To calculate the standard deviation of those numbers:

• 1. Work out the Mean (the simple average of the numbers)

The Formula Explained

First, let us have some example values to work on:

Example: Sam has 20 Rose Bushes.

The number of flowers on each bush is

9, 2, 5, 4, 12, 7, 8, 11, 9, 3, 7, 4, 12, 5, 4, 10, 9, 6, 9, 4

Work out the Standard Deviation.

Step 1. Work out the mean

Example: 9, 2, 5, 4, 12, 7, 8, 11, 9, 3, 7, 4, 12, 5, 4, 10, 9, 6, 9, 4

The mean is:

This is the part of the formula that says:

So what is xi ? They are the individual x values 9, 2, 5, 4, 12, 7, etc...

In other words x1 = 9, x2 = 2, x3 = 5, etc.

(12 - 7)2 = (5)2 = 25

... etc ...

And we get these results:

4, 25, 4, 9, 25, 0, 1, 16, 4, 16, 0, 9, 25, 4, 9, 9, 4, 1, 4, 9

Step 3. Then work out the mean of those squared differences.

First add up all the values from the previous step.

The handy Sigma Notation says to sum up as many terms as we want:

Which means: Sum all values from (x1-7)2 to (xN-7)2

Mean of squared differences = (1/20) × 178 = 8.9

(Note: this value is called the "Variance")

Step 4. Take the square root of that: