INTRODUCTION TO STATISTICS:
The word statistics is derived from the Latin word ‘Status’, the Italian word ‘Statista’ or
the German word ‘Statistik’, each of which means a political state. The term statistics was
originally applied to the facts and figures that the State needed in its day-to-day administration.
Statistics was regarded as a by-product of the administrative activities of the State and as a
tool for solving or analyzing the problems of the State.
The word “statistics” is used in two different senses: plural and singular. In the plural
sense, statistics means a set of numerical data; in the singular sense, it means the science of
statistical methods, embodying the theory and techniques used for collecting, analyzing and
drawing inferences from numerical data.
STATISTICS DEFINITION:
➢ “Statistics is the science which deals with the collection, classification, presentation,
analysis and interpretation of numerical data.”
➢ “Statistics may be called the science of counting”. - A. L. Bowley
➢ “Statistics may rightly be called the science of averages”. - A. L. Bowley
➢ “Statistics is the science of estimates and probabilities”. - Boddington
CHARACTERISTICS OF STATISTICS:
✓ Statistics are aggregates of facts.
✓ Statistics are affected to a marked extent by a multiplicity of causes.
✓ Statistics are numerically expressed.
✓ Statistics should be enumerated or estimated.
✓ Statistics should be collected with reasonable standard of accuracy.
✓ Statistics should be collected in a systematic manner for a pre-determined purpose.
✓ Statistics should be placed in relation to each other.
A single accident is not statistics, but the total number of accidents in a city during a
month is statistics.
currency, imports, exports, competition in the market and consumer taste; it is not possible to
single out one cause.
For instance, if income data were collected only from rich people, ignoring the poor, the
national income figures would be inflated. The purpose of data collection must therefore be
decided in advance, and the investigator must be aware of that purpose.
If the object is not known to the investigator, he may collect unnecessary data that are of
no use while ignoring the necessary data. Thus, without a pre-determined purpose, the collected
data may not yield the desired results.
Statistical data are mostly collected for the purpose of comparison. To make a valid
comparison, the data should be homogeneous, i.e., they should relate to the same phenomenon or
subject. For instance, the weights of the boys in one class can be compared with the
corresponding weights of the boys in another class, but it would be meaningless to compare the
heights of students with the heights of trees.
FUNCTIONS OF STATISTICS:
It presents the data in definite form.
It studies relationships between variables.
It simplifies the complex data.
It provides a technique of comparison.
It plays a significant role in forecasting and planning.
It helps in formulating policies in business, industry (or) government organizations, etc.
LIMITATIONS OF STATISTICS:
1. Statistics does not deal with individual items:
Statistics deals with groups or aggregates only. The study of individual facts lies
outside the scope of statistics.
2. Statistics deals with Quantitative data only :
Statistics is a numerical statement of facts, so it deals only with quantitative data.
For example, per capita income, population growth, etc., can be studied by statistics, but
qualitative aspects such as honesty, intelligence and poverty cannot be studied directly.
Statistics simplifies complicated data. Before using the data, the background of the
data should be studied.
It is the most important limitation of statistics. Statisticians must know the uses and
limitations of statistics; only then can they use it to get fruitful results and avoid
dangerous, wrong and misleading conclusions.
USES OF STATISTICS:
✓ It has a wide range of applications in biology.
✓ It has particular application to agriculture and medicine.
✓ It is used in the design and analysis of clinical trials in medicine.
✓ It is used in public health, health services research, nutrition and environmental health.
✓ It is used in genomics and population genetics.
DATA:
✓ The raw material of statistics is data.
✓ It should be numerically expressed
DISCRETE VARIABLES:
Countable.
They are obtained by enumeration, i.e., counting, and are also called discontinuous variables.
Example: number of children per family. We can count the number of children in a family as
0, 1, 2, 3 and so on, but a family cannot have 1.3 or 2.6 children.
CONTINUOUS VARIABLES:
Measurable (not countable).
They can take an infinite number of values between any two fixed points.
Example: the length of a fish measured in cm to the nearest mm can be recorded as 1.5, 1.6,
1.7 cm and so on.
COLLECTION OF DATA:
It is the first step in a statistical enquiry.
It has to be planned and executed properly.
All the aspects of a survey, from planning to the writing of the final report, are broadly
classified into two categories:
1. Planning a survey
2. Executing a survey
PLANNING A SURVEY:
Purpose of a survey
Scope of a survey
Nature of information required
Sources of data
Accuracy aimed
EXECUTING A SURVEY:
Setting up an administrative organization
Designing of forms
Selecting, training and supervising the field investigators
Reducing non response
Presenting the information
Analyzing the information
Preparing the reports
SOURCES OF COLLECTING DATA:
Primary:
Primary data are those statistical data which are collected for the first time and are
original in nature.
Primary data are collected from individuals directly and have never been used for any
purpose earlier.
Merits:
1. Data are originally collected
2. True and reliable data
3. Higher degree of accuracy
4. Uniformity and homogeneity can be maintained
Demerits:
1. It is unsuitable when the area is large
2. It is expensive and time consuming
INDIRECT ORAL INVESTIGATIONS:
Under this method, the investigator contacts witnesses (or) neighbours (or) friends who are
capable of supplying the necessary information.
This method is preferred when the required information is about addiction or the cause of a
fire, theft or murder.
For example, an alcohol addict may not willingly give information on how the habit started,
the quantity of his daily consumption, or how he feels with and without alcohol. He may confide
in his friend or his doctor, but not in a social worker who is collecting the information.
Merits:
1. It is simple and conventional
2. It saves time and money
3. It can be used in the investigation of a large area
Demerits:
1. The information cannot be fully relied upon
2. An interview with an unsuitable person will spoil the result
3. The careless attitude of the informant will affect the degree of accuracy
Under this method, local agents or correspondents will be appointed. They collect the
information and transmit it to the office or person. This system is adopted by newspapers,
periodicals, agencies, etc., when information is needed in different fields.
Merits:
1. It is relatively cheap
2. Requires less time
Demerits:
1. Local agents and correspondents are not likely to be serious and careful.
Merits:
1. It is relatively cheap
2. It is widely used when the area of investigation is large
3. It saves money and time
Demerits:
1. In this method there is no direct contact between the investigator and the respondent;
therefore, we cannot be sure about the accuracy and reliability of the data
2. This method is suitable only for literate people
3. People may not give correct answers
SECONDARY DATA:
➢ Secondary data are statistical information that has already been collected by someone for
his own purpose and is available for use by others. (Or)
➢ If data have already been collected by some person (or) institution and are made available
for a statistical investigation, they are known as secondary data.
SOURCES OF SECONDARY DATA
Published Sources
Unpublished sources
1. Published Sources:
Various governmental, international and local agencies publish statistical data, and chief
among them are:
2. Unpublished Sources:
There are various sources of unpublished data. They include the records maintained by
various government and private offices, and the research carried out by individual research
scholars in universities.
LIMITATIONS OF STATISTICS:
Statistics does not reveal the entire story.
Statistical data should be uniform and homogeneous.
Statistics is liable to be misused.
DEFINITION:
• The process of arranging data into groups according to some common characteristics.
• The process of arranging (or) grouping a large number of individual facts (or) observations
on the basis of similarity among the items is called classification.
Types of classification: Geographical, Chronological, Qualitative and Quantitative.
Geographical (or) spatial classification:
The classification is based on place (or) region, such as states, towns, cities and villages.
Region         Number of cancer-affected persons
Tamil Nadu     …………..
Kerala         …………..
Maharashtra    …………..
Delhi          …………..
Kolkata        …………..
Qualitative:
Simple classification or one way classification
One-way classification means classifying the data on the basis of only one consideration,
i.e., only one attribute.
If the data are classified into only two classes, such as literate and illiterate, honest and
dishonest, or skilled and unskilled, the classification is termed simple classification.
For example, a population may be divided into two classes such as literate and illiterate.
MANIFOLD CLASSIFICATION:
This is based on more than one attribute. In manifold classification, the universe is
classified on the basis of more than one attribute at a time. For example, college students
can be classified on the basis of three attributes: sex, subject of study and religion.
(Diagram: Population divided into Male and Female, each further divided into sub-classes
marked M and UM)
QUANTITATIVE CLASSIFICATION:
FREQUENCY DISTRIBUTION:
It is simply a table in which the data are grouped into classes and the number of cases falling
in each class is recorded. A frequency distribution can be of two kinds: discrete and continuous.
INDIVIDUAL DATA:
For some statistical calculations, the individual observations have to be arranged in either
ascending or descending order. This is called an array.
Marks: 40 33 27 38 41 48 44 51 39 35
Arraying:
Observed values   Ascending order   Descending order
40                27                51
33                33                48
27                35                44
38                38                41
41                39                40
48                40                39
44                41                38
51                44                35
39                48                33
35                51                27
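As a minimal Python sketch (the variable names are illustrative, not from the text), arraying is simply sorting the observations with the built-in `sorted`:

```python
# Marks of the 10 students from the example above
marks = [40, 33, 27, 38, 41, 48, 44, 51, 39, 35]

# An array is the set of observations arranged by order of magnitude
ascending = sorted(marks)
descending = sorted(marks, reverse=True)

print("Ascending: ", ascending)
print("Descending:", descending)
```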
DISCRETE FREQUENCY DISTRIBUTION:
Here each class is distinct and separate from the other classes. We count the number of times
each value of the variable is repeated in the data; this count is called the frequency of that
class.
9 7 5 3 4 8 6 0 6 5 9 1 7 2 3 8 6 8 7 4 9 4 5 10 5 9 6 9 5 6
RESULT:
Marks            0 1 2 3 4 5 6 7 8 9 10
No. of students  1 1 1 2 3 5 5 3 3 5 1
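A minimal Python sketch of the same tally, assuming the 30 observations above are the complete data set; `collections.Counter` counts how many times each value occurs:

```python
from collections import Counter

# The 30 raw observations (marks) listed above
data = [9, 7, 5, 3, 4, 8, 6, 0, 6, 5, 9, 1, 7, 2, 3,
        8, 6, 8, 7, 4, 9, 4, 5, 10, 5, 9, 6, 9, 5, 6]

# Counter tallies how many times each distinct value occurs
freq = Counter(data)

# Print every mark from the smallest to the largest with its frequency
for mark in range(min(data), max(data) + 1):
    print(mark, freq[mark])
```

Values that never occur simply get a count of 0 from the `Counter`, so the loop prints a complete row for every mark.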
A continuous series is one in which the measurements are only approximations and are expressed
in class intervals. A collection of items that cannot be measured exactly but are placed within
certain limits is called a continuous series.
Class limits:
The class limits are the lowest and the highest values in a class. For example, take the
class 10-20: the lowest value is 10 and the highest value is 20. These two values are known as
the lower limit and the upper limit of the class. For exclusive classes such as 10-20, the class
limits are also the class boundaries.
Class intervals:
The difference between the upper limit and the lower limit of a class is known as the
class interval; for example, in the class 10-20 the class interval is 10 (i.e., 20 - 10).
78 25 25 50 30 29 55 52 43 43
44 20 48 44 43 58 36 46 48 47
56 60 31 47 53 65 68 73 59 12
34 74 79 20 16 70 65 39 60 45
60 20 47 49 51 38 49 35 52 61
Continuous Table:
Marks     f
10 – 14   1
15 – 19   1
20 – 24   3
25 – 29   3
30 – 34   3
35 – 39   4
40 – 44   5
45 – 49   9
50 – 54   5
55 – 59   4
60 – 64   4
65 – 69   3
70 – 74   3
75 – 79   2
Total     50
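The grouping into the continuous table can be sketched as below. The helper `frequency_table` is hypothetical (not from the text) and assumes integer marks with inclusive classes of width 5 starting at 10, so every mark falls in exactly one class:

```python
# The 50 raw marks listed above
marks = [78, 25, 25, 50, 30, 29, 55, 52, 43, 43,
         44, 20, 48, 44, 43, 58, 36, 46, 48, 47,
         56, 60, 31, 47, 53, 65, 68, 73, 59, 12,
         34, 74, 79, 20, 16, 70, 65, 39, 60, 45,
         60, 20, 47, 49, 51, 38, 49, 35, 52, 61]

def frequency_table(values, start, width, n_classes):
    """Group integer values into inclusive classes start..start+width-1, and so on."""
    counts = [0] * n_classes
    for v in values:
        counts[(v - start) // width] += 1   # index of the class containing v
    return [(start + k * width, start + (k + 1) * width - 1, counts[k])
            for k in range(n_classes)]

# Classes 10-14, 15-19, ..., 75-79 as in the table above
table = frequency_table(marks, start=10, width=5, n_classes=14)
for lower, upper, f in table:
    print(f"{lower}-{upper}: {f}")
```

Integer division `(v - start) // width` finds the class index directly, which avoids comparing each value against every class.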
TABULATION:
Tabulation is the process of arranging data systematically in rows and columns of a table.
It is designed to simplify presentation and to facilitate comparison and analysis.
OBJECTS:
Large and complex data can be presented in a neat and compact form.
Nature of the data can be easily understood.
Much of the time that would otherwise be needed to look through the data is saved.
The data are so placed in a table that comparison becomes easier.
A table facilitates further analysis of data.
A table is the convenient form for diagrammatic representation of data.
Voluminous data can be presented in a small space.
It remains a permanent record and enables ready reference.
Sometimes omissions and errors can be detected.
PARTS OF A TABLE:
Table number
Title
Prefatory note or head note
Stubs
Captions
Body of the table
Foot-notes
Source notes.
TABLE NUMBER:
A table should always be numbered for easy identification and future reference. The table
number may be placed at the top of the table, either centred above the title or on the left
side of the table.
TITLE:
Each table should be given a suitable title that describes the contents of the table.
PREFATORY NOTE OR HEAD NOTE:
It is a statement given below the title and enclosed in brackets. For example, it may state the
unit of measurement, such as crores of rupees.
STUBS:
These are the row headings. These constitute the first column and explain what the rows
are about.
CAPTIONS:
These are the column headings. They tell what the columns are about. There can be
subheadings.
(Diagram: layout of a table, with the stub on the left and the body entries beside it)
DIFFERENCE BETWEEN CLASSIFICATION AND TABULATION:
1. Classification is the process of dividing the data into homogeneous subgroups, whereas
tabulation is the process of arranging the classified data systematically in rows and columns
of a table.
2. Classification condenses the mass of data and makes its nature easier to grasp, whereas
tabulation gives the data a readily referable and almost permanent form.
A measure of central tendency gives a single representative value for a set of unequal
values. The measures of central tendency are known as ‘Measures of location’. They are
popularly called averages. Various measures of central tendency are the following.
Arithmetic mean
Median
Mode
Geometric mean
Harmonic mean
ARITHMETIC MEAN:
Merits of Arithmetic mean:
It is easy to understand.
It is easy to calculate.
It is used in further calculation.
It is rigidly defined.
It is based on the value of every item in the series.
It provides a good basis for comparison.
It can be used for further analysis and algebraic treatment.
The mean is a more stable measure of central tendency.
The arithmetic mean always has a definite value.
Family A B C D E F G H I J
Expenditure 30 70 10 75 500 8 42 250 40 36
Calculate the Arithmetic mean.
Solution:
$\bar{X} = \frac{\sum X}{N} = \frac{1061}{10} = 106.1$
DISCRETE SERIES:
EXAMPLE 1:
No. of persons: 2 3 4 5 6
No. of houses: 10 25 30 25 10
Solution:
x   f    fx
2   10   20
3   25   75
4   30   120
5   25   125
6   10   60
    Σf = 100   Σfx = 400
$\bar{X} = \frac{\sum fx}{\sum f} = \frac{400}{100} = 4$
CONTINUOUS SERIES:
Marks: 20 - 30 30 - 40 40 - 50 50 - 60 60 - 70 70 - 80
No. of students: 5 8 12 15 6 4
Solution:
Marks     f    m    fm
20 – 30   5    25   125
30 – 40   8    35   280
40 – 50   12   45   540
50 – 60   15   55   825
60 – 70   6    65   390
70 – 80   4    75   300
Total     Σf = 50   Σfm = 2460
$\bar{X} = \frac{\sum fm}{\sum f} = \frac{2460}{50} = 49.2$
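The three forms of the arithmetic mean can be sketched as small Python functions. The function names are hypothetical; for a continuous series the sketch replaces each class by its mid value m, exactly as in the solution above:

```python
def mean_individual(xs):
    """Individual series: X-bar = sum(X) / N."""
    return sum(xs) / len(xs)

def mean_discrete(xs, fs):
    """Discrete series: X-bar = sum(f * x) / sum(f)."""
    return sum(f * x for x, f in zip(xs, fs)) / sum(fs)

def mean_continuous(classes, fs):
    """Continuous series: use each class mid value m in place of x."""
    mids = [(lower + upper) / 2 for lower, upper in classes]
    return mean_discrete(mids, fs)

print(mean_individual([30, 70, 10, 75, 500, 8, 42, 250, 40, 36]))
print(mean_discrete([2, 3, 4, 5, 6], [10, 25, 30, 25, 10]))
print(mean_continuous([(20, 30), (30, 40), (40, 50), (50, 60), (60, 70), (70, 80)],
                      [5, 8, 12, 15, 6, 4]))
```

For inclusive classes such as 0 – 19, 20 – 39 and so on, the mid values are the same whether the stated limits or the true boundaries are used, so `mean_continuous` also reproduces the Rs. 54.17 lakhs of Example 2.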
Example 2: The annual profits of 90 companies are given below. Find the arithmetic mean
(inclusive method).
Annual profit (Rs. lakhs)   0 – 19   20 – 39   40 – 59   60 – 79   80 – 99
No. of companies            5        17        32        24        12
Solution:
Annual profit   True class      No. of          Mid value   fm
(Rs. lakhs)     interval        companies (f)   (m)
0 – 19          -0.5 – 19.5     5               9.5         47.5
20 – 39         19.5 – 39.5     17              29.5        501.5
40 – 59         39.5 – 59.5     32              49.5        1584.0
60 – 79         59.5 – 79.5     24              69.5        1668.0
80 – 99         79.5 – 99.5     12              89.5        1074.0
Total                           Σf = 90                     Σfm = 4875.0
$\text{Arithmetic mean } \bar{X} = \frac{\sum fm}{\sum f} = \frac{4875}{90} = 54.17$
$\bar{X}$ = Rs. 54.17 lakhs
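Example 3: Calculate the arithmetic mean of the following data.
Value          Frequency
Less than 10   4
Less than 20   10
Less than 30   15
Less than 40   25
Less than 50   30
Less than 60   35
Less than 70   45
Less than 80   65
Solution: In this problem, cumulative frequencies and classes are given. We first convert the
cumulative frequencies into simple frequencies (each class frequency is its cumulative frequency
minus the preceding one) and then calculate the mean:
Class     f    m    fm
0 – 10    4    5    20
10 – 20   6    15   90
20 – 30   5    25   125
30 – 40   10   35   350
40 – 50   5    45   225
50 – 60   5    55   275
60 – 70   10   65   650
70 – 80   20   75   1500
Total     Σf = 65   Σfm = 3235
$\bar{x} = \frac{\sum fm}{\sum f} = \frac{3235}{65} = 49.77$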
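Converting a "less than" table to simple frequencies is a matter of differencing successive cumulative frequencies. A minimal sketch, assuming every class is 10 units wide as in the table above (variable names are illustrative):

```python
# "Less than" table from Example 3: upper class limits and cumulative frequencies
upper_limits = [10, 20, 30, 40, 50, 60, 70, 80]
less_than_cf = [4, 10, 15, 25, 30, 35, 45, 65]

# Simple frequency of a class = its cumulative frequency minus the previous one
freqs = [cf - prev for cf, prev in zip(less_than_cf, [0] + less_than_cf[:-1])]

# Every class is 10 wide, so each mid value is the upper limit minus 5
mids = [u - 5 for u in upper_limits]

mean = sum(f * m for f, m in zip(freqs, mids)) / sum(freqs)
print(freqs)
print(round(mean, 2))
```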
Example 4: From the following information pertaining to 150 workers, calculate the average wage
paid to workers.
Solution: There are no workers who received wages less than Rs. 75, so the lower limit of the
first class is 75. The class intervals are 75 – 85, 85 – 95 and so on. The number of workers in
each class is the difference between successive cumulative frequencies:
Wages (X)   m     No. of workers (f)   fm
75 – 85     80    150 – 140 = 10       800
85 – 95     90    140 – 115 = 25       2250
95 – 105    100   115 – 95 = 20        2000
105 – 115   110   95 – 70 = 25         2750
115 – 125   120   70 – 60 = 10         1200
125 – 135   130   60 – 40 = 20         2600
135 – 145   140   40 – 25 = 15         2100
145 – 155   150   25                   3750
                  Σf = 150             Σfm = 17450
$\bar{x} = \frac{\sum fm}{\sum f} = \frac{17450}{150} = 116.33$
Class interval   50 – 59   40 – 49   30 – 39   20 – 29   10 – 19   0 – 9
Frequency        1         3         9         10        15        2
Solution:
Convert the inclusive classes to exclusive ones by subtracting 0.5 from each lower limit and
adding 0.5 to each upper limit, which gives 49.5 – 59.5, 39.5 – 49.5 and so on. Then arrange the
data in ascending order, beginning with 0 – 9. The mid values (4.5, 14.5, 24.5, 34.5, 44.5 and
54.5) are the same for the inclusive and the exclusive classes.
$\bar{x} = \frac{\sum fm}{\sum f} = \frac{970}{40} = 24.25$
DEFINITION: MEDIAN:
Median is the value of the middle-most item when all the items are arranged in order of
magnitude.
MEDIAN:
Merits of median:
Demerits of median:
Characteristics of Median:
Unlike the mean, the median can be computed from open-ended distribution.
In case of qualitative data where the items are not counted or measured but are
scored or ranked, it is the most appropriate measure of central tendency.
The median can be determined graphically, whereas the mean cannot.
Example 1: The following are the marks scored by 7 students; find the median mark:
Roll no: 1 2 3 4 5 6 7
Marks: 45 32 18 57 65 28 46
Solution:
Given order      Ascending order
R. No   Marks    R. No   Marks
1       45       3       18
2       32       6       28
3       18       2       32
4       57       1       45
5       65       7       46
6       28       4       57
7       46       5       65
$\text{Median} = \text{Size of } \left(\frac{N+1}{2}\right)^{\text{th}} \text{ item} = \text{Size of } \left(\frac{7+1}{2}\right)^{\text{th}} \text{ item} = \text{Size of } 4^{\text{th}} \text{ item}$
Median = 45
57 58 61 42 38 65 72 66
Solution:
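Sl. No   Values
1        38
2        42
3        57
4        58
5        61
6        65
7        66
8        72
$\text{Median} = \text{Size of } \left(\frac{N+1}{2}\right)^{\text{th}} \text{ item} = \text{Size of } \left(\frac{8+1}{2}\right)^{\text{th}} \text{ item} = 4.5^{\text{th}} \text{ item}$
The 4.5th item is the average of the 4th and 5th items:
$= \frac{58 + 61}{2} = 59.5$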
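A minimal sketch of the individual-series median (the function name `median` is illustrative); Python's `statistics.median` follows the same odd/even rule and is used here as a cross-check:

```python
import statistics

def median(values):
    """Middle item of the array; mean of the two middle items when N is even."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:                     # odd N: the ((N+1)/2)th item
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2   # even N: average of the (N/2)th and (N/2+1)th items

odd = [45, 32, 18, 57, 65, 28, 46]
even = [57, 58, 61, 42, 38, 65, 72, 66]
print(median(odd), median(even))
print(statistics.median(odd), statistics.median(even))   # library cross-check
```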
Example 3: Discrete series:
Size of shoes f cf
5 10 10
5.5 16 26
6 28 54
6.5 15 69
7 30 99
7.5 40 139
8 34 173
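$\text{Median} = \text{Size of } \left(\frac{N+1}{2}\right)^{\text{th}} \text{ item} = \text{Size of } \left(\frac{173+1}{2}\right)^{\text{th}} \text{ item} = 87^{\text{th}} \text{ item}$
The first cumulative frequency that reaches 87 is 99, against size 7. Hence, Median = 7.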
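For a discrete series, the median item can be located from the cumulative frequencies. The sketch below is illustrative and assumes the (N + 1)/2 position falls within a single value's block of items; it does not average two neighbouring values when the position straddles a boundary:

```python
from bisect import bisect_left
from itertools import accumulate

sizes = [5, 5.5, 6, 6.5, 7, 7.5, 8]
freqs = [10, 16, 28, 15, 30, 40, 34]

cf = list(accumulate(freqs))       # cumulative frequencies 10, 26, 54, ...
N = cf[-1]
position = (N + 1) / 2             # the ((N+1)/2)th item

# The median is the first size whose cumulative frequency reaches that position
median = sizes[bisect_left(cf, position)]
print(position, median)
```

`bisect_left` returns the index of the first cumulative frequency that is at least `position`, which is exactly the "first cf that reaches the item" rule used above.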
Marks: 10 - 25 25 - 40 40 - 55 55 - 70 70 - 85 85 -100
Frequency: 6 20 44 26 3 1
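$\text{Median item} = \frac{N}{2} = \frac{100}{2} = 50$
The cumulative frequencies are 6, 26, 70, 96, 99 and 100, so the 50th item lies in the class
40 – 55. Here $L = 40$, $cf = 26$, $f = 44$ and $i = 15$.
$M = L + \frac{N/2 - cf}{f} \times i = 40 + \frac{50 - 26}{44} \times 15$
$= 40 + 8.18$
$= 48.18$ marks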
value 0 – 9 10 – 19 20 – 29 30 – 39 40 – 49 50 – 59 60 – 69
Frequency 328 720 664 598 524 378 244
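Solution:
The gap between the upper limit of one class and the lower limit of the next class is 10 − 9 = 1.
Half of the difference = 1/2 = 0.5
0.5 has been added to each upper limit and 0.5 has been subtracted from each lower limit to get
the boundaries of the true class intervals (-0.5 – 9.5, 9.5 – 19.5 and so on). This is the form
required for calculating the median.
$\frac{N}{2} = \frac{3456}{2} = 1728$. The cumulative frequency up to 29.5 is 1712 and up to 39.5
is 2310; hence, the median class interval is 29.5 – 39.5.
$M = L + \frac{N/2 - cf}{f} \times i$
$= 29.5 + \frac{1728 - 1712}{598} \times 10 = 29.5 + \frac{10 \times 16}{598}$
$= 29.5 + 0.27 = 29.77$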
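The interpolation formula $M = L + \frac{N/2 - cf}{f} \times i$ can be sketched as a function over the lower class boundaries. The name `grouped_median` is hypothetical, and the sketch assumes equal class widths and exclusive (true) class boundaries:

```python
def grouped_median(lowers, width, freqs):
    """Median of a continuous series: L + (N/2 - cf) / f * i."""
    half = sum(freqs) / 2
    cf = 0                                   # cumulative frequency before the class
    for L, f in zip(lowers, freqs):
        if cf + f >= half:                   # first class whose cf reaches N/2
            return L + (half - cf) / f * width
        cf += f
    raise ValueError("empty distribution")

# Marks example: classes 10-25, 25-40, ..., 85-100 (i = 15)
m1 = grouped_median([10, 25, 40, 55, 70, 85], 15, [6, 20, 44, 26, 3, 1])
# Inclusive example, converted to true boundaries -0.5-9.5, 9.5-19.5, ... (i = 10)
m2 = grouped_median([-0.5, 9.5, 19.5, 29.5, 39.5, 49.5, 59.5], 10,
                    [328, 720, 664, 598, 524, 378, 244])
print(round(m1, 2), round(m2, 2))
```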
DEFINITION: MODE:
MODE:
Merits of mode:
Demerits of mode:
Uses of Mode:
INDIVIDUAL SERIES:
Example 1:
Ten persons have the following incomes (Rs.): 850, 750, 600, 825, 850, 725, 600, 850, 640 and
530.
850 occurs three times, more often than any other value.
Therefore, the modal income is Rs. 850.
Size: 10 11 12 13 14 15 16 17 18
Frequency: 10 12 15 19 20 8 4 3 2
Analysis Table:
X    1  2  3  4  5  6  TOTAL
10                     -
11               1     1
12      1        1  1  3
13      1  1  1  1  1  5
14   1     1  1     1  4
15            1        1
16                     -
17                     -
18                     -
MODE = 13
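The grouping and analysis tables follow a mechanical rule, sketched below under stated assumptions: six columns (single frequencies, sums of twos from the first and second items, sums of threes from the first, second and third items), only complete groups are formed, every group tied for the maximum in a column gets a mark, and the first item wins if the final tally is tied. The function name is hypothetical:

```python
def grouping_method_mode(values, freqs):
    """Locate the mode with the grouping and analysis tables.

    Each (size, offset) pair is one column of the grouping table; every item
    in the largest group of a column receives one tally mark.
    """
    n = len(freqs)
    tally = [0] * n
    for size, offset in [(1, 0), (2, 0), (2, 1), (3, 0), (3, 1), (3, 2)]:
        # only complete groups are formed; leftover items at the end are skipped
        starts = range(offset, n - size + 1, size)
        sums = {s: sum(freqs[s:s + size]) for s in starts}
        best = max(sums.values())
        for s, total in sums.items():
            if total == best:                  # ties: every maximum group is marked
                for i in range(s, s + size):
                    tally[i] += 1
    # the item with the most tally marks is the mode (first one if tied)
    return values[tally.index(max(tally))], tally

mode, tally = grouping_method_mode([10, 11, 12, 13, 14, 15, 16, 17, 18],
                                   [10, 12, 15, 19, 20, 8, 4, 3, 2])
print(mode, tally)
```

The returned tally is the TOTAL column of the analysis table above.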
Example: Calculate the mode (inclusive method)
Marks 0 – 19 20 – 39 40 – 59 60 – 79 80 – 99
No. of Students 5 20 35 20 12
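Solution:
The modal class is 40 – 59 (highest frequency, 35); its true class interval is 39.5 – 59.5.
Here $L = 39.5$, $f_1 = 35$, $f_0 = 20$, $f_2 = 20$ and $i = 20$.
$\therefore Z = L + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times i$
$= 39.5 + \frac{35 - 20}{70 - 20 - 20} \times 20 = 39.5 + \frac{20 \times 15}{30}$
$= 39.5 + \frac{300}{30} = 39.5 + 10.0$
$= 49.5$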
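A sketch of the modal-class formula, assuming the modal class is simply the class with the highest frequency (the grouping method is the safer choice when frequencies are irregular) and that all classes have the same width. The name `grouped_mode` is hypothetical:

```python
def grouped_mode(lowers, width, freqs):
    """Z = L + (f1 - f0) / (2*f1 - f0 - f2) * i, taking the highest-frequency class as modal."""
    k = freqs.index(max(freqs))                      # modal class (first one if tied)
    f1 = freqs[k]
    f0 = freqs[k - 1] if k > 0 else 0                # frequency of the preceding class
    f2 = freqs[k + 1] if k + 1 < len(freqs) else 0   # frequency of the succeeding class
    return lowers[k] + (f1 - f0) / (2 * f1 - f0 - f2) * width

# Inclusive example: true class boundaries -0.5, 19.5, 39.5, ... (i = 20)
z1 = grouped_mode([-0.5, 19.5, 39.5, 59.5, 79.5], 20, [5, 20, 35, 20, 12])
# Exclusive classes 0-10, 10-20, ..., 40-50 (i = 10)
z2 = grouped_mode([0, 10, 20, 30, 40], 10, [4, 15, 28, 16, 7])
print(z1, round(z2, 2))
```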
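Grouping Table:
Size of item   f    Sum of twos   Sum of threes
0 – 5          9
5 – 10         12   21            36
10 – 15        15   27            43
15 – 20        16   31            48
20 – 25        17   33            48
25 – 30        15   32            42
30 – 35        10   25            38
35 – 40        13   23
In this compact layout, each sum of twos is the total of that class and the class above it, and
each sum of threes is the total of the class above, that class and the class below. Columns (2)
to (6) are read off from these sums: column (2) takes the twos 21, 31, 32, 23; column (3) the
twos 27, 33, 25; column (4) the threes 36, 48; column (5) the threes 43, 42; and column (6) the
threes 48, 38.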
Analysis Table:
X         1  2  3  4  5  6  TOTAL
0 – 5                       -
5 – 10                1     1
10 – 15               1  1  2
15 – 20         1  1  1  1  4
20 – 25   1  1  1  1     1  5
25 – 30      1     1        2
30 – 35                     -
35 – 40                     -
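The modal class is 20 – 25. Here $L = 20$, $f_1 = 17$, $f_0 = 16$, $f_2 = 15$ and $i = 5$.
$\therefore Z = L + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times i$
$= 20 + \frac{17 - 16}{34 - 16 - 15} \times 5 = 20 + \frac{5}{3}$
$= 20 + 1.67 = 21.67$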
GEOMETRIC MEAN:
Merits of G.M:
Demerits of G.M:
It is difficult to understand.
Non – mathematical persons cannot do calculations.
The G.M. cannot be computed if any item in the series is negative or zero.
It has restricted application.
Uses of G.M:
G.M. is highly useful in averaging ratios, percentages and rates of increase between
two periods.
G.M. is important in the construction of index numbers.
In economic and social sciences, where we want to give more weight to smaller
items and smaller weight to large items, G.M. is appropriate.
50 72 54 82 93
Solution:
X    log X
50   1.6990
72   1.8573
54   1.7324
82   1.9138
93   1.9685
Σ log X = 9.1710
$\text{G.M.} = \text{Antilog}\left(\frac{\sum \log X}{N}\right) = \text{Antilog}\left(\frac{9.1710}{5}\right) = \text{Antilog}(1.8342)$
= 68.26
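The antilog-of-mean-log method is easy to reproduce in Python. This sketch uses `math.log10` to mirror the log-table steps and cross-checks against `statistics.geometric_mean` (available from Python 3.8):

```python
import math
from statistics import geometric_mean

values = [50, 72, 54, 82, 93]

# Textbook route: G.M. = Antilog( sum(log X) / N ) with common logarithms
sum_log = sum(math.log10(x) for x in values)
gm = 10 ** (sum_log / len(values))

print(round(sum_log, 4), round(gm, 2))
print(round(geometric_mean(values), 2))   # library cross-check
```

Full-precision logarithms give a value of about 68.265, so the last printed digit may differ slightly from the four-figure log-table answer of 68.26.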
The following table gives the weights of 31 persons in a sample survey. Calculate the G.M.
Weight (lbs):    130  135  140  145  146  148  149  150  157
No. of persons:  3    4    6    6    3    5    2    1    1
Solution:
x     f   log x    f log x
130   3   2.1139   6.3417
135   4   2.1303   8.5212
140   6   2.1461   12.8766
145   6   2.1614   12.9684
146   3   2.1644   6.4932
148   5   2.1703   10.8515
149   2   2.1732   4.3464
150   1   2.1761   2.1761
157   1   2.1959   2.1959
Σf = 31           Σ f log x = 66.7710
$\text{G.M.} = \text{Antilog}\left(\frac{\sum f \log x}{\sum f}\right) = \text{Antilog}\left(\frac{66.7710}{31}\right) = \text{Antilog}(2.1539)$
G.M. = 142.5
Solution:
X             f    m    log m    f log m
7.5 – 10.5    5    9    0.9542   4.7710
10.5 – 13.5   9    12   1.0792   9.7128
13.5 – 16.5   19   15   1.1761   22.3459
16.5 – 19.5   23   18   1.2553   28.8719
19.5 – 22.5   7    21   1.3222   9.2554
22.5 – 25.5   4    24   1.3802   5.5208
25.5 – 28.5   1    27   1.4314   1.4314
Σf = 68                          Σ f log m = 81.9092
$\text{G.M.} = \text{Antilog}\left(\frac{\sum f \log m}{\sum f}\right) = \text{Antilog}\left(\frac{81.9092}{68}\right) = \text{Antilog}(1.20455)$
G.M. = 16.02
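For frequency distributions, each logarithm is weighted by its frequency. A minimal sketch; `weighted_geometric_mean` is a hypothetical name, and for a continuous series the class mid values are passed in place of x:

```python
import math

def weighted_geometric_mean(xs, fs):
    """G.M. = Antilog( sum(f * log x) / sum(f) ) for a frequency distribution."""
    total = sum(f * math.log10(x) for x, f in zip(xs, fs))
    return 10 ** (total / sum(fs))

# Discrete series: weights of 31 persons
gm_weights = weighted_geometric_mean(
    [130, 135, 140, 145, 146, 148, 149, 150, 157],
    [3, 4, 6, 6, 3, 5, 2, 1, 1])

# Continuous series: class mid values 9, 12, ..., 27
gm_classes = weighted_geometric_mean([9, 12, 15, 18, 21, 24, 27],
                                     [5, 9, 19, 23, 7, 4, 1])
print(round(gm_weights, 1), round(gm_classes, 2))
```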
HARMONIC MEAN:
It is rigidly defined.
It is based on all the observations of the series.
It is suitable in case of series having wide dispersion.
It is suitable for further mathematical treatment.
It gives less weight to large items and more weight to small items.
Example: Individual series:
The monthly incomes of 10 families in rupees in a certain village are given below:
Family: 1 2 3 4 5 6 7 8 9 10
Income: 85 70 10 75 500 8 42 250 40 36
Solution:
Family   Income (X)   1/X
1        85           0.01176
2        70           0.01429
3        10           0.10000
4        75           0.01333
5        500          0.00200
6        8            0.12500
7        42           0.02381
8        250          0.00400
9        40           0.02500
10       36           0.02778
Σ(1/X) = 0.34697
$\text{Harmonic mean} = \frac{N}{\sum \left(\frac{1}{X}\right)} = \frac{10}{0.34697}$
Harmonic mean = 28.82
Size of items: 6 7 8 9 10 11
Frequency: 4 6 9 5 2 8
Solution:
x    f   1/x      f(1/x)
6    4   0.1667   0.6668
7    6   0.1429   0.8574
8    9   0.1250   1.1250
9    5   0.1111   0.5555
10   2   0.1000   0.2000
11   8   0.0909   0.7272
Σf = 34           Σ f(1/x) = 4.1319
$\text{Harmonic mean} = \frac{N}{\sum f\left(\frac{1}{x}\right)} = \frac{34}{4.1319} = 8.23$
Example:
x          f    m    1/m       f(1/m)
30 – 40    15   35   0.02857   0.42855
40 – 50    13   45   0.02222   0.28886
50 – 60    8    55   0.01818   0.14544
60 – 70    6    65   0.01538   0.09228
70 – 80    15   75   0.01333   0.19995
80 – 90    7    85   0.01176   0.08232
90 – 100   6    95   0.01053   0.06318
Σf = 70                        Σ f(1/m) = 1.30058
$\text{Harmonic mean} = \frac{N}{\sum f\left(\frac{1}{m}\right)} = \frac{70}{1.30058} = 53.82$
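The three harmonic-mean calculations can be checked with a short sketch. `statistics.harmonic_mean` handles the individual series; the helper `weighted_harmonic_mean` (a hypothetical name) implements N / Σ f(1/x) for the discrete and continuous cases, using class mid values for the latter:

```python
from statistics import harmonic_mean

def weighted_harmonic_mean(xs, fs):
    """H.M. = N / sum(f / x) for a frequency distribution."""
    return sum(fs) / sum(f / x for x, f in zip(xs, fs))

incomes = [85, 70, 10, 75, 500, 8, 42, 250, 40, 36]
hm_individual = harmonic_mean(incomes)                       # N / sum(1/X)
hm_discrete = weighted_harmonic_mean([6, 7, 8, 9, 10, 11], [4, 6, 9, 5, 2, 8])
hm_continuous = weighted_harmonic_mean([35, 45, 55, 65, 75, 85, 95],   # mid values
                                       [15, 13, 8, 6, 15, 7, 6])
print(round(hm_individual, 2), round(hm_discrete, 2), round(hm_continuous, 2))
```

With full-precision reciprocals the three results agree with the worked values 28.82, 8.23 and 53.82 to two decimals.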
Weight (W): $W_1, W_2, W_3, \ldots$
$\bar{X}_w = \frac{W_1 X_1 + W_2 X_2 + W_3 X_3 + \cdots}{W_1 + W_2 + W_3 + \cdots} = \frac{\sum WX}{\sum W}$
EXAMPLE: 1
Calculate the simple average and the weighted average of the following data and account for
the difference in the averages.
Items X Weight W WX
68 1 68
85 45 3825
101 31 3131
102 1 102
108 11 1188
110 7 770
112 23 2576
113 17 1921
124 14 1736
128 14 1792
∑ 𝑋 =1051 ∑ 𝑊 =164 ∑ 𝑊𝑋 =17109
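$\text{Simple average} = \frac{\sum X}{N} = \frac{1051}{10} = 105.10$
$\text{Weighted average} = \frac{\sum WX}{\sum W} = \frac{17109}{164} = 104.32$
The weighted average is lower than the simple average because the heaviest weight (45) is
attached to a comparatively low value (85), which pulls the weighted average down; the simple
average treats every item as equally important.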
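A quick sketch contrasting the two averages on the table above (plain Python, variable names illustrative):

```python
X = [68, 85, 101, 102, 108, 110, 112, 113, 124, 128]
W = [1, 45, 31, 1, 11, 7, 23, 17, 14, 14]

simple = sum(X) / len(X)                               # every item counts once
weighted = sum(w * x for w, x in zip(W, X)) / sum(W)   # sum(WX) / sum(W)
print(round(simple, 2), round(weighted, 2))
```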
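Let there be $N_1$ items in the first group with mean $\bar{X}_1$ and $N_2$ items in the second
group with mean $\bar{X}_2$.
When these two groups are merged, there are $N_1 + N_2$ items whose total is
$N_1 \bar{X}_1 + N_2 \bar{X}_2$, so
$\bar{X}_{12} = \frac{N_1 \bar{X}_1 + N_2 \bar{X}_2}{N_1 + N_2}$
Similarly, for three groups:
$\bar{X}_{123} = \frac{N_1 \bar{X}_1 + N_2 \bar{X}_2 + N_3 \bar{X}_3}{N_1 + N_2 + N_3}$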
EXAMPLE: 1
$N_1 = 100$, $N_2 = 80$, $\bar{X}_1 = 275$, $\bar{X}_2 = 225$. Find the mean of the salaries of the
employees of the establishment as a whole.
Solution:
$\bar{X}_{12} = \frac{N_1 \bar{X}_1 + N_2 \bar{X}_2}{N_1 + N_2} = \frac{100 \times 275 + 80 \times 225}{100 + 80}$
$= \frac{27500 + 18000}{180} = \frac{45500}{180}$
= Rs. 252.78
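The combined-mean formula extends to any number of groups. A minimal sketch with a hypothetical helper that takes (N, mean) pairs:

```python
def combined_mean(groups):
    """groups: list of (N_i, mean_i) pairs; returns sum(N_i * mean_i) / sum(N_i)."""
    total_items = sum(n for n, _ in groups)
    return sum(n * m for n, m in groups) / total_items

print(round(combined_mean([(100, 275), (80, 225)]), 2))
```

Passing three pairs gives the three-group formula $\bar{X}_{123}$ with no change to the code.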
UNIT-II
MEASURES OF DISPERSION
INTRODUCTION:
In a series, all the items are not equal. There is difference or variation among the values.
The degree of variation is evaluated by various measures of dispersion.
Averages are central values. They enable comparison of two or more sets of data. They
are not sufficient to depict the true nature of the sets. For example, consider the following marks
of two students.
Student I Student II
68 85
75 90
65 80
67 25
70 65
Both students have a total of 345 and an average of 69. However, the second student has failed
in one paper. When the averages alone are considered, the two students appear equal.
DEFINITION:
Absolute and Relative Measures:
Measures of dispersion are of two types: absolute and relative.
Absolute measure:
It indicates the amount of variation in a set of values.
It is quoted in terms of the units of the observations.
For example, when rainfall on different days is recorded in cm, any absolute measure of
dispersion gives the variation in rainfall in cm; if rainfall is recorded in mm, the absolute
measures of dispersion are quoted in mm.
Relative measure:
METHODS OF MEASURING DISPERSION:
1. Range.
2. Inter – quartile range.
3. Mean – deviation.
4. Standard deviation.
5. Lorenz curve.
Range:
Range is the difference between the greatest and the smallest of the values.
The range is the simplest measure of dispersion.
It is a rough measure of dispersion.
Its measure depends upon the extreme items and not on all the items.
Merits of Range:
Demerits of Range:
It is not reliable.
It is affected by the extreme items.
It is an unsatisfactory measure.
It cannot be applied to open end classes.
It is not suitable for mathematical treatment.
Uses of Range:
Range is used in finding the control limits of the mean chart and the range chart in
S.Q.C. (statistical quality control).
While quoting the prices of shares, bonds, gold, etc. on a daily or yearly basis, the
minimum and the maximum prices are mentioned.
The minimum and the maximum temperatures likely to prevail on each day are
forecast.
INDIVIDUAL SERIES:
Example: Find the range of weights of 7 students from the following 27, 30, 35, 36, 38, 40, 43
Solution:
Range = L – S
= 43 – 27
= 16
$\text{Coefficient of range} = \frac{L - S}{L + S} = \frac{43 - 27}{43 + 27} = \frac{16}{70} = 0.23$
DISCRETE SERIES:
Example: Calculate range and its coefficient for the following data.
x: 10 20 30 40 50
f: 2 5 5 7 6
Solution:
Range = L – S
= 50 – 10
= 40
$\text{Coefficient of range} = \frac{L - S}{L + S} = \frac{50 - 10}{50 + 10} = \frac{40}{60} = 0.667$
x: 10 – 20 20 – 30 30 – 40 40 – 50 50 – 60
f: 3 4 2 6 7
Solution:
Range = L – S
= 60 – 10
= 50
$\text{Coefficient of range} = \frac{L - S}{L + S} = \frac{60 - 10}{60 + 10} = \frac{50}{70} = 0.7143$
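A minimal sketch of the range and its coefficient; the helper name is hypothetical. For the continuous series, the sketch follows the solution above and passes only the lower limit of the first class and the upper limit of the last class:

```python
def range_and_coefficient(values):
    """Return (L - S, (L - S) / (L + S)) for the largest value L and smallest value S."""
    L, S = max(values), min(values)
    return L - S, (L - S) / (L + S)

r1, c1 = range_and_coefficient([27, 30, 35, 36, 38, 40, 43])   # individual series
r2, c2 = range_and_coefficient([10, 20, 30, 40, 50])           # discrete: the x values
r3, c3 = range_and_coefficient([10, 60])                       # continuous: extreme limits
print(r1, round(c1, 2), r2, round(c2, 3), r3, round(c3, 4))
```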
QUARTILE DEVIATION:
To obtain a measure of variation, we use the distance between the first and the third
quartiles.
Quartile deviation is defined as half the distance between the third and the first quartiles;
hence it is also called the semi-inter-quartile range. Symbolically:
$\text{Semi-inter-quartile range} = \frac{Q_3 - Q_1}{2}$
It ignores the first 25% and the last 25% of the items.
It is a positional average; hence it is not amenable to further mathematical treatment.
Its value is affected by sampling fluctuations.
It gives only a rough measure.
It is not the representative value of the data.
Calculate Quartile deviation and its coefficient for the following data.
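Solution:
$Q_1 = \text{Value of } \left(\frac{N+1}{4}\right)^{\text{th}} \text{ item} = \text{Value of } \left(\frac{10+1}{4}\right)^{\text{th}} \text{ item} = 2.75^{\text{th}} \text{ item}$
$= 3 + 0.75(5 - 3)$
$Q_1 = 4.5$
$Q_3 = \text{Value of } 3\left(\frac{N+1}{4}\right)^{\text{th}} \text{ item} = \text{Value of } 3\left(\frac{10+1}{4}\right)^{\text{th}} \text{ item} = 8.25^{\text{th}} \text{ item}$
$= 12 + 0.25(15 - 12) = 12 + 0.75$
$Q_3 = 12.75$
$\text{Quartile deviation} = \frac{Q_3 - Q_1}{2} = \frac{12.75 - 4.5}{2} = 4.125$
$\text{Coefficient of quartile deviation} = \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{8.25}{17.25} = 0.4783$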
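x:   20  21  22  23  24  25  26   27   28
f:   8   10  11  16  20  25  15   9    6
Solution:
x:   20  21  22  23  24  25  26   27   28
f:   8   10  11  16  20  25  15   9    6
cf:  8   18  29  45  65  90  105  114  120
$Q_1 = \text{Value of } \left(\frac{N+1}{4}\right)^{\text{th}} \text{ item} = \text{Value of } \left(\frac{120+1}{4}\right)^{\text{th}} \text{ item} = 30.25^{\text{th}} \text{ item}$
$= 23 + 0.25(23 - 23)$
$Q_1 = 23$
$Q_3 = \text{Value of } 3\left(\frac{N+1}{4}\right)^{\text{th}} \text{ item} = \text{Value of } 3\left(\frac{120+1}{4}\right)^{\text{th}} \text{ item} = 90.75^{\text{th}} \text{ item}$
$= 25 + 0.75(26 - 25) = 25 + 0.75$
$Q_3 = 25.75$
$\text{Quartile deviation} = \frac{Q_3 - Q_1}{2} = \frac{25.75 - 23}{2} = 1.375$
$\text{Coefficient of quartile deviation} = \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{25.75 - 23}{25.75 + 23} = 0.0564$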
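Python's `statistics.quantiles` with `method="exclusive"` (Python 3.8+) places the quartiles at the (N + 1)/4 positions and interpolates linearly between neighbouring items, which matches the method used above. The sketch expands the series into individual items and assumes the frequency of 26 is 15, as implied by the cumulative frequencies 90 and 105:

```python
import statistics

# Expand the discrete series into individual items
# (f of 26 taken as 15, as implied by the cumulative frequencies)
x = [20, 21, 22, 23, 24, 25, 26, 27, 28]
f = [8, 10, 11, 16, 20, 25, 15, 9, 6]
items = [value for value, freq in zip(x, f) for _ in range(freq)]

# "exclusive" puts Q1, Q2, Q3 at the (N+1)/4, 2(N+1)/4, 3(N+1)/4 positions
q1, q2, q3 = statistics.quantiles(items, n=4, method="exclusive")

qd = (q3 - q1) / 2
coefficient = (q3 - q1) / (q3 + q1)
print(q1, q3, qd, round(coefficient, 4))
```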
Example: Continuous series:
Calculate the semi – inter quartile range of wages and coefficient of Q.D.
Wages(Rs) Labourers
30 – 32 12
32 – 34 18
34 – 36 16
36 – 38 14
38 – 40 12
40 – 42 8
42 – 44 6
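Solution:
x         f    C.F.
30 – 32   12   12
32 – 34   18   30
34 – 36   16   46
36 – 38   14   60
38 – 40   12   72
40 – 42   8    80
42 – 44   6    86
$Q_1 = \text{Size of } \left(\frac{N}{4}\right)^{\text{th}} \text{ item} = \text{Size of } \left(\frac{86}{4}\right)^{\text{th}} \text{ item} = 21.5^{\text{th}} \text{ item}$
$Q_1$ lies in the class 32 – 34.
$Q_1 = L + \frac{N/4 - cf}{f} \times i = 32 + \frac{21.5 - 12}{18} \times 2 = 32 + 1.06 = 33.06$
$Q_3 = \text{Size of } 3\left(\frac{N}{4}\right)^{\text{th}} \text{ item} = \text{Size of } 3\left(\frac{86}{4}\right)^{\text{th}} \text{ item} = 64.5^{\text{th}} \text{ item}$
$Q_3$ lies in the class 38 – 40.
$Q_3 = L + \frac{3N/4 - cf}{f} \times i = 38 + \frac{64.5 - 60}{12} \times 2 = 38 + 0.75 = 38.75$
$\text{Quartile deviation} = \frac{Q_3 - Q_1}{2} = \frac{38.75 - 33.06}{2} = 2.85$
$\text{Coefficient of quartile deviation} = \frac{Q_3 - Q_1}{Q_3 + Q_1} = \frac{38.75 - 33.06}{38.75 + 33.06} = 0.08$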
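The interpolation used for the median gives any quartile of a continuous series when N/2 is replaced by pN (p = 0.25 for $Q_1$, 0.75 for $Q_3$). A minimal sketch with a hypothetical helper `grouped_quantile`, assuming equal class widths:

```python
def grouped_quantile(lowers, width, freqs, p):
    """Value of the (p*N)th item of a continuous series: L + (pN - cf) / f * i."""
    target = p * sum(freqs)
    cf = 0                                  # cumulative frequency before the class
    for L, f in zip(lowers, freqs):
        if cf + f >= target:
            return L + (target - cf) / f * width
        cf += f
    raise ValueError("empty distribution")

lowers = [30, 32, 34, 36, 38, 40, 42]       # wage classes 30-32, 32-34, ...
freqs = [12, 18, 16, 14, 12, 8, 6]
q1 = grouped_quantile(lowers, 2, freqs, 0.25)
q3 = grouped_quantile(lowers, 2, freqs, 0.75)
qd = (q3 - q1) / 2
coefficient = (q3 - q1) / (q3 + q1)
print(round(q1, 2), round(q3, 2), round(qd, 2), round(coefficient, 2))
```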
MEAN DEVIATION:
M.D. is the arithmetic mean of the absolute deviations of the values about their
arithmetic mean or median or mode.
M.D. is the abbreviation of Mean Deviation. There are three kinds of mean
deviations, viz.,
❖ Mean deviation or mean deviation about mean.
❖ Mean deviation about median.
❖ Mean deviation about mode.
Merits of M.D:
Demerits of M.D:
Mean deviation about mean (individual series):
Calculate mean deviation from mean for the following data: 100, 150, 200, 250, 360, 490,
500, 600, 671.
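Solution:
X     |X − X̄|
100   269
150   219
200   169
250   119
360   9
490   121
500   131
600   231
671   302
ΣX = 3321   Σ|X − X̄| = 1570
$\text{Mean } \bar{X} = \frac{\sum X}{N} = \frac{3321}{9} = 369$
$\text{Mean deviation from mean} = \frac{\sum |X - \bar{X}|}{N} = \frac{1570}{9} = 174.44$
$\text{Coefficient of M.D.} = \frac{\text{M.D.}}{\bar{X}} = \frac{174.44}{369} = 0.47$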
Discrete Series:
x: 2 4 6 8 10
f: 1 4 6 4 1
Solution:
x    f   fx   |X − X̄|   f|X − X̄|
2    1   2    4         4
4    4   16   2         8
6    6   36   0         0
8    4   32   2         8
10   1   10   4         4
Σf = 16  Σfx = 96       Σf|X − X̄| = 24
$\bar{X} = \frac{\sum fx}{N} = \frac{96}{16} = 6$
$\text{M.D.} = \frac{\sum f|X - \bar{X}|}{N} = \frac{24}{16} = 1.5$
$\text{Coefficient of M.D.} = \frac{1.5}{6} = 0.25$
Continuous Series:
C.I: 2 – 4 4 – 6 6 – 8 8 – 10
F: 3 4 2 1
Solution:
x        f   m   fm   |m − X̄|   f|m − X̄|
2 – 4    3   3   9    2.2       6.6
4 – 6    4   5   20   0.2       0.8
6 – 8    2   7   14   1.8       3.6
8 – 10   1   9   9    3.8       3.8
Σf = 10      Σfm = 52           Σf|m − X̄| = 14.8
$\bar{X} = \frac{\sum fm}{N} = \frac{52}{10} = 5.2$
$\text{M.D. from mean} = \frac{\sum f|m - \bar{X}|}{\sum f} = \frac{14.8}{10} = 1.48$
$\text{Coefficient of M.D.} = \frac{1.48}{5.2} = 0.28$
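Solution:
X    |X − M|
15   22.5
25   12.5
30   7.5
35   2.5
40   2.5
45   7.5
50   12.5
50   12.5
Σ|X − M| = 80
$\text{Median} = \left(\frac{n+1}{2}\right)^{\text{th}} \text{ item} = 4.5^{\text{th}} \text{ item} = \frac{35 + 40}{2} = 37.5$
$\text{M.D. from median} = \frac{\sum |X - M|}{N} = \frac{80}{8} = 10$
$\text{Coefficient of M.D. from median} = \frac{10}{37.5} = 0.2667$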
Discrete Series:
x: 10 12 13 14 15 16
f: 2 3 7 20 8 9
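Solution:
x    f    cf   |X − M|   f|X − M|
10   2    2    4         8
12   3    5    2         6
13   7    12   1         7
14   20   32   0         0
15   8    40   1         8
16   9    49   2         18
Σf = 49                  Σf|X − M| = 47
$\text{Median} = \left(\frac{N+1}{2}\right)^{\text{th}} \text{ item} = \left(\frac{49+1}{2}\right)^{\text{th}} \text{ item} = 25^{\text{th}} \text{ item}$
The first cumulative frequency that reaches 25 is 32, so Median = 14.
$\text{M.D. from median} = \frac{\sum f|X - M|}{\sum f} = \frac{47}{49} = 0.96$
$\text{Coefficient of M.D. from median} = \frac{0.96}{14} = 0.069$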
Continuous series:
C.I 16 – 20 21 – 25 26 – 30 31 – 35 36 – 40 41 – 45 46 – 50 51 – 55 56 – 60
F: 8 15 13 20 11 7 3 2 1
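Solution:
The classes are inclusive, so the true class boundaries are 15.5 – 20.5, 20.5 – 25.5 and so on
($i = 5$). The cumulative frequencies are 8, 23, 36, 56, 67, 74, 77, 79 and 80.
$\text{Median} = \text{Size of } \left(\frac{N}{2}\right)^{\text{th}} \text{ item} = \text{Size of } \left(\frac{80}{2}\right)^{\text{th}} \text{ item} = 40^{\text{th}} \text{ item}$
The 40th item lies in the class 31 – 35, i.e., 30.5 – 35.5.
$M = 30.5 + \frac{40 - 36}{20} \times 5 = 30.5 + 1 = 31.5$
$\text{M.D. from median} = \frac{\sum f|m - M|}{N} = \frac{582}{80} = 7.28$
$\text{Coefficient of M.D. from median} = \frac{7.28}{31.5} = 0.231$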
32 51 23 46 20 78 57 56 57 30
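Solution:
X    |X − Z|
32   25
51   6
23   34
46   11
20   37
78   21
57   0
56   1
57   0
30   27
Σ|X − Z| = 162
Mode (Z) = 57, since 57 occurs twice and every other value only once.
$\text{M.D. from mode} = \frac{\sum |X - Z|}{N} = \frac{162}{10} = 16.2$
$\text{Coefficient of M.D. from mode} = \frac{16.2}{57} = 0.28$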
Discrete series:
x: 21 25 27 32 41 46 50 55
f: 2 3 10 20 15 10 8 2
Solution:
x    f    |X − Z|   f|X − Z|
21   2    11        22
25   3    7         21
27   10   5         50
32   20   0         0
41   15   9         135
46   10   14        140
50   8    18        144
55   2    23        46
Σf = 70             Σf|X − Z| = 558
Z = 32 (the value with the highest frequency, 20)
$\text{M.D. from mode} = \frac{\sum f|X - Z|}{N} = \frac{558}{70} = 7.97$
$\text{Coefficient of M.D. from mode} = \frac{7.97}{32} = 0.2491$
Continuous series:
x: 0 − 10 10 − 20 20 − 30 30 − 40 40 − 50
f: 4 15 28 16 7
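Solution:
The modal class is 20 – 30 (highest frequency, 28). Here $L = 20$, $f_1 = 28$, $f_0 = 15$,
$f_2 = 16$ and $i = 10$.
$Z = L + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times i = 20 + \frac{28 - 15}{56 - 15 - 16} \times 10 = 20 + 5.2 = 25.2$
x         f    m    |m − Z|   f|m − Z|
0 – 10    4    5    20.2      80.8
10 – 20   15   15   10.2      153.0
20 – 30   28   25   0.2       5.6
30 – 40   16   35   9.8       156.8
40 – 50   7    45   19.8      138.6
Σf = 70                       Σf|m − Z| = 534.8
$\text{M.D. about mode} = \frac{\sum f|m - Z|}{\sum f} = \frac{534.8}{70} = 7.64$
$\text{Coefficient of M.D. about mode} = \frac{7.64}{25.2} = 0.303$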
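All the mean-deviation examples share one formula, Σ f|x − A| / Σ f, where A is the chosen average (mean, median or mode). A minimal sketch with a hypothetical helper; the average A is computed beforehand and passed in, and for a continuous series the class mid values stand in for x:

```python
def mean_deviation(xs, center, fs=None):
    """Sum of f * |x - center| divided by sum of f (f = 1 when fs is None)."""
    if fs is None:
        fs = [1] * len(xs)
    return sum(f * abs(x - center) for x, f in zip(xs, fs)) / sum(fs)

# Individual series about the mean (X-bar = 369)
md_mean = mean_deviation([100, 150, 200, 250, 360, 490, 500, 600, 671], 369)
# Discrete series about the mean (X-bar = 6)
md_discrete = mean_deviation([2, 4, 6, 8, 10], 6, [1, 4, 6, 4, 1])
# Continuous series about the median (M = 31.5), mid values of 16-20, 21-25, ...
md_median = mean_deviation([18, 23, 28, 33, 38, 43, 48, 53, 58], 31.5,
                           [8, 15, 13, 20, 11, 7, 3, 2, 1])

print(round(md_mean, 2), round(md_mean / 369, 2))      # M.D. and its coefficient
print(md_discrete, md_discrete / 6)
print(round(md_median, 3), round(md_median / 31.5, 3))
```

Dividing each result by the average used gives the coefficient of mean deviation, as in the worked examples.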
S.D. is also called the root mean square deviation, mean error or mean square
error.
The reason is that it is the square root of the mean of the squared deviations from the
arithmetic mean.
It provides accurate results.
Merits of S.D:
It is rigidly defined and its value is always definite and based on all the
observations and the actual signs of deviations are used.
As it based on A.M. it has all the merits of A.M.
It is the most important and widely used measures of dispersion.
It is possible for further algebraic treatment.
52
It is less affected by the fluctuations of sampling and hence stable.
It is the basis for measuring the coefficient of correlation, sampling and statistical
inferences.
Demerits of S.D:
$$\sigma = \sqrt{\frac{20143}{10} - \left(\frac{385}{10}\right)^2} = \sqrt{2014.3 - 1482.25} = \sqrt{532.05} = 23.07$$
Individual series:
Solution:
X X2
14 196
22 484
9 81
15 225
20 400
17 289
12 144
11 121
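ΣX = 120    ΣX² = 1940
$$\sigma = \sqrt{\frac{\sum X^2}{N} - \left(\frac{\sum X}{N}\right)^2} = \sqrt{\frac{1940}{8} - \left(\frac{120}{8}\right)^2} = \sqrt{242.5 - 225} = \sqrt{17.5} = 4.18$$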
Discrete series:
Marks: 10 20 30 40 50 60
No. of Students: 8 12 20 10 7 3
Solution:
x    f    fx    fx²
10   8    80    800
20   12   240   4800
30   20   600   18000
40   10   400   16000
50   7    350   17500
60   3    180   10800
Σf = 60    Σfx = 1850    Σfx² = 67900
$$\sigma = \sqrt{\frac{\sum fx^2}{\sum f} - \left(\frac{\sum fx}{\sum f}\right)^2} = \sqrt{\frac{67900}{60} - \left(\frac{1850}{60}\right)^2}$$
$$= \sqrt{1131.67 - 950.69} = \sqrt{180.97} = 13.45$$
Continuous series:
Class(x): 0 – 10 10 – 20 20 – 30 30 – 40 40 – 50 50 – 60 60 – 70
Frequency: 8 12 17 14 9 7 4
Solution:
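x        f    m    m²     fm    fm²
0 – 10   8    5    25     40    200
10 – 20  12   15   225    180   2700
20 – 30  17   25   625    425   10625
30 – 40  14   35   1225   490   17150
40 – 50  9    45   2025   405   18225
50 – 60  7    55   3025   385   21175
60 – 70  4    65   4225   260   16900
Σf = 71    Σfm = 2185    Σfm² = 86975
m is the actual mid-value of each class, so no class-interval factor i is needed:
$$\sigma = \sqrt{\frac{\sum fm^2}{N} - \left(\frac{\sum fm}{N}\right)^2} = \sqrt{\frac{86975}{71} - \left(\frac{2185}{71}\right)^2} = \sqrt{1225 - 947.08} = \sqrt{277.92} = 16.67$$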
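All three standard-deviation examples use the direct formula σ = √(Σfx²/N − (Σfx/N)²). The Python sketch below implements that formula. `std_dev` is a hypothetical helper name, and leaving out the frequencies treats the data as an individual series.

```python
import math


def std_dev(values, freqs=None):
    """Standard deviation by the direct method:
    sigma = sqrt(sum(f*x^2)/N - (sum(f*x)/N)^2).

    Omit `freqs` for an individual series (every frequency is 1).
    """
    if freqs is None:
        freqs = [1] * len(values)
    n = sum(freqs)
    sum_fx = sum(f * x for x, f in zip(values, freqs))
    sum_fx2 = sum(f * x * x for x, f in zip(values, freqs))
    return math.sqrt(sum_fx2 / n - (sum_fx / n) ** 2)


# Continuous series above, using the class mid-values m
mids = [5, 15, 25, 35, 45, 55, 65]
freqs = [8, 12, 17, 14, 9, 7, 4]
print(round(std_dev(mids, freqs), 2))  # 16.67
```

The direct formula subtracts two large numbers. For data with large values, taking deviations from the mean (or from an assumed mean) first gives more accurate results.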
DEFINITION: SKEWNESS:
Difference between dispersion and skewness:
1. Dispersion shows the spread of individual values about the central value; skewness shows the departure from symmetry.
2. Dispersion is useful for studying the variability in the data; skewness is useful for studying whether the values are concentrated among the lower or the higher values.
3. Dispersion judges how well the central tendency represents the data; skewness judges the differences between the measures of central tendency.
4. Dispersion is a type of average of deviations (an average of the second order); skewness is not an average but is measured using the mean, the median and the mode.
5. Dispersion shows the degree of variability; skewness shows whether the concentration is in the higher or lower values.
MEASURES OF SKEWNESS:
OBJECTIVE OF SKEWNESS:
Measures of skewness tell us the direction and extent of asymmetry in a series, and permit
us to compare two or more series with regard to these.
Measures of skewness give an idea about the nature of the variation of the items about the
central value.
Karl Pearson's coefficient of skewness:
$$SK_p = \frac{\bar{X} - \text{Mode}}{\sigma}$$
If the mode is ill-defined, the coefficient can be found with the modified formula:
$$SK_p = \frac{3(\bar{X} - \text{Median})}{\sigma}$$
25 15 23 40 27 25 23 25 20
Solution:
Marks
S. No X 𝑿𝟐
1 25 625
2 15 225
3 23 529
4 40 1600
5 27 729
6 25 625
7 23 529
8 25 625
9 20 400
Total   ΣX = 223   ΣX² = 5887
$$\sigma = \sqrt{\frac{\sum X^2}{N} - \left(\frac{\sum X}{N}\right)^2} = \sqrt{\frac{5887}{9} - \left(\frac{223}{9}\right)^2} = \sqrt{654.11 - 613.94} = \sqrt{40.17} = 6.34$$
$$\bar{X} = \frac{\sum X}{N} = \frac{223}{9} = 24.78$$
Mode Z = 25 (it occurs three times)
$$\text{Karl Pearson's coefficient of skewness} = \frac{\text{Mean} - \text{Mode}}{\text{S.D.}} = \frac{24.78 - 25}{6.34} = \frac{-0.22}{6.34} = -0.035$$
Discrete Series:
Size 3 4 5 6 7 8 9 10
frequency 7 10 14 35 102 136 43 8
Solution:-
X f 𝑿𝟐 fx fx2
3 7 9 21 63
4 10 16 40 160
5 14 25 70 350
6 35 36 210 1260
7 102 49 714 4998
8 136 64 1088 8704
9 43 81 387 3483
10 8 100 80 800
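Total   N = 355        Σfx = 2610    Σfx² = 19818
$$\sigma = \sqrt{\frac{\sum fX^2}{N} - \left(\frac{\sum fX}{N}\right)^2} = \sqrt{\frac{19818}{355} - \left(\frac{2610}{355}\right)^2} = \sqrt{55.82 - 54.05} = \sqrt{1.77} = 1.33$$
$$\text{Mean } \bar{X} = \frac{\sum fX}{N} = \frac{2610}{355} = 7.35$$
Mode = 8 (the size with the highest frequency, 136)
Karl Pearson's coefficient of skewness:
$$SK_p = \frac{\text{Mean} - \text{Mode}}{\text{S.D.}} = \frac{7.35 - 8}{1.33} = -0.49$$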
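Karl Pearson's coefficient has two forms: (Mean − Mode)/σ, and 3(Mean − Median)/σ for when the mode is ill-defined. The Python sketch below supports both. It recomputes the discrete example above and checks the shoe-price example that comes next, where σ is recovered from the coefficient of variation. `pearson_skewness` is a hypothetical helper. The mode is taken as the size with the highest frequency, which assumes a single clear peak.

```python
def pearson_skewness(mean, sd, mode=None, median=None):
    """Karl Pearson's coefficient of skewness.

    Uses (mean - mode) / sd when a mode is supplied; otherwise falls back
    to the form for an ill-defined mode, 3 * (mean - median) / sd.
    """
    if mode is not None:
        return (mean - mode) / sd
    if median is not None:
        return 3 * (mean - median) / sd
    raise ValueError("supply either mode or median")


# Discrete series above
sizes = [3, 4, 5, 6, 7, 8, 9, 10]
freqs = [7, 10, 14, 35, 102, 136, 43, 8]
n = sum(freqs)
mean = sum(f * x for x, f in zip(sizes, freqs)) / n
sd = (sum(f * x * x for x, f in zip(sizes, freqs)) / n - mean ** 2) ** 0.5
mode = sizes[freqs.index(max(freqs))]  # size with the highest frequency
print(round(pearson_skewness(mean, sd, mode=mode), 2))  # -0.49

# Shoe-price example: sigma = C.V. * mean / 100 = 20 * 20 / 100 = 4
print(pearson_skewness(20, 20 * 20 / 100, median=17))  # 2.25
```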
Continuous Series:
Find the standard deviation and coefficient of skewness for the given distribution:
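$$\sigma = \sqrt{\frac{\sum fm^2}{N} - \left(\frac{\sum fm}{N}\right)^2} = \sqrt{\frac{40768.75}{75} - \left(\frac{1642.5}{75}\right)^2} = \sqrt{543.58 - (21.9)^2} = \sqrt{543.58 - 479.61} = \sqrt{63.97} = 7.998 \approx 8$$
$$\text{Mean } \bar{X} = \frac{\sum fm}{N} = \frac{1642.5}{75} = 21.9$$
$$Z = L_1 + \frac{f_1 - f_0}{2f_1 - f_0 - f_2}\times i = 20 + \frac{21-13}{2(21)-13-16}\times 5 = 20 + \frac{40}{13} = 23.08$$
$$SK_p = \frac{\text{Mean} - \text{Mode}}{\text{S.D.}} = \frac{21.9 - 23.08}{8} = -0.148$$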
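The mode of a grouped series is found by interpolating within the modal class, using Z = L1 + (f1 − f0)/(2f1 − f0 − f2) × i. The Python sketch below assumes equal class widths and treats a missing neighbouring class as having frequency 0. `grouped_mode` is an illustrative name. The full frequency table for the example above is not reproduced here, so the second call uses a hypothetical three-class excerpt built only from the values used there (L1 = 20, f1 = 21, f0 = 13, f2 = 16, i = 5).

```python
def grouped_mode(lowers, width, freqs):
    """Mode of a grouped series by interpolation:
    Z = L1 + (f1 - f0) / (2*f1 - f0 - f2) * i,
    where the modal class is the one with the highest frequency.
    Assumes equal class widths; a missing neighbour counts as frequency 0.
    """
    k = freqs.index(max(freqs))
    f1 = freqs[k]
    f0 = freqs[k - 1] if k > 0 else 0
    f2 = freqs[k + 1] if k + 1 < len(freqs) else 0
    return lowers[k] + (f1 - f0) / (2 * f1 - f0 - f2) * width


# Series 0-10, 10-20, ..., 40-50 with f = 4, 15, 28, 16, 7 (about 25.2)
print(grouped_mode([0, 10, 20, 30, 40], 10, [4, 15, 28, 16, 7]))

# Hypothetical excerpt around the modal class 20-25 of the example above (about 23.08)
print(grouped_mode([15, 20, 25], 5, [13, 21, 16]))
```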
EXAMPLE:
From a moderately skewed distribution of retail prices for men's shoes, it is found that the
mean price is Rs. 20 and the median price is Rs. 17. If the coefficient of variation is 20%, find
Pearson's coefficient of skewness of the distribution.
Solution:
$$SK = \frac{3(\text{Mean} - \text{Median})}{\text{Standard Deviation}}$$
Here, mean = 20 and median = 17 are given in the problem. To find the coefficient of skewness,
we need the standard deviation.
$$\text{C.V.} = \frac{\text{Standard Deviation}}{\text{Mean}}\times 100 \;\Rightarrow\; 20 = \frac{\sigma}{20}\times 100 \;\Rightarrow\; 5\sigma = 20 \;\Rightarrow\; \sigma = 4$$
$$SK = \frac{3(20-17)}{4} = \frac{3(3)}{4} = 2.25$$
Absolute measure of skewness (Bowley) = Q3 + Q1 − 2 Median
$$\text{Coefficient of SK} = \frac{Q_3 + Q_1 - 2\,\text{Median}}{Q_3 - Q_1}$$
From the information given below, calculate the quartile (Bowley's) coefficient of skewness:
Measure          Value
Median           201.0
S.D.             215.4
Third quartile   260.0
First quartile   157.0
Solution:
$$SK = \frac{Q_3 + Q_1 - 2\,\text{Median}}{Q_3 - Q_1} = \frac{260 + 157 - 2(201)}{260 - 157} = \frac{417 - 402}{103} = \frac{15}{103} = 0.146$$
Continuous Series:
Payment of commission RS No. of. salesmen
1000-1200 4
1200-1400 10
1400-1600 16
1600-1800 29
1800-2000 52
2000-2200 80
2200-2400 32
2400-2600 23
2600-2800 17
2800-3000 7
Solution:
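N = 270
$$\text{Median } M = \text{size of}\left(\frac{N}{2}\right)^{th}\text{ item} = \frac{270}{2} = 135^{th}\text{ item, which lies in the class 2000-2200}$$
$$\text{Median} = L_1 + \frac{\frac{N}{2} - c.f.}{f}\times i = 2000 + \frac{135 - 111}{80}\times 200 = 2000 + 60 = 2060$$
$$Q_1 = \text{size of}\left(\frac{N}{4}\right)^{th}\text{ item} = \frac{270}{4} = 67.5^{th}\text{ item, which lies in the class 1800-2000}$$
$$Q_1 = L_1 + \frac{\frac{N}{4} - c.f.}{f}\times i = 1800 + \frac{67.5 - 59}{52}\times 200 = 1800 + 32.69 = 1832.69$$
$$Q_3 = \text{size of}\left(\frac{3N}{4}\right)^{th}\text{ item} = \frac{3\times 270}{4} = 202.5^{th}\text{ item, which lies in the class 2200-2400}$$
$$Q_3 = L_1 + \frac{\frac{3N}{4} - c.f.}{f}\times i = 2200 + \frac{202.5 - 191}{32}\times 200 = 2200 + 71.88 = 2271.88$$
$$SK = \frac{Q_3 + Q_1 - 2\,\text{Median}}{Q_3 - Q_1} = \frac{2271.88 + 1832.69 - 2\times 2060}{2271.88 - 1832.69} = \frac{4104.57 - 4120}{439.19} = -0.035$$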
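The median, Q1 and Q3 above all come from the same interpolation, L1 + (position − c.f.)/f × i, where the position is N/2, N/4 or 3N/4. The Python sketch below repeats that step for the commission table and then applies Bowley's formula. It assumes contiguous classes of equal width given by their lower limits. Both function names are illustrative.

```python
def grouped_position_value(lowers, width, freqs, position):
    """Value of the `position`-th item of a grouped series by interpolation:
    L1 + (position - c.f.) / f * i, where c.f. is the cumulative frequency
    of the classes before the one containing the item."""
    cumulative = 0
    for lower, f in zip(lowers, freqs):
        if cumulative + f >= position:
            return lower + (position - cumulative) / f * width
        cumulative += f
    raise ValueError("position exceeds the total frequency")


def bowley_skewness(q1, median, q3):
    """Bowley's (quartile) coefficient: (Q3 + Q1 - 2*Median) / (Q3 - Q1)."""
    return (q3 + q1 - 2 * median) / (q3 - q1)


lowers = list(range(1000, 3000, 200))  # 1000, 1200, ..., 2800
freqs = [4, 10, 16, 29, 52, 80, 32, 23, 17, 7]
n = sum(freqs)  # 270

q1 = grouped_position_value(lowers, 200, freqs, n / 4)
median = grouped_position_value(lowers, 200, freqs, n / 2)
q3 = grouped_position_value(lowers, 200, freqs, 3 * n / 4)
print(round(q1, 2), round(median, 2), round(q3, 2))
print(round(bowley_skewness(q1, median, q3), 3))
```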
UNIT-IV
CORRELATION
INTRODUCTION :
DEFINITION:
TYPES OF CORRELATION:
Positive correlation:
If two variables tend to move together in same direction (i.e) an increase in the value of
one variable is accompanied by an increase in the value of the other variable or a decrease
in the value of one variable is accompanied by a decrease in the value of the other
variable then the correlation is called positive.
Examples: height and weight, rainfall and yield of crops, price and supply.
X 10 20 30 40 50
Y 50 60 70 80 90
Negative correlation:
x 10 20 30 40 50
y 50 40 25 15 10
ii) Simple and multiple correlation :
Simple correlation:
When we study only two variables, the relationship is described as simple correlation.
Examples: quantity of money and price level; demand and price.
Multiple correlations:
partial correlation:
The study of the relationship between two variables while excluding the effect of other variables is called partial correlation.
Example: studying price and demand while eliminating the influence of supply.
Total correlation:
If the ratio of change between two variables is uniform, there will be linear correlation.
X 5 10 15 20
Y 4 8 12 16
If the values plotted on a graph form a straight line, the correlation is called linear
correlation.
In a curvilinear (non-linear) correlation, the amount of change in one variable does not
bear a constant ratio to the amount of change in the other variable.
If the plotted values form a curve, or are scattered around a curve, the correlation is
called curvilinear.
v) No Correlation:
When the points are scattered at random, there is no correlation between the two variables.
Methods of Correlation:
1. Graphic Method:
• Scatter Diagram
• Simple Graph
2. Mathematical Method:
• Karl Pearson’s Coefficient of Correlation
• Spearman’s Rank coefficient of Correlation
• Coefficient of Concurrent Deviation.
• Method of least squares.
i) Scatter diagram :
This is the simplest method of finding out whether there is any relationship between two
variables. The values are plotted on a chart, known as a scatter diagram.
If the plotted points form a straight line running from the lower left-hand corner to the
upper right-hand corner, there is perfect positive correlation, i.e., r = +1.
On the other hand, if the points lie on a straight line running from the upper left-hand
corner to the lower right-hand corner, there is perfect negative (inverse)
correlation, i.e., r = −1.
Merits:
Demerits:
Example 1
Calculate Karl Pearson's correlation coefficient from the following data.
X 40 45 50 53 60 57 51 48 45 47
Y 75 69 64 70 71 75 83 90 92 65
Solution:
X Y X2 Y2 XY
40 75 1600 5625 3000
45 69 2025 4761 3105
50 64 2500 4096 3200
53 70 2809 4900 3710
60 71 3600 5041 4260
57 75 3249 5625 4275
51 83 2601 6889 4233
48 90 2304 8100 4320
45 92 2025 8464 4140
47 65 2209 4225 3055
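ΣX = 496   ΣY = 754   ΣX² = 24922   ΣY² = 57726   ΣXY = 37298
$$r = \frac{N\sum XY - \sum X\sum Y}{\sqrt{N\sum X^2 - (\sum X)^2}\,\sqrt{N\sum Y^2 - (\sum Y)^2}}$$
$$= \frac{10\times 37298 - 496\times 754}{\sqrt{10(24922) - (496)^2}\,\sqrt{10(57726) - (754)^2}} = \frac{372980 - 373984}{\sqrt{249220 - 246016}\,\sqrt{577260 - 568516}}$$
$$= \frac{-1004}{\sqrt{3204}\,\sqrt{8744}} = \frac{-1004}{56.6039\times 93.5094} = \frac{-1004}{5292.9967} = -0.1897$$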
Example 2:
Calculate Coefficient of Correlation from the following data.
X 12 9 8 10 11 13 7
Y 14 8 6 9 11 12 3
Solution:
X Y X2 Y2 XY
12 14 144 196 168
9 8 81 64 72
8 6 64 36 48
10 9 100 81 90
11 11 121 121 121
13 12 169 144 156
7 3 49 9 21
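ΣX = 70   ΣY = 63   ΣX² = 728   ΣY² = 651   ΣXY = 676
$$r = \frac{N\sum XY - \sum X\sum Y}{\sqrt{N\sum X^2 - (\sum X)^2}\,\sqrt{N\sum Y^2 - (\sum Y)^2}} = \frac{7\times 676 - 70\times 63}{\sqrt{7(728) - (70)^2}\,\sqrt{7(651) - (63)^2}}$$
$$= \frac{4732 - 4410}{\sqrt{5096 - 4900}\,\sqrt{4557 - 3969}} = \frac{322}{\sqrt{196\times 588}} = \frac{322}{339.48} = 0.95$$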
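The two examples above apply Karl Pearson's raw-sum formula by hand. The Python sketch below computes the same five totals (ΣX, ΣY, ΣX², ΣY², ΣXY) and applies the formula as written. `pearson_r` is a hypothetical helper name.

```python
import math


def pearson_r(xs, ys):
    """r = (N*ΣXY - ΣX*ΣY) / (sqrt(N*ΣX² - (ΣX)²) * sqrt(N*ΣY² - (ΣY)²))."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(n * sum_x2 - sum_x ** 2) * math.sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator


# Example 1 above
x = [40, 45, 50, 53, 60, 57, 51, 48, 45, 47]
y = [75, 69, 64, 70, 71, 75, 83, 90, 92, 65]
print(round(pearson_r(x, y), 4))  # -0.1897
```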
When the actual ranks are given, the steps followed are:
1. Compute the difference of the two ranks (R1 and R2) and denote it by d.
2. Square each d and find Σd².
3. Substitute the figures in the formula.
When no ranks are given but the actual data are given, we must assign ranks. The highest value can be
ranked 1 (or the lowest can be ranked 1), the next to the highest (lowest) 2, and so on. The
same procedure is followed for both variables.
When two or more items have equal values, it is difficult to rank them. In that case
each item is given the average of the ranks the items would have received if they were not tied. For
example, if two individuals are placed in the seventh place, each is given the rank (7 + 8)/2 =
7.5, and the next rank will be 9. If three are ranked equal at
the seventh place, each is given the rank (7 + 8 + 9)/3 = 8, and the next rank will be 10. A slightly different formula is used
when more than one item has the same value. The formula is
$$\rho = 1 - \frac{6\left\{\sum d^2 + \frac{1}{12}(m^3 - m) + \frac{1}{12}(m^3 - m) + \cdots\right\}}{N^3 - N}$$
where m is the number of items sharing a common rank. One correction term is added for each group of tied items.
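The ranking rules and the tie-corrected formula above can be sketched in Python as follows. The sketch gives rank 1 to the highest value and gives tied values the average of the ranks they would have shared. It then adds (m³ − m)/12 for every tied group in either series. `average_ranks` and `spearman_rho` are illustrative names. When there are no ties, N³ − N equals N(N² − 1), so the same function covers the untied formula.

```python
from collections import Counter


def average_ranks(values):
    """Rank 1 goes to the highest value; tied values share the average of
    the ranks they would otherwise occupy."""
    ordered = sorted(values, reverse=True)
    first_position = {}
    for position, value in enumerate(ordered, start=1):
        first_position.setdefault(value, position)
    counts = Counter(values)
    # a tie group of size m starting at rank p gets (p + (p + m - 1)) / 2
    return [first_position[v] + (counts[v] - 1) / 2 for v in values]


def spearman_rho(xs, ys):
    """Spearman's rank correlation with the (m^3 - m)/12 tie correction."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    correction = sum(
        (m ** 3 - m) / 12
        for series in (xs, ys)
        for m in Counter(series).values()
        if m > 1
    )
    return 1 - 6 * (sum_d2 + correction) / (n ** 3 - n)


economics = [50, 60, 65, 70, 75, 40, 70, 80]
statistics = [80, 71, 60, 75, 90, 82, 70, 50]
print(round(spearman_rho(economics, statistics), 4))  # -0.3571
```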
Merits:
Spearman's rank correlation coefficient is useful in qualitative analysis. For example, it is
sufficient for judges to rank the competitors; they need not assign scores. It is more difficult to assign scores to competitors than
to rank them.
It is the only method that can be used when only ranks are given.
It can also be calculated when the actual values of the variables are given.
It is simple to understand.
It is generally easy to calculate.
Demerits:
Problem - 1
RX RY D D2
1 2 -1 1
2 4 -2 4
3 1 2 4
4 5 -1 1
5 3 2 4
6 9 -3 9
7 7 0 0
8 10 -2 4
9 6 3 9
10 8 2 4
Σd² = 40
$$\rho = 1 - \frac{6\sum d^2}{N(N^2 - 1)} = 1 - \frac{6\times 40}{10(10^2 - 1)} = 1 - \frac{240}{990} = 1 - 0.24 = 0.76$$
Example 2
Calculate the rank Correlation coefficient for the following data:
X 65 68 67 69 66 70 71 75
Y 65 68 67 64 72 70 86 73
Solution:
X Y RX RY d = RX - RY d2
65 65 8 7 1 1
68 68 5 5 0 0
67 67 6 6 0 0
69 64 4 8 -4 16
66 72 7 3 4 16
70 70 3 4 -1 1
71 86 2 1 1 1
75 73 1 2 -1 1
Σd² = 36
$$\rho = 1 - \frac{6\sum d^2}{N(N^2 - 1)} = 1 - \frac{6\times 36}{8(8^2 - 1)} = 1 - \frac{216}{504} = 1 - 0.4286 = 0.5714$$
Marks in Economics (x) 50 60 65 70 75 40 70 80
Marks in Statistics (y) 80 71 60 75 90 82 70 50
Solution :
X Y RX RY d = R X - RY d2
50 80 7 3 4 16
60 71 6 5 1 1
65 60 5 7 -2 4
70 75 3.5 4 -0.5 0.25
75 90 2 1 1 1
40 82 8 2 6 36
70 70 3.5 6 -2.5 6.25
80 50 1 8 -7 49
Σd² = 113.5
N = 8
The value 70 is repeated 2 times in X, so m = 2 and the correction is (1/12)(2³ − 2) = 0.5.
$$\rho = 1 - \frac{6\{113.5 + 0.5\}}{8\times 63} = 1 - \frac{6\times 114}{504} = 1 - \frac{684}{504} = 1 - 1.3571 = -0.3571$$
Marks obtained by 8 students in Maths and Statistics are given below. Compute the rank
correlation.
Marks in Maths (x)        15 20 28 12 40 50 20 80
Marks in Statistics (y)   40 30 55 60 40 30 60 70
Solution:
X Y RX RY d = R X - RY d2
15 40 7 5.5 1.5 2.25
20 30 5.5 7.5 -2 4
28 55 4 4 0 0
12 60 8 2.5 5.5 30.25
40 40 3 5.5 -2.5 6.25
50 30 2 7.5 -5.5 30.25
20 60 5.5 2.5 3 9
80 70 1 1 0 0
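Σd² = 82
The value 20 in X and the values 60, 40 and 30 in Y are each repeated 2 times. So four correction terms of (1/12)(2³ − 2) = 0.5 are added.
$$\rho = 1 - \frac{6\{82 + 0.5 + 0.5 + 0.5 + 0.5\}}{8\times 63} = 1 - \frac{6\times 84}{504} = 1 - \frac{504}{504} = 1 - 1 = 0$$
ρ = 0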
Problem:
Ten competitors in a beauty contest are ranked by three judges in the following order:
I Judge     1 5 4 8 9 6 10 7 3 2
II Judge    4 8 7 6 5 9 10 3 2 1
III Judge   6 7 8 1 5 10 9 2 3 4
Use the rank correlation coefficient to decide which pair of judges has the nearest approach to
common tastes in beauty.
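Solution:
1st & 2nd Judge (Σd² = 74)
$$\rho_{12} = 1 - \frac{6\sum d^2}{N(N^2 - 1)} = 1 - \frac{6\times 74}{10(10^2 - 1)} = 1 - \frac{444}{990} = 1 - 0.448 = 0.552$$
2nd & 3rd Judge (Σd² = 44)
$$\rho_{23} = 1 - \frac{6\sum d^2}{N(N^2 - 1)} = 1 - \frac{6\times 44}{10(10^2 - 1)} = 1 - \frac{264}{990} = 1 - 0.267 = 0.733$$
1st & 3rd Judge (Σd² = 156)
$$\rho_{13} = 1 - \frac{6\sum d^2}{N(N^2 - 1)} = 1 - \frac{6\times 156}{10(10^2 - 1)} = 1 - \frac{936}{990} = 1 - 0.945 = 0.055$$
The second and third judges have the nearest approach to common tastes in beauty, because the
rank correlation coefficient is highest between them.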
In this method, only the direction of change in the variables x and y is taken into account. It
is the simplest method of finding out correlation. It is based on the signs of the deviations: for
each term, the change is measured from the preceding value of the variable, and it
may be plus (+) or minus (−). The formula is:
$$r_c = \pm\sqrt{\pm\frac{2C - N}{N}}$$
Where C = the number of concurrent deviations (the number of positive signs in the product Dx·Dy), and N = the number of pairs of deviations compared (one less than the number of pairs of observations).
Steps:
1. Find the direction of change of the x variable. Take the first value of X as the base and
note whether the second value is increasing, decreasing or constant. If it
increases in relation to the previous one, mark a plus (+) sign against it; if it decreases, put a
minus (−) sign; and if it is equal, put zero. For the third value, the second value
is the base; repeat this until the last item. The heading of the column is
Dx.
2. Find the direction of change of the y variable in the same way. The heading of
the column is Dy.
3. Multiply Dx by Dy and find the value of C, i.e., the number of positive products.
4. Substitute the figures in the formula.
If (2C − N)/N is negative, the minus sign inside the square root makes the quantity positive, so the square root can be taken. The minus sign outside is then used, so r is
negative. (We cannot take the square root of a negative number.)
If (2C − N)/N is positive, all the signs are taken as positive.
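The steps above can be sketched in Python as follows. The sketch records the direction of change (+1, −1 or 0) of each series and counts the positive products as C. N is one less than the number of observations, and the same sign is used inside and outside the square root. The function name and the data in the example call are hypothetical, for illustration only.

```python
import math


def concurrent_deviation_r(xs, ys):
    """Coefficient of concurrent deviations r_c = ±sqrt(±(2C - N) / N).

    Dx, Dy are the signs of change from each preceding value (+1, -1 or 0);
    C counts the positive products Dx*Dy and N is the number of pairs of
    deviations (one less than the number of observations).
    """
    def directions(seq):
        return [(b > a) - (b < a) for a, b in zip(seq, seq[1:])]

    dx, dy = directions(xs), directions(ys)
    n = len(dx)
    c = sum(1 for a, b in zip(dx, dy) if a * b > 0)
    ratio = (2 * c - n) / n
    # the same sign is used inside and outside the root
    return math.copysign(math.sqrt(abs(ratio)), ratio)


# Hypothetical data: C = 3, N = 4, so r_c = sqrt(0.5), about 0.707
print(concurrent_deviation_r([10, 12, 11, 15, 14], [5, 6, 7, 9, 8]))
```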
Merits:
Demerits:
Regression
Introduction:
In regression analysis, the independent variable is also known as the regressor, predictor or
explanatory variable, while the dependent variable is also known as the regressed or
explained variable.
The two regression equations are generally different and cannot be used interchangeably.
The two regression lines intersect at (X̄, Ȳ).
The correlation coefficient is the geometric mean of the two regression coefficients,
that is, the square root of the product of the two regression
coefficients:
$$r = \pm\sqrt{b_{YX}\cdot b_{XY}}$$
The two regression coefficients and the correlation coefficient have the same sign.
The two regression coefficients cannot both be numerically greater than 1.
Regression coefficients are independent of change of origin but are affected by change of
scale.
Each regression coefficient is expressed in the unit of measurement of the dependent
variable.
Each regression coefficient indicates the amount of change in the dependent variable
corresponding to a unit increase in the independent variable.
Uses of regression:
It is more widely used than correlation analysis.
It is used to estimate the relationship between two economic variables, such as income and
expenditure.
It predicts the value of the dependent variable from values of the independent variable.
It allows us to calculate the coefficient of correlation (r) and the coefficient of determination (r²).
It is used to estimate demand curves, supply curves and production functions.
Correlation vs. Regression:
1. Correlation is the relationship between two or more variables and is expressed numerically. Regression means "going back"; the average relationship between the variables is given as an equation.
Regression Line:
Linear regression attempts to model the relationship between two variables by fitting a
linear equation to observed data. A linear regression line has an equation of the form Y = a + bX,
where X is the explanatory variable and Y is the dependent variable.
Regression line Y on X:
This gives the most probable values of Y for given values of X.
Y = a + bX
Normal equations:
ΣY = na + bΣX      (1)
ΣXY = aΣX + bΣX²   (2)
• Regression coefficient of X on Y:
$$b_{XY} = r\,\frac{\sigma_X}{\sigma_Y} = \frac{\sum xy}{\sum y^2}\ \text{(x, y measured from their means)} \quad\text{or}\quad b_{XY} = \frac{N\sum XY - \sum X\sum Y}{N\sum Y^2 - (\sum Y)^2}$$
• Regression coefficient of Y on X:
$$b_{YX} = r\,\frac{\sigma_Y}{\sigma_X} \quad\text{or}\quad b_{YX} = \frac{N\sum XY - \sum X\sum Y}{N\sum X^2 - (\sum X)^2}$$
Regression equation of X on Y:
$$X - \bar{X} = b_{XY}(Y - \bar{Y})$$
Regression equation of Y on X:
$$Y - \bar{Y} = b_{YX}(X - \bar{X})$$
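Both regression lines can be built from the five raw totals. First find b from the N·ΣXY formulas, then get the intercept from the fact that each line passes through (X̄, Ȳ). The Python sketch below does this for the Method 2 data that follows. The function names are illustrative, and r is taken with the common sign of the two coefficients.

```python
import math


def regression_from_sums(n, sum_x, sum_y, sum_x2, sum_y2, sum_xy):
    """Both regression lines from raw totals.

    Returns (a_yx, b_yx, a_xy, b_xy) for Y = a_yx + b_yx*X and
    X = a_xy + b_xy*Y.
    """
    numerator = n * sum_xy - sum_x * sum_y
    b_yx = numerator / (n * sum_x2 - sum_x ** 2)
    b_xy = numerator / (n * sum_y2 - sum_y ** 2)
    x_bar, y_bar = sum_x / n, sum_y / n
    a_yx = y_bar - b_yx * x_bar  # from Y - Ybar = b_yx (X - Xbar)
    a_xy = x_bar - b_xy * y_bar  # from X - Xbar = b_xy (Y - Ybar)
    return a_yx, b_yx, a_xy, b_xy


def r_from_coefficients(b_yx, b_xy):
    """r = ±sqrt(b_yx * b_xy), taking the common sign of the coefficients."""
    return math.copysign(math.sqrt(b_yx * b_xy), b_yx)


# Method 2 problem: N=6, ΣX=78, ΣY=246, ΣX²=1038, ΣY²=10136, ΣXY=3192
a_yx, b_yx, a_xy, b_xy = regression_from_sums(6, 78, 246, 1038, 10136, 3192)
print(f"Y = {a_yx:.2f} + ({b_yx:.2f})X")   # Y = 44.25 + (-0.25)X
print(f"X = {a_xy:.2f} + ({b_xy:.2f})Y")   # X = 17.92 + (-0.12)Y
```

The same function reproduces the summary-statistics case (N = 10, ΣX = 20, …) without any data table.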
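Method 1: Problem 1:
From the following data, obtain the two regression equations by the method of least squares.
X 6 2 10 4 8
Y 9 11 5 8 7
Solution:
X    Y    X²    Y²    XY
6    9    36    81    54
2    11   4     121   22
10   5    100   25    50
4    8    16    64    32
8    7    64    49    56
ΣX = 30   ΣY = 40   ΣX² = 220   ΣY² = 340   ΣXY = 214
Here, n = 5.
Regression equation of X on Y:
X = a + bY     (1)
Normal equations:
ΣX = na + bΣY       (2)
ΣXY = aΣY + bΣY²    (3)
5a + 40b = 30       (4)
40a + 340b = 214    (5)
Multiplying (4) by 8 and subtracting it from (5): 20b = −26, so b = −1.3.
Substituting b = −1.3 in (4):
5a + 40(−1.3) = 30
5a = 30 + 52 = 82
a = 16.4
Substituting a = 16.4 and b = −1.3 in (1):
X = 16.4 − 1.3Y
Regression equation of Y on X:
Y = a + bX     (1)
Normal equations:
ΣY = na + bΣX       (2)
ΣXY = aΣX + bΣX²    (3)
5a + 30b = 40       (4)
30a + 220b = 214    (5)
Multiplying (4) by 6 and subtracting it from (5): 40b = −26, so b = −0.65.
5a + 30(−0.65) = 40
5a = 40 + 19.5 = 59.5
a = 11.9
Y = a + bX
Y = 11.9 − 0.65X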
Method :2
Problem:
Calculate the two regression equation from the following data .
X 10 12 13 12 16 15
Y 40 38 43 45 37 43
Solution:
X    Y    X²    Y²     XY
10   40   100   1600   400
12   38   144   1444   456
13   43   169   1849   559
12   45   144   2025   540
16   37   256   1369   592
15   43   225   1849   645
ΣX = 78   ΣY = 246   ΣX² = 1038   ΣY² = 10136   ΣXY = 3192
$$\bar{X} = \frac{\sum X}{n} = \frac{78}{6} = 13 \qquad \bar{Y} = \frac{\sum Y}{n} = \frac{246}{6} = 41$$
$$b_{XY} = \frac{N\sum XY - \sum X\sum Y}{N\sum Y^2 - (\sum Y)^2} = \frac{6\times 3192 - 78\times 246}{6\times 10136 - (246)^2} = \frac{19152 - 19188}{60816 - 60516} = \frac{-36}{300} = -0.12$$
$$b_{YX} = \frac{N\sum XY - \sum X\sum Y}{N\sum X^2 - (\sum X)^2} = \frac{6\times 3192 - 78\times 246}{6\times 1038 - (78)^2} = \frac{19152 - 19188}{6228 - 6084} = \frac{-36}{144} = -0.25$$
Regression equation of X on Y:
$$X - \bar{X} = b_{XY}(Y - \bar{Y}) \;\Rightarrow\; X - 13 = -0.12(Y - 41) \;\Rightarrow\; X = 17.92 - 0.12Y$$
Regression equation of Y on X:
$$Y - \bar{Y} = b_{YX}(X - \bar{X}) \;\Rightarrow\; Y - 41 = -0.25(X - 13) \;\Rightarrow\; Y = 44.25 - 0.25X$$
When X = 20:
Y = −0.25(20) + 44.25 = −5 + 44.25 = 39.25
Method 2:
Case(ii)
Problem:
From the following information on the values of two variables X and Y, find the two regression
equations and the correlation coefficient: N = 10, ΣX = 20, ΣY = 40, ΣX² = 240, ΣY² = 410, ΣXY = 200.
Solution:
$$\bar{X} = \frac{\sum X}{n} = \frac{20}{10} = 2 \qquad \bar{Y} = \frac{\sum Y}{n} = \frac{40}{10} = 4$$
$$b_{XY} = \frac{10\times 200 - 20\times 40}{10\times 410 - (40)^2} = \frac{2000 - 800}{4100 - 1600} = 0.48$$
$$b_{YX} = \frac{10\times 200 - 20\times 40}{10\times 240 - (20)^2} = \frac{2000 - 800}{2400 - 400} = 0.6$$
Regression equation of X on Y:
$$X - \bar{X} = b_{XY}(Y - \bar{Y}) \;\Rightarrow\; X - 2 = 0.48(Y - 4) \;\Rightarrow\; X = 0.48Y + 0.08$$
Regression equation of Y on X:
$$Y - \bar{Y} = b_{YX}(X - \bar{X}) \;\Rightarrow\; Y - 4 = 0.6(X - 2) \;\Rightarrow\; Y = 0.6X + 2.8$$
Correlation coefficient:
$$r = \sqrt{b_{XY}\, b_{YX}} = \sqrt{0.48\times 0.6} = \sqrt{0.288} = 0.5367$$
Method :2
Case (iii)
Problem :
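From the following regression equations, find the mean values of the X and Y series:
8X − 10Y = −66
Solution:
Both regression lines pass through (X̄, Ȳ), so the means are found by solving the two equations simultaneously.
−32Y = −544
Y = 17
Substituting Y = 17:
8X − (10 × 17) = −66
8X = −66 + 170
X = 104/8
X = 13
Hence the mean value of X = 13 and the mean value of Y = 17.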
UNIT-IV
INDEX NUMBER
DEFINITION:
Index numbers are a special type of average. For example, let the
commodities be rice, kerosene and cloth. Consider the price of rice per kilogram, the
price of kerosene per litre and the price of cloth per metre. The average change in
these prices is indicated by the index number.
$$\text{Purchasing power of one rupee} = \frac{100}{\text{Price index}}$$
The following aspects must be carefully considered when constructing an index
number:
1. The purpose:
The purpose of the index number must be clearly known: for whom it is meant and by
whom it is to be used must be spelt out.
(ii) It should not be too far in the past, so that the index numbers remain
useful.
3. The items:
Including every item in a study is neither feasible nor useful. Only
the items that concern the people for whom the index number is intended should be
included. For example, when considering the living conditions of people in hill stations, woollen clothes
should be included.
5. The Average:
A suitable average must be chosen to arrive at the average value of a group of items. In some contexts the A.M. may be more useful, as it is simple
to understand and easy to calculate.
6. Weighting:
In the unweighted method, an equal weight of unity is given to all the items.
7. The formula:
The period is referred to as the year from here on, and the following notation is used:
p – price of a commodity.
q – quantity of a commodity.
V or W – weight of a commodity.
$$P = \frac{p_1}{p_0}\times 100, \qquad Q = \frac{q_1}{q_0}\times 100$$
P01 – price index number of the current year compared with the base year.
Q01 – quantity index number of the current year compared with the base year.
Formulae:
Methods:
The price relatives P (for a price index number) and the quantity relatives Q
(for a quantity index number) are calculated, and their A.M. or G.M. is found.
Problem 1: From the following data, construct an index for 1995 taking 1994 as base:
Commodities A B C D E
Price
Commodity   1994 (p0)   1995 (p1)   P = p1/p0 × 100   log P
A           50          70          140.00            2.1461
B           40          60          150.00            2.1761
C           80          90          112.50            2.0512
D           110         120         109.09            2.0378
E           20          20          100.00            2.0000
Total       Σp0 = 300   Σp1 = 360   ΣP = 611.59       Σlog P = 10.4112
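By the aggregative method:
$$P_{01} = \frac{\sum p_1}{\sum p_0}\times 100 = \frac{360}{300}\times 100 = 120$$
Using the A.M.:
$$P_{01} = \frac{\sum P}{N} = \frac{611.59}{5} = 122.32$$
Using the G.M.:
$$P_{01} = \text{Antilog}\left(\frac{\sum \log P}{N}\right) = \text{Antilog}\left(\frac{10.4112}{5}\right) = 120.84$$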
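The three unweighted indices above differ only in how the prices are combined: as aggregates, as the A.M. of the relatives, or as the G.M. of the relatives. The Python sketch below computes all three. It takes the G.M. as the antilog (10 to the power) of the mean log10 relative, which mirrors the log-table method. Full-precision logs can differ from four-figure tables in the last decimal place. `simple_price_indices` is an illustrative name.

```python
import math


def simple_price_indices(p0, p1):
    """Unweighted price index numbers (base-year prices p0, current p1).

    Returns (aggregative, A.M. of relatives, G.M. of relatives).
    """
    relatives = [b / a * 100 for a, b in zip(p0, p1)]
    aggregative = sum(p1) / sum(p0) * 100
    am = sum(relatives) / len(relatives)
    # antilog of the mean log10 relative = geometric mean of the relatives
    gm = 10 ** (sum(math.log10(r) for r in relatives) / len(relatives))
    return aggregative, am, gm


p1994 = [50, 40, 80, 110, 20]
p1995 = [70, 60, 90, 120, 20]
agg, am, gm = simple_price_indices(p1994, p1995)
print(round(agg, 2), round(am, 2), round(gm, 2))
```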
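(i) Laspeyres' formula:
$$P_{01}^{L} = \frac{\sum p_1 q_0}{\sum p_0 q_0}\times 100$$
(ii) Paasche's formula:
$$P_{01}^{P} = \frac{\sum p_1 q_1}{\sum p_0 q_1}\times 100$$
(iii) Fisher's formula:
$$P_{01}^{F} = \sqrt{\frac{\sum p_1 q_0}{\sum p_0 q_0}\times\frac{\sum p_1 q_1}{\sum p_0 q_1}}\times 100$$
(iv) Marshall-Edgeworth formula:
$$P_{01}^{ME} = \frac{\sum p_1(q_0 + q_1)}{\sum p_0(q_0 + q_1)}\times 100 = \frac{\sum p_1 q_0 + \sum p_1 q_1}{\sum p_0 q_0 + \sum p_0 q_1}\times 100$$
(v) Bowley's formula:
$$P_{01}^{B} = \frac{1}{2}\left(\frac{\sum p_1 q_0}{\sum p_0 q_0} + \frac{\sum p_1 q_1}{\sum p_0 q_1}\right)\times 100 = \frac{P_{01}^{L} + P_{01}^{P}}{2}$$
(vi) Kelly's formula:
$$P_{01}^{K} = \frac{\sum p_1 q}{\sum p_0 q}\times 100$$
where q denotes fixed quantities used as weights.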
Problem 1: Compute (i) Laspeyres', (ii) Paasche's and (iii) Fisher's index numbers.
        Price                       Quantity
Item    Base year   Current year    Base year   Current year
A 6 10 50 50
B 2 2 100 120
C 4 6 60 60
D 10 12 30 25
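Solution:
(i) Laspeyres' formula:
$$P_{01}^{L} = \frac{\sum p_1 q_0}{\sum p_0 q_0}\times 100 = \frac{1420}{1040}\times 100 = 136.54$$
(ii) Paasche's formula:
$$P_{01}^{P} = \frac{\sum p_1 q_1}{\sum p_0 q_1}\times 100 = \frac{1400}{1030}\times 100 = 135.92$$
(iii) Fisher's formula:
$$P_{01}^{F} = \sqrt{\frac{\sum p_1 q_0}{\sum p_0 q_0}\times\frac{\sum p_1 q_1}{\sum p_0 q_1}}\times 100 = \sqrt{\frac{1420}{1040}\times\frac{1400}{1030}}\times 100 = 136.23$$
or, equivalently,
$$P_{01}^{F} = \sqrt{136.54\times 135.92} = 136.23$$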
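The weighted aggregative formulas listed earlier need only four aggregates: Σp1q0, Σp0q0, Σp1q1 and Σp0q1. The Python sketch below computes these once and derives Laspeyres, Paasche, Fisher, Marshall-Edgeworth and Bowley from them, using the data of this problem. The function name and the dictionary keys are illustrative.

```python
import math


def weighted_aggregative_indices(p0, p1, q0, q1):
    """Laspeyres, Paasche, Fisher, Marshall-Edgeworth and Bowley price indices."""
    def total(prices, quantities):
        return sum(p * q for p, q in zip(prices, quantities))

    p1q0, p0q0 = total(p1, q0), total(p0, q0)
    p1q1, p0q1 = total(p1, q1), total(p0, q1)
    laspeyres = p1q0 / p0q0 * 100
    paasche = p1q1 / p0q1 * 100
    return {
        "laspeyres": laspeyres,
        "paasche": paasche,
        "fisher": math.sqrt(laspeyres * paasche),
        "marshall_edgeworth": (p1q0 + p1q1) / (p0q0 + p0q1) * 100,
        "bowley": (laspeyres + paasche) / 2,
    }


idx = weighted_aggregative_indices(
    p0=[6, 2, 4, 10], p1=[10, 2, 6, 12], q0=[50, 100, 60, 30], q1=[50, 120, 60, 25]
)
for name, value in idx.items():
    print(f"{name}: {value:.2f}")
```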
            Weights   Price
Commodity   W         1995   1998   P = p1/p0 × 100   WP      log P    W log P
A           40        16     20     125               5000    2.0969   83.8760
B           25        40     60     150               3750    2.1761   54.4025
C           5         2      3      150               750     2.1761   10.8805
D           20        5      7      140               2800    2.1461   42.9220
E           10        2      4      200               2000    2.3010   23.0100
Total       ΣW = 100                                  ΣWP = 14300      ΣW log P = 215.0910
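(i) Using the A.M.:
$$P_{01} = \frac{\sum WP}{\sum W} = \frac{14300}{100} = 143$$
(ii) Using the G.M.:
$$P_{01} = \text{Antilog}\left[\frac{\sum W\log P}{\sum W}\right] = \text{Antilog}\left[\frac{215.0910}{100}\right] = 141.55$$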
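Weighted A.M. and G.M. of price relatives follow the same pattern as the unweighted case, with each relative (or its log) multiplied by its weight W and the result divided by ΣW. The Python sketch below uses the data of the weighted example above. The same weighted A.M. is what the family-budget cost of living method computes (ΣWP/ΣW). `weighted_relative_indices` is an illustrative name.

```python
import math


def weighted_relative_indices(p0, p1, weights):
    """Weighted A.M. and G.M. of the price relatives P = p1/p0 * 100."""
    relatives = [b / a * 100 for a, b in zip(p0, p1)]
    total_w = sum(weights)
    am = sum(w * p for w, p in zip(weights, relatives)) / total_w
    gm = 10 ** (sum(w * math.log10(p) for w, p in zip(weights, relatives)) / total_w)
    return am, gm


am, gm = weighted_relative_indices(
    p0=[16, 40, 2, 5, 2], p1=[20, 60, 3, 7, 4], weights=[40, 25, 5, 20, 10]
)
print(round(am, 2), round(gm, 2))
```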
$$P_{01} = \frac{\sum p_1}{\sum p_0}\times 100$$
By Laspeyres' formula:
$$P_{01} = \frac{\sum p_1 q_0}{\sum p_0 q_0}\times 100$$
By Paasche's formula:
$$P_{01} = \frac{\sum p_1 q_1}{\sum p_0 q_1}\times 100$$
By Fisher's formula:
$$P_{01}^{F} = \sqrt{\frac{\sum p_1 q_0}{\sum p_0 q_0}\times\frac{\sum p_1 q_1}{\sum p_0 q_1}}\times 100$$
P01 gives the relative change in price, while Q01 gives the relative change in
quantity. Hence P01 × Q01 should give the relative change in value (price × quantity), and so it should equal
$$\frac{\sum p_1 q_1}{\sum p_0 q_0}$$
➢ Circular Test:
The circular test is an extension of the time reversal test. If three years 0, 1
and 2 are under consideration, it requires the formula to satisfy
$$P_{01}\times P_{12}\times P_{20} = 1$$
(ignoring the factor 100).
For each commodity, the price in a year is divided by the price in 1995 and
multiplied by 100 to get the price relative. Using the A.M., the price indices are calculated
and given in the last column of the above table.
For the first year, which is the base year, the fixed base index number and each
P are 100.
$$\text{Current year F.B.I.} = \frac{\text{Current year C.B.I.}\times\text{Preceding year F.B.I.}}{100}$$
A cost of living index number shows the impact of changes in the prices of a
number of commodities and services on a particular class of people in the current
year, compared with the base year.
Formula:
$$\text{Cost of Living Index Number} = \text{Antilog}\left(\frac{\sum W\log P}{\sum W}\right)$$
Uses:
1. Cost of living index numbers are indicators of changes in real wages. Money
wages change, and so do prices. Cost of living index numbers help to show
whether money wages keep pace with rising prices or fall behind them.
2. Decisions on dearness allowance are based on cost of living indices.
3. They are also used to deflate income and value figures in national accounts.
Problem 1: Show that Fisher's ideal index satisfies both the time reversal and factor reversal tests,
using the following data.
Solution:
             1990         1992
Commodity    p0    q0     p1    q1     p0q0   p1q0   p0q1   p1q1
A            6     50     10    56     300    500    336    560
B            2     100    2     120    200    200    240    240
C            4     60     6     60     240    360    240    360
D            10    30     12    24     300    360    240    288
E            8     40     12    36     320    480    288    432
Total                                  Σp0q0 = 1360   Σp1q0 = 1900   Σp0q1 = 1344   Σp1q1 = 1880
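By Fisher's formula, ignoring the factor 100:
$$P_{01} = \sqrt{\frac{\sum p_1 q_0}{\sum p_0 q_0}\times\frac{\sum p_1 q_1}{\sum p_0 q_1}} = \sqrt{\frac{1900}{1360}\times\frac{1880}{1344}}$$
$$P_{10} = \sqrt{\frac{\sum p_0 q_1}{\sum p_1 q_1}\times\frac{\sum p_0 q_0}{\sum p_1 q_0}} = \sqrt{\frac{1344}{1880}\times\frac{1360}{1900}}$$
and so P01 × P10 = 1 (time reversal test).
$$Q_{01} = \sqrt{\frac{\sum p_0 q_1}{\sum p_0 q_0}\times\frac{\sum p_1 q_1}{\sum p_1 q_0}} = \sqrt{\frac{1344}{1360}\times\frac{1880}{1900}}$$
$$P_{01}\times Q_{01} = \sqrt{\frac{1900}{1360}\times\frac{1880}{1344}\times\frac{1344}{1360}\times\frac{1880}{1900}} = \frac{1880}{1360} = \frac{\sum p_1 q_1}{\sum p_0 q_0}\quad\text{(factor reversal test)}$$
With the given data, Fisher's index is found to satisfy both the time reversal and
factor reversal tests.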
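Both reversal tests can be checked numerically. Swapping the periods gives P10, and swapping the roles of prices and quantities gives Q01. The Python sketch below uses the data of this problem and drops the factor 100, as the solution does. `fisher_ratio` is an illustrative name, and the printed products equal 1 and 1880/1360 apart from floating-point rounding.

```python
import math


def fisher_ratio(p0, p1, q0, q1):
    """Fisher's ideal index as a ratio (the factor 100 is ignored)."""
    def total(a, b):
        return sum(x * y for x, y in zip(a, b))

    return math.sqrt(total(p1, q0) / total(p0, q0) * total(p1, q1) / total(p0, q1))


p0, q0 = [6, 2, 4, 10, 8], [50, 100, 60, 30, 40]
p1, q1 = [10, 2, 6, 12, 12], [56, 120, 60, 24, 36]

p01 = fisher_ratio(p0, p1, q0, q1)
p10 = fisher_ratio(p1, p0, q1, q0)  # time reversed: period 1 is the base
q01 = fisher_ratio(q0, q1, p0, p1)  # factors reversed: quantities play the role of prices
print(p01 * p10)               # time reversal test: should be 1
print(p01 * q01, 1880 / 1360)  # factor reversal test: both equal Σp1q1 / Σp0q0
```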
EXAMPLE:
Solution:
Year   Prices of commodity   Price relatives (P) of commodity   Total    Index No.
       I    II    III        I    II    III                     (ΣP)     (ΣP ÷ N)
Problem 1:
Construct a cost of living index for 2000, taking 1999 as the base year, from the
following data using the 'Aggregate Expenditure' method.
$$\text{Cost of Living Index} = \frac{\sum p_1 q_0}{\sum p_0 q_0}\times 100 = \frac{156.60}{131.50}\times 100 = 119.09$$
Problem: Calculate the cost of living index number from the following data.
Item            p0    p1    Weight W   P = p1/p0 × 100   WP
Food            39    47    4          120.51            482.04
Fuel            8     12    1          150.00            150.00
Clothing        14    18    3          128.57            385.71
House Rent      12    15    2          125.00            250.00
Miscellaneous   25    30    1          120.00            120.00
Total                       ΣW = 11                      ΣWP = 1387.75
$$\text{Cost of Living Index Number} = \frac{\sum WP}{\sum W} = \frac{1387.75}{11} = 126.16$$
Problem 3:
Using the geometric mean, calculate the cost of living index number for the year 2000.
Commodity       p0    p1    W    P = p1/p0 × 100   log P    W log P
Food            40    108   40   180.0             2.2553   90.2120
Clothing        50    94    17   188.0             2.2742   38.6614
Fuel            40    65    13   162.5             2.2109   28.7417
House Rent      125   225   27   180.0             2.2553   60.8931
Miscellaneous   120   240   3    200.0             2.3010   6.9030
                            ΣW = 100                        ΣW log P = 225.4112
$$\text{Cost of Living Index Number} = \text{Antilog}\left(\frac{\sum W\log P}{\sum W}\right) = \text{Antilog}\left(\frac{225.4112}{100}\right) = \text{Antilog}(2.2541) = 179.51$$
Problem 1: Calculate fixed base index numbers from the following prices:
For the first year, which is the base year, the fixed base index number and each P are 100.
Problem 1: Prepare index numbers from the average prices of three groups of commodities
given below, taking 1998 as the base year and the weights as 5, 3 and 2 respectively.
The price of each commodity in every year is divided by its price in 1998 and
multiplied by 100 to get the price relative (P). The price relatives of the three commodities are
multiplied by 5, 3 and 2 respectively to get the WP values. These are added year-wise (ΣWP), and the
total is divided by 10 (ΣW) to get the fixed base index numbers.
Problem 2: From the following prices of three groups of commodities for the years 1993 to 1997,
find the chain base index numbers.
The price of each commodity in every year is divided by its price in the preceding
year and multiplied by 100 to get the link relative (P). As no weights are given, the link
relatives are added year-wise and the total is divided by 3. The average for each year is
multiplied by the chain index number of the preceding year and divided by 100 to get
the chain index number of that year. For the first year (1993), the link relatives and the
chain base index number are both taken as 100.
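The chain base procedure just described can be sketched in Python as follows. Link relatives are averaged with a simple A.M. across commodities (no weights), and each average is chained onto the previous year's index with Current chain index = average link relative × preceding chain index / 100. The function name and the prices in the example call are hypothetical, for illustration only.

```python
def chain_base_indices(prices_by_year):
    """Chain base index numbers from one list of commodity prices per year.

    Link relatives are simple (unweighted) averages across commodities;
    the first year's link relative and chain index are both taken as 100.
    """
    chain = [100.0]
    for previous, current in zip(prices_by_year, prices_by_year[1:]):
        links = [c / p * 100 for p, c in zip(previous, current)]
        average_link = sum(links) / len(links)
        # chain index = average link relative * preceding chain index / 100
        chain.append(average_link * chain[-1] / 100)
    return chain


# Hypothetical prices for two commodities over three years
print(chain_base_indices([[10, 20], [11, 22], [12.1, 22]]))  # about [100, 110, 115.5]
```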
UNIT-V
The values in the series may have been observed at regular intervals of time, such as daily
sales, annual profits or the decennial census.
Variables such as sales, production, profit and population have different values at
different points of time.
(i) The analysis of time series helps us to understand past conditions.
(ii) It helps in assessing present conditions.
(iii) It helps us to make reliable predictions.
(iv) It facilitates comparison.
(v) It gives advance warning.
➢ A large number of forces affect a time series; as a result, the series fluctuates.
➢ There are 4 basic types of variation, and these are called the components (or elements) of a
time series.
1. Secular Trend
2. Seasonal variation
3. Cyclical variation
4. Irregular variations
Components of time series
Regular
Secular trend:
Additive model:
Additive model assumes that all the components of the time series are independent of one
another and describes all the components as absolute values.
➢ It assumes that the components are due to different causes but that they can affect one
another.
➢ It describes only the trend as an absolute value, while the other components are
expressed as rates or percentages.
Y = T × S × C × I
➢ In a traditional (or classical) time series, the most commonly assumed mathematical model
is the multiplicative model.
Mixed model:
➢ Under the additive model, to obtain the trend values we have to separate out S, C and I:
T = Y − (S + C + I)
Additive vs. multiplicative model:
Expression: additive, Y = T + C + S + I; multiplicative, Y = T × C × S × I.
Absolute values/rates: in the additive model, all components of a time series are expressed as absolute values; in the multiplicative model, only the trend is expressed as an absolute value and the others are expressed as rates (or) %.
Y = T+S ×C+I
Y = T+S×C× I
Long Term:
1) Secular Trend
There are 4 methods of estimating the secular trend. They are:
➢ Graphical Method
➢ Method of semi average
➢ Method of Moving average
➢ Method of Least Square
(i) Graphic method:
➢ It is also known as the free-hand method. The x-axis represents time and the y-axis
represents the observed data.
➢ After all the points are marked, the best-fitting line is drawn. It is called the trend line.
➢ The line is drawn so that the following three conditions are satisfied.
I. The number of points above the line is equal to the number of points below the line.
II. The sum of the vertical distances of the points above the line equals that of the points
below the line.
III. The sum of the squares of the vertical distances of all the points from line is the
minimum.
Merits:
Demerits:
➢ Here the series of data is divided into two halves, and the average is found for
each half.
➢ The average values are plotted on graph paper against the mid-points of the halves.
➢ When the number of years is even, the series is divided into two equal halves and the A.M. of the observed
values is found for each half.
➢ When the number of years is odd, the middle-most year and its observed value are
omitted, and the A.M. is found for each remaining half.
➢ The two points are marked on the graph sheet and joined by a
straight line called the trend line.
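The method of semi-averages reduces to two arithmetic means, one for each half of the series, with the middle value dropped when the count is odd. The Python sketch below computes them using the sales figures of Example-1. `semi_averages` is an illustrative name. Plotting and drawing the line are left to the graph sheet.

```python
def semi_averages(values):
    """Arithmetic means of the two halves of a series.

    With an odd number of values the middle value is left out.
    """
    half = len(values) // 2
    first, second = values[:half], values[len(values) - half:]
    return sum(first) / half, sum(second) / half


sales = [60, 75, 81, 110, 106, 120]  # Example-1, 2002-2007
left, right = semi_averages(sales)
print(left, right)  # 72.0 112.0
# The two means are plotted against the mid-years of each half (2003 and 2006);
# the straight line through them is the trend line, rising (112 - 72) / 3 per year.
```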
Example-1:
Year   Sales   Semi-average
2002   60
2003   75      216/3 = 72
2004   81
2005   110
2006   106     336/3 = 112
2007   120
Example-2:
Year   Sales   Semi-average
2001   110
2002   105     330/3 = 110
2003   115
2004   112     Left out
Merits:
2. It does not depend on personal judgement; everyone gets the same trend line.
Demerits:
1. The method of moving averages is one of the most useful methods of estimating the
trend.
2. It is an improvement over the semi-average method.
3. It is an algebraic method; a graph sheet is not used.
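Example:
For a series of values a, b, c, d, e, ..., the 3-yearly moving averages are
$\frac{a+b+c}{3}, \frac{b+c+d}{3}, \frac{c+d+e}{3}$, and so on,
and the 5-yearly moving average of the first five values is
$\frac{a+b+c+d+e}{5}$.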
Example 1: Using three-yearly moving averages, determine the trend and the short-term fluctuations.
Year 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992
production 21 22 23 25 24 22 25 26 27 26
Solution:
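The worked table for this example is not reproduced in these notes. A minimal Python sketch of the three-yearly moving average, following the averaging rule shown above, computes the trend values and the short-term fluctuations Y – Yt. The function name `moving_average` is hypothetical.

```python
def moving_average(values, period):
    """Moving average for an odd period such as 3 or 5 (hypothetical helper).

    Position i gets the mean of the `period` values centred on it; the first and
    last period // 2 positions have no moving average and are left as None.
    """
    if period % 2 == 0:
        raise ValueError("period must be odd; use a centred moving average for even periods")
    k = period // 2
    trend = [None] * len(values)
    for i in range(k, len(values) - k):
        trend[i] = sum(values[i - k:i + k + 1]) / period
    return trend


years = list(range(1983, 1993))
production = [21, 22, 23, 25, 24, 22, 25, 26, 27, 26]
trend = moving_average(production, 3)
for year, y, yt in zip(years, production, trend):
    if yt is None:
        print(year, y, "-", "-")
    else:
        # Trend value Yt and short-term fluctuation Y - Yt
        print(year, y, f"{yt:.2f}", f"{y - yt:.2f}")
```

For the 1983 to 1992 production figures this gives:

- Trend values for 1984 to 1991: 22.00, 23.33, 24.00, 23.67, 23.67, 24.33, 26.00 and 26.33.
- Short-term fluctuations for the same years: 0.00, -0.33, 1.00, 0.33, -1.67, 0.67, 0.00 and 0.67.

No trend value exists for 1983 or 1992.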
Example 2:
Case 2: The period of the moving average is an even number, such as 4, 6 or 8.
Example 1:
Using four-yearly moving averages, calculate the trend values and the short-term fluctuations.
Year 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990
production 464 515 518 467 502 540 557 571 586 612
Solution:
Year   Production   4-yearly        2-period        4-yearly centred     Short-term
       (Y)          moving totals   moving totals   moving average (Yt)  fluctuations (Y – Yt)
1981   464                          -               -                    -
1982   515                          -               -                    -
                    1964
1983   518                          3966            495.75               22.25
                    2002
1984   467                          4029            503.63               -36.63
                    2027
1985   502                          4093            511.63               -9.63
                    2066
1986   540                          4236            529.50               10.50
                    2170
1987   557                          4424            553.00               4.00
                    2254
1988   571                          4580            572.50               -1.50
                    2326
1989   586                          -               -                    -
1990   612                          -               -                    -
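The centring procedure used in this table can be sketched in Python as follows. The function name `centred_moving_average` is hypothetical. The sketch works in three steps:

1. Form the 4-yearly moving totals.
2. Add adjacent totals in pairs.
3. Divide each pair total by 8.

Its 2-period totals for 1984 and 1987 come out as 4029 and 4424. It keeps exact values such as 503.625, which the table shows rounded to two decimals as 503.63.

```python
def centred_moving_average(values, period):
    """Centred moving average for an even period such as 4 or 6 (hypothetical helper).

    1. Form `period`-point moving totals; each falls between two observations.
    2. Add adjacent moving totals in pairs (2-period moving totals); each pair lines up
       with an observation.
    3. Divide each pair total by 2 * period to get the centred moving average.
    """
    if period % 2 != 0:
        raise ValueError("period must be even")
    n = len(values)
    moving_totals = [sum(values[i:i + period]) for i in range(n - period + 1)]
    pair_totals = [p + q for p, q in zip(moving_totals, moving_totals[1:])]
    half = period // 2
    trend = [None] * n
    for j, total in enumerate(pair_totals):
        trend[j + half] = total / (2 * period)  # pair j is centred on observation j + half
    return moving_totals, pair_totals, trend


years = list(range(1981, 1991))
production = [464, 515, 518, 467, 502, 540, 557, 571, 586, 612]
totals, pairs, trend = centred_moving_average(production, 4)
for year, y, yt in zip(years, production, trend):
    # Short-term fluctuation Y - Yt; no trend value exists for the first and last two years
    print(year, y, "-" if yt is None else (yt, y - yt))
```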
Example 2:
Calculate 6 yearly centered moving averages of the earnings per share (EPS) of a
company.
Solution:
Year   EPS (Y)   6-yearly        2-period        6-yearly centred
                 moving totals   moving totals   moving average (Yt)
1985   10                        -               -
1986   12                        -               -
1987   13                        -               -
                 78
1988   15                        162             13.50
                 84
1989   14                        174             14.50
                 90
1990   14                        189             15.75
                 99
1991   16                        207             17.25
                 108
1992   18                        228             19.00
                 120
1993   22                        255             21.25
                 135
1994   24                        279             23.25
                 144
1995   26                        291             24.25
                 147
1996   29                        297             24.75
                 150
1997   25                        303             25.25
                 153
1998   21                        -               -
1999   25                        -               -
2000   27                        -               -
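Adding two adjacent 6-yearly totals counts the middle five values twice and the two end values once. The centred average is therefore a single weighted average with weights 1, 2, 2, 2, 2, 2, 1, divided by 12. The Python sketch below uses that equivalence as a cross-check on the EPS table. The helper name `centred_ma_by_weights` is hypothetical.

```python
def centred_ma_by_weights(values, period):
    """Centred moving average for an even period as one weighted average (hypothetical helper).

    Weights are 1, 2, ..., 2, 1 (period + 1 of them), and they sum to 2 * period.
    The first and last period // 2 positions get None.
    """
    if period % 2 != 0:
        raise ValueError("period must be even")
    weights = [1] + [2] * (period - 1) + [1]
    half = period // 2
    result = [None] * len(values)
    for i in range(half, len(values) - half):
        window = values[i - half:i + half + 1]
        result[i] = sum(w * v for w, v in zip(weights, window)) / (2 * period)
    return result


years = list(range(1985, 2001))
eps = [10, 12, 13, 15, 14, 14, 16, 18, 22, 24, 26, 29, 25, 21, 25, 27]
trend = centred_ma_by_weights(eps, 6)
for year, yt in zip(years, trend):
    print(year, "-" if yt is None else yt)
```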
Merits:
Example 8: Fit a straight line trend equation to the following data by the method of least squares
and estimate the value of sales for the year 1985.
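Solution: Let $Y = a + bx$ be the equation of the trend line, where X is the year, Y is sales and $x = X - 1981$.
The normal equations are
$\sum y = Na + b\sum x$
$\sum xy = a\sum x + b\sum x^2$
With $N = 5$ and $\sum x = 0$, the first equation gives
$700 = 5a + 0 \cdot b$, so $a = 140$.
Since $\sum x = 0$, the second equation reduces to $\sum xy = b\sum x^2$, which gives $b = 20$.
That is, $Y = 140 + 20(X - 1981)$.
Corresponding to different values of X, the right-hand side gives the trend component ($Y_t$).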
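Hence, the equation is written as
$Y_t = 140 + 20(X - 1981)$
Year (X)   x = X – 1981   Trend (Yt)
1979       -2             100
1980       -1             120
1981       0              140
1982       1              160
1983       2              180
1985       4              220 (estimated sales for 1985)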
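The two normal equations can be solved for any choice of origin. The Python sketch below does so by Cramer's rule; the helper name `fit_linear_trend` is hypothetical. The observed sales for Example 8 are not reproduced in these notes, so the usage only checks that the function recovers a = 140 and b = 20 from points that already lie on that line. It then fits the 1983 to 1992 production figures with a non-central origin.

```python
def fit_linear_trend(years, values, origin):
    """Least-squares straight line Y = a + b*x with x = year - origin (hypothetical helper).

    Solves the normal equations
        sum(y)  = N*a      + b*sum(x)
        sum(xy) = a*sum(x) + b*sum(x^2)
    by Cramer's rule, so the origin does not have to be the middle year.
    """
    x = [year - origin for year in years]
    n = len(x)
    sum_x, sum_y = sum(x), sum(values)
    sum_x2 = sum(xi * xi for xi in x)
    sum_xy = sum(xi * yi for xi, yi in zip(x, values))
    det = n * sum_x2 - sum_x * sum_x
    a = (sum_y * sum_x2 - sum_x * sum_xy) / det
    b = (n * sum_xy - sum_x * sum_y) / det
    return a, b


# Sanity check: points lying exactly on Y = 140 + 20(X - 1981) must give back a = 140, b = 20.
a, b = fit_linear_trend([1979, 1980, 1981, 1982, 1983], [100, 120, 140, 160, 180], origin=1981)
print(a, b, a + b * (1985 - 1981))

# A non-central origin works too, e.g. the 1983-1992 production figures with x = year - 1983.
a2, b2 = fit_linear_trend(list(range(1983, 1993)), [21, 22, 23, 25, 24, 22, 25, 26, 27, 26], origin=1983)
print(round(a2, 4), round(b2, 4))
```

Taking the middle year as origin makes $\sum x = 0$, which is why the hand calculation above reduces to $a = \sum y / N$ and $b = \sum xy / \sum x^2$.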
Example 2: Fit a linear trend equation by the method of least squares and estimate
the net profit in 2003.
Plot the observed values on a graph sheet. Draw the trend line also.
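Mid-year = 1998, and deviations are taken as $x = X - 1998$.
Trend of the mid-year, $a = \frac{\sum y}{N} = \frac{401}{7} = 57.29$
Annual change in trend, $b = \frac{\sum xy}{\sum x^2} = \frac{330}{28} = 11.79$
Trend of the first year (1995, $x = -3$) $= a - 3b = 57.29 - 35.37 = 21.92$
By adding 11.79 to the trend of each year, the trend of the next year is found.
Estimated net profit in 2003 ($x = 5$) $= a + 5b = 57.29 + 11.79 \times 5 = 57.29 + 58.95 = 116.24$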
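In hand work, a and b are rounded to two decimals before they are multiplied, so the trend values can drift slightly. The Python arithmetic below uses only the summary figures given above: N = 7, ∑y = 401, ∑xy = 330 and ∑x² = 28. It keeps full precision throughout.

- The first-year trend comes to 21.93.
- The 2003 estimate comes to 116.21.
- Rounding a and b to 57.29 and 11.79 first instead gives 57.29 + 58.95 = 116.24.

```python
# Summary figures from Example 2 (N = 7 years, mid-year 1998, x = year - 1998).
N, sum_y, sum_xy, sum_x2 = 7, 401, 330, 28

a = sum_y / N        # trend value of the mid-year
b = sum_xy / sum_x2  # annual change in trend

first_year_trend = a - 3 * b  # 1995, x = -3
forecast_2003 = a + 5 * b     # 2003, x = 5
print(round(a, 2), round(b, 2), round(first_year_trend, 2), round(forecast_2003, 2))

# The same forecast when a and b are first rounded to two decimals, as in hand work
print(round(round(a, 2) + 5 * round(b, 2), 2))
```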
Merits:
Demerits:
The straight-line trend (or) first-degree parabola is given by
$Y = a + bx$
The equation for the second-degree parabola is
$Y = a + bx + cx^2$
The equation for the third-degree parabola is
$Y = a + bx + cx^2 + dx^3$
Fitting of a first-degree parabola:
The normal equations to find a and b are:
$\sum y = Na + b\sum x$
$\sum xy = a\sum x + b\sum x^2$
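The same least-squares reasoning extends to the second-degree parabola $Y = a + bx + cx^2$. This derivation is not in the original notes and is given here as a sketch of the standard extension. Minimising $\sum (y - a - bx - cx^2)^2$ with respect to a, b and c gives three normal equations:

```latex
\begin{aligned}
\sum y    &= Na + b\sum x + c\sum x^2 \\
\sum xy   &= a\sum x + b\sum x^2 + c\sum x^3 \\
\sum x^2y &= a\sum x^2 + b\sum x^3 + c\sum x^4
\end{aligned}
```

Suppose x is measured from the middle year of an odd number of consecutive years. Then $\sum x = \sum x^3 = 0$, so the second equation gives $b = \sum xy / \sum x^2$ directly. The first and third equations can then be solved together for a and c.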
Types:
1. Linear trend
2. Non-linear (or) Curvilinear trend
➢ If the values of the time series, when plotted on a graph, lie along a straight line, the trend is called a straight-line (or) linear trend.
➢ If the plotted values form a curve, the trend is called a non-linear (or) curvilinear trend.
Uses of trend:
➢ The trend describes the basic growth tendency ignoring short time fluctuations.
➢ It describes the pattern of behavior which has characterized the series in the past.
➢ Future behaviour can be forecast.
➢ Trend analysis enables us to compare two (or) more time series over different periods of
time, which helps us to draw conclusions about them.
Definition:
A variation which occurs with some degree of regularity within a specified period of
one year (or) shorter is called seasonal variation.
(b) Customs, traditions and habits of the people:
➢ Sales of crackers and fireworks are found to be higher during Deepavali every year.
➢ Cloth shops register very good sales during festival seasons such as Deepavali,
Ramzan and Christmas.
➢ All these variations in sales and workload are due to customs.
Cyclical fluctuations:
Irregular variation:
➢ It is also called random variation (or) erratic fluctuation.
➢ Variations which do not come under the other three components are called
irregular variations.
➢ Fires, floods and earthquakes cause irregular variations.
➢ For example, a leading cloth shop may record very poor sales on a particular day
on the eve of Deepavali.
➢ The causes of such an event may not be known.
Models:
➢ There exist certain relations between the components and the observed series.
➢ The relation between the observed values and the components is called a model.
a) Additive model  b) Multiplicative model
(iv) Method of least squares:
➢ By the method of least squares, a straight-line trend can be fitted to the given
time series.
➢ It is a mathematical as well as an analytical method.
➢ It helps in forecasting and prediction.
➢ The trend line is called the line of best fit.
➢ The sum of the deviations of the actual values from the trend values is zero.
➢ The sum of the squares of the deviations of the actual values from the trend values is the least.
(i.e.,) $\sum (Y - Y_c) = 0$ and $\sum (Y - Y_c)^2$ is a minimum.
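The two properties listed above can be checked numerically. The Python sketch below fits a least-squares line to the 1983 to 1992 production figures used earlier in this section. The helper names `least_squares_line` and `sum_sq` are hypothetical. The sketch confirms two things:

- The deviations Y – Yc sum to zero, up to floating-point rounding.
- Shifting the line in any direction raises the sum of squared deviations.

```python
def least_squares_line(x, y):
    """Intercept a and slope b of the least-squares line y = a + b*x (deviation-from-mean form)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def sum_sq(x, y, a, b):
    """Sum of squared deviations of y from the line a + b*x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


x = list(range(10))  # years 1983-1992 coded as 0..9
y = [21, 22, 23, 25, 24, 22, 25, 26, 27, 26]
a, b = least_squares_line(x, y)

deviations = [yi - (a + b * xi) for xi, yi in zip(x, y)]
print(sum(deviations))  # zero apart from floating-point rounding
best = sum_sq(x, y, a, b)
# Nudging the slope or the intercept can only increase the sum of squares.
print(best < sum_sq(x, y, a, b + 0.05), best < sum_sq(x, y, a + 0.5, b))
```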
SEASONAL FLUCTUATIONS
The following four methods are used to estimate the seasonal variations.
(i) Method of simple averages:
This method assumes the absence of trend in a time series. The following are the steps.
1. The data are arranged season-wise in chronological order.
2. For each season, the values are added; this sum is called the seasonal total.
3. Each seasonal total is divided by the number of years to obtain the seasonal average.
4. The total and the average of the seasonal averages are found. This average is called the
grand average.
5. The seasonal index of every season is calculated as follows:
$\text{Seasonal index} = \frac{\text{Seasonal average}}{\text{Grand average}} \times 100$
Problem: 1
Assuming no trend in the series, calculate seasonal indices for the following data.
QUARTER
Year I II III IV
1994 78 66 84 80
1995 76 74 82 78
1996 72 68 80 70
1997 74 70 84 74
1998 76 74 86 82
Solution:
                     Quarter
Year                 I       II      III     IV      Total   Grand average
1994                 78      66      84      80
1995                 76      74      82      78
1996                 72      68      80      70
1997                 74      70      84      74
1998                 76      74      86      82
Seasonal total       376     352     416     384
Seasonal average     75.2    70.4    83.2    76.8    305.6   76.4
Seasonal index       98.4    92.1    108.9   100.5   400.0   -
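The steps of the method of simple averages can be sketched in Python as follows. The helper name `seasonal_indices` is hypothetical. For the 1994 to 1998 quarterly data it gives unrounded indices of about 98.43, 92.15, 108.90 and 100.52. These total 400 because each index is a seasonal average divided by the mean of the four seasonal averages.

```python
def seasonal_indices(table):
    """Seasonal indices by the method of simple averages (assumes no trend).

    `table` maps each year to its list of seasonal values (e.g. four quarters).
    """
    rows = list(table.values())
    n_years = len(rows)
    totals = [sum(season) for season in zip(*rows)]  # seasonal totals
    averages = [t / n_years for t in totals]         # seasonal averages
    grand_average = sum(averages) / len(averages)    # average of the seasonal averages
    indices = [avg / grand_average * 100 for avg in averages]
    return totals, averages, grand_average, indices


data = {
    1994: [78, 66, 84, 80],
    1995: [76, 74, 82, 78],
    1996: [72, 68, 80, 70],
    1997: [74, 70, 84, 74],
    1998: [76, 74, 86, 82],
}
totals, averages, grand, indices = seasonal_indices(data)
for quarter, t, avg, idx in zip(["I", "II", "III", "IV"], totals, averages, indices):
    print(quarter, t, round(avg, 1), round(idx, 2))
print("grand average:", round(grand, 2), "sum of indices:", round(sum(indices), 6))
```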
Merits:
Demerits:
i) It assumes the absence of trend in a time series. This assumption is not
always true.
ii) It assumes that the averaging process eliminates the cyclical and irregular fluctuations.
This is also not always true.