Definition
Horace Secrist defines statistics as “the aggregate of facts affected to a marked extent by a
multiplicity of causes, numerically expressed, enumerated or estimated according to a
reasonable standard of accuracy, collected in a systematic manner for a predetermined
purpose and placed in relation to each other”.
Meaning
In the Plural Sense: statistics refers to information in terms of numbers or numerical data.
In the Singular Sense: statistics refers to “the body of techniques or methodology which has
been developed for the collection, presentation and analysis of quantitative data and for the use
of such data in decision making.”
Characteristics of statistics
1. It consists of aggregates of facts:
In the plural sense, statistics refers to data, but data to be called statistics must consist of an
aggregate of facts.
Limitations of statistics
1. Qualitative Aspect Ignored:
Statistical methods do not study the nature of phenomena which cannot be expressed in
quantitative terms. Such phenomena, for example health, riches and intelligence, cannot be part
of the study of statistics; the qualitative data must first be converted into quantitative data.
2. It does not deal with individual items:
It is clear from the definition given by Prof. Horace Secrist, “By statistics we mean aggregates
of facts…. and placed in relation to each other”, that statistics deals only with aggregates of
facts or items and does not recognize any individual item.
3. Results are true only on average:
The results are interpolated, using tools such as time series analysis, regression or probability,
and are therefore true only on average, not absolutely.
4. Too Many methods to study problems:
In this subject many methods may be used to find a single result. Variation, for example, can be
measured by the quartile deviation, the mean deviation or the standard deviation, and the
results vary in each case.
5. Statistical results are not always beyond doubt:
“Statistics deals only with measurable aspects of things and therefore can seldom give the
complete solution to a problem. They provide a basis for judgement but not the whole
judgement.”
Average
An average is a single value that represents a group of values. It lies somewhere between the
largest and the smallest item.
Characteristics of a good average:
1. Good Average should be based on all the observations: Only those averages which
use all the data give the best results; averages that use less data are not
representative of the whole group.
2. Good Average should not be unduly affected by extreme values: No term should
affect the average too much. If one or two very small or very large items unduly affect
the average, then the average cannot be really typical of the entire group. Thus extreme
terms may distort the average and reduce its usefulness.
3. Good Average should be rigidly defined: There should be no confusion about the
meaning or description of an average. It must have a rigid or to the point definition.
4. Good Average should be easy to calculate and simple to understand: If the
calculation of an average involves too many mathematical processes, it will not be
easily understood and its use will be limited to a small number of persons; such an
average cannot become popular. It should be easy to understand.
5. Good Average should be capable of further algebraic treatment: Measures of
central tendency are used in many other techniques of statistical analysis like measures
of Dispersion, Correlation etc.
6. Good Average should be found by graphic methods also: That average is considered a
good average which can be found by arithmetic as well as by graphic methods.
7. Good Average should not be affected by variations of sampling: A good average
will be least affected by sampling fluctuations. If a few samples are taken from the
same universe, the average should be such as has the least variation in values derived
in the individual samples. The results obtained will be considered to be the true
representative of the universe in this case.
8. Good Average should be popular: An average which is known to common
people will be more useful, as even a simple person will be able to understand it;
otherwise its use will be limited to a small section of people only.
9. Good average should have a Clear and Stable Definition: A good average should
have a clear and stable definition.
10. Good average should be Absolute Number: A good average should be absolute in
character.
11. Good average should make it possible to find the central tendency for open-end class
intervals: In many distributions the ends are open, so a good average is one which can be
calculated even for open-end class intervals.
Arithmetic Mean
The mean is the average of the numbers. It is easy to calculate: add up all the numbers, then
divide by how many numbers there are. In other words, it is the sum divided by the count.
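The calculation can be sketched in a few lines of Python (the figures are illustrative):

```python
# Arithmetic mean: the sum of the values divided by their count.
values = [2, 4, 6, 8, 10]  # illustrative data

mean = sum(values) / len(values)  # 30 / 5
print(mean)  # 6.0
```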
Median
The median of a series is the value of that item which occupies the central position when the
items are arranged in ascending or descending order of magnitude.
Merits of Median:
1. It is simple to understand and easy to calculate, particularly in individual and discrete
series.
2. It is not affected by the extreme items in the series.
3. It can be determined graphically.
4. For open-ended classes, median can be calculated.
5. It can be located by inspection, after arranging the data in order of magnitude.
Demerits of Median:
1. It does not consider all the observations, because it is a positional average.
2. The value of the median is affected more by sampling fluctuations.
3. It is not capable of further algebraic treatment. Like mean, combined median cannot be
calculated.
4. It cannot be computed precisely when it lies between two items.
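A minimal Python sketch of the median, including the even-count case noted in demerit 4, where the median falls between two items and is conventionally taken as their mean (the data values are illustrative):

```python
# Median: the middle value after sorting; with an even number of items
# it is conventionally taken as the mean of the two middle items.
def median(data):
    s = sorted(data)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

print(median([7, 1, 5]))      # 5
print(median([7, 1, 5, 3]))   # (3 + 5) / 2 = 4.0
```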
Mode:
Mode is the value in a data set which is repeated most often. In other words, the mode is the
value which is predominant in the series, i.e. at the position of greatest density. A mode may
or may not exist in a series; if it exists, it may not be unique, or its position may be somewhat
uncertain.
Merits of Mode:
1. Mode is the most representative value of a distribution; it is useful for calculating the modal wage.
2. It is not affected by the extreme items in the series.
3. It can be determined graphically.
4. For open-ended classes, Mode can be calculated.
5. It can be located by inspection.
Demerits of Mode:
1. It is not based on all observations.
2. The mode cannot be calculated when the frequency distribution is ill-defined.
3. It is not capable of further algebraic treatment. Like mean, combined mode cannot be
calculated.
4. It is not a rigidly defined measure, because several different formulae are used to calculate the mode.
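A short Python sketch of finding the mode (illustrative data; as noted above, in practice the mode may not be unique):

```python
# Mode: the most frequently occurring value.
# collections.Counter tallies the frequencies for us.
from collections import Counter

data = [3, 7, 3, 2, 7, 3, 9]  # illustrative data
mode, freq = Counter(data).most_common(1)[0]
print(mode, freq)  # 3 occurs 3 times
```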
Dispersion
Dispersion refers to the variability in the size of items, i.e. the extent to which numerical data
are likely to vary about an average value. A measure of dispersion indicates the extent to which
the values in a distribution differ from the average of the distribution.
Absolute & relative dispersion
Absolute and relative dispersion are two different ways to measure the spread of a data set.
They are used extensively in biological statistics, as biological phenomena almost always show
some variation and spread. Absolute measures always have units, while relative measures do not.
Range
In statistics, the range of a set of data is the difference between the largest and smallest values.
Quartile Deviation
The Quartile Deviation can be defined as half of the difference between the third and first
quartiles.
Mean deviation
Mean deviation is a statistical measure of the average deviation of values from the mean in a
sample.
Standard Deviation
According to Spiegel, Standard deviation “is the square root of the mean of the squares of the
deviations of all the values of a series from their Arithmetic mean”
Coefficient of variation
The coefficient of variation represents the ratio of the standard deviation to the mean.
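The measures above can be sketched together with Python's standard-library statistics module (the sample values are illustrative; the population standard deviation is used here for simplicity):

```python
# Absolute measures (range, quartile deviation, mean deviation, standard
# deviation) and one relative measure (coefficient of variation).
import statistics

data = [6, 7, 15, 36, 39, 40, 41, 42, 43, 47, 49]  # illustrative sample

rng = max(data) - min(data)                    # range = largest - smallest
q1, q2, q3 = statistics.quantiles(data, n=4)   # the three quartiles
qd = (q3 - q1) / 2                             # quartile deviation
mean = statistics.fmean(data)
md = sum(abs(x - mean) for x in data) / len(data)  # mean deviation about the mean
sd = statistics.pstdev(data)                   # population standard deviation
cv = sd / mean                                 # coefficient of variation

print(rng)  # 43
print(round(qd, 2), round(md, 2), round(sd, 2), round(cv, 2))
```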
MODULE II
CORRELATION ANALYSIS
Meaning
Correlation is a statistical measure that expresses the extent to which two variables are linearly
related.
Two variables are said to be correlated if the change in one variable results in a corresponding
change in the other variable. That is when two variables move together, we say they are
correlated. For example, when the price of a commodity rises the supply for that commodity
also rises. On the other hand, if price falls the supply also falls. So both variables move together
or they move in sympathy. Hence price and supply are correlated.
Definition
Correlation is defined as “the tendency of two or more groups or series of items to vary together
directly or inversely”. Boddington states that “whenever some definite connection exists
between two or more groups, classes or series of data, there is said to be correlation”.
1. Positive or Negative Correlation:
Positive correlation means that an increase in one variable (X) results in a corresponding
increase in the other variable (Y). Perfect positive correlation specifies that, for every unit
increase in one variable, there is a proportional unit increase in the other.
Negative correlation
The increase in one variable (X) results in a corresponding decrease in the other variable (Y),
the correlation is said to be negative correlation.
Negative correlation ranges from 0 to −1, the lower limit giving perfect negative
correlation. Perfect negative correlation indicates that for every unit increase in one
variable, there is a proportional unit decrease in the other.
Zero correlation
Zero correlation means no relationship between the two variables X and Y; i.e. the change in
one variable (X) is not associated with the change in the other variable (Y). The zero correlation
is the mid-point of the range – 1 to + 1.
2. Linear or Curvilinear Correlation:
When the amount of change in one variable leads to a constant ratio of change in the other
variable, correlation is said to be linear.
Correlation is said to be non-linear (curvilinear) when the amount of change in one variable is
not in a constant ratio to the change in the other variable.
3. Simple, Partial or Multiple Correlation:
The correlation is said to be simple when only two variables are studied. The correlation is
either multiple or partial when three or more variables are studied. The correlation is said to
be Multiple when three variables are studied simultaneously. Partial correlation measures the
strength of a relationship between two variables, while controlling for the effect of one or more
other variables.
MODULE III
REGRESSION ANALYSIS
Dependent Variable: This is the variable that we are trying to understand or forecast.
Independent Variable: These are the factors that influence the analysis or target variable and
provide us with information regarding the relationship of the variables with the target variable.
• Regression establishes how x causes y to change, and the results will change
if x and y are swapped. With correlation, x and y are variables that can be
interchanged and get the same result.
• Correlation is a single statistic, or data point, whereas regression is the entire equation
with all of the data points that are represented with a line.
• Correlation shows the relationship between the two variables, while regression allows
us to see how one affects the other.
• The data shown with regression establish a cause-and-effect relationship: when one
variable changes, so does the other, though not always in the same direction. With
correlation, the variables simply move together.
• Both work to quantify the direction and strength of the relationship between two
numeric variables.
• Any time the correlation is negative, the regression slope (line within the graph) will
also be negative.
• Any time the correlation is positive, the regression slope (line within the graph) will be
positive.
Linear regression
A regression line is a straight line that describes how a response variable y changes as an
explanatory variable x changes. We often use a regression line to predict the value of y for a
given value of x.
The standard error of the regression (S), also known as the standard error of the estimate,
represents the average distance that the observed values fall from the regression line.
Conveniently, it tells you how wrong the regression model is on average using the units
of the response variable.
MODULE IV
TIME SERIES
A time series is a sequence of data points that occur in successive order over some period
of time. Time series analysis is a statistical technique that deals with time series data, or
trend analysis. Time series data means that data is in a series of particular time periods or
intervals.
Importance (utility) of Time series analysis
The analysis of time series is important not only to businessmen and economists, but also to
scientists, biologists etc., because of the following reasons.
1) Time series analysis discloses changes in the values of the variable over time.
2) It helps in understanding past behaviour: By observing past data, one can
understand the past behaviour of the variable under study.
3) It helps in predicting.
4) It helps in evaluating current programmes
5) It facilitates comparison.
Components of time series.
1. Secular trend:
The word trend means 'tendency'. So, the secular trend is that component of the time
series which gives the general tendency of the data over a long period. It is the smooth, regular
and long-term movement of a series, e.g. the growth of population in a locality over decades.
2. Seasonal variations:
Seasonal variation is variation in a time series within one year that is repeated more or less
regularly. Seasonal variation may be caused by the temperature, rainfall, public holidays,
cycles of seasons or holidays.
3. Cyclical fluctuations.
The cyclical component of a time series refers to (regular or periodic) fluctuations around
the trend, excluding the irregular component, revealing a succession of phases of expansion
and contraction.
4. Irregular variations.
Irregular variations or random variations constitute one of four components of a time
series. They correspond to the movements that appear irregularly and generally during short
periods. Irregular variations do not follow a particular model and are not predictable.
Method of least square
The method of least squares finds the best fit for a set of data points by minimising the sum
of the squared residuals of the points from the fitted curve. It gives the trend line of best fit
to time series data.
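A minimal Python sketch of fitting a least-squares trend line y = a + bt to a short series of yearly sales figures (all values illustrative):

```python
# Least-squares trend line y = a + b*t, with b and a from the normal equations:
#   b = sum((t - t_bar)(y - y_bar)) / sum((t - t_bar)^2),  a = y_bar - b*t_bar
years = [2018, 2019, 2020, 2021, 2022]
sales = [40, 44, 47, 52, 57]   # illustrative yearly sales

t_bar = sum(years) / len(years)
y_bar = sum(sales) / len(sales)
b = sum((t - t_bar) * (y - y_bar) for t, y in zip(years, sales)) / \
    sum((t - t_bar) ** 2 for t in years)
a = y_bar - b * t_bar

print(round(b, 2))             # 4.2 -> sales rise about 4.2 units per year
print(round(a + b * 2023, 2))  # 60.6 -> trend value projected for 2023
```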
MODULE V
INDEX NUMBERS
Meaning
An index number is a statistical device for measuring changes in the magnitude of a group of
related variables. It represents the general trend of diverging ratios, from which it is calculated.
It is a measure of the average change in a group of related variables over two different
situations.
Definition
Index number in statistics is the measurement of change in a variable or variables across a
determined period. It will show general relative change and not a directly measurable figure.
An index number is expressed in percentage form.
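For a single commodity, a simple price relative illustrates the idea, with the base-year value taken as 100 (the prices are illustrative):

```python
# A simple price relative: the current-year price expressed as a
# percentage of the base-year price (base year = 100).
p0 = 40.0   # price in the base year
p1 = 50.0   # price in the current year

index = (p1 / p0) * 100
print(index)  # 125.0 -> the price has risen 25% over the base year
```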
Characteristics of index numbers
1. Index numbers are specialised averages.
2. Index numbers are expressed in percentages.
3. Index numbers measure the change in the level of a phenomenon in one
single figure.
4. Index numbers measure changes not capable of direct measurement.
5. Index numbers are meant for comparison. They study changes in the
level of a phenomenon for a period of time as compared with its level
for another period of time.
6. Index numbers measure the effect of change from one time to another,
from one place to another etc.
Uses of index numbers
Helpful in comparison
Index numbers are given in percentages, so they are useful for comparison and make it easy to
understand the changes between two points of time.
Help in framing suitable policies
Index numbers are more useful to frame economic and business policies. For example,
consumer price index numbers are useful in fixing dearness allowance to the employees.
Useful in deflating
Price index numbers are used for correcting the original data for changes in prices. The price
index is used to determine the purchasing power of the monetary unit.
Compares standard of living
Cost of living index of different periods and of different places will help us to compare the
standard of living of the people. This enables the government to take suitable welfare measures.
Special type of average
All the basic ideas of averages are employed for the construction of index numbers. In averages,
the data are homogeneous (in the same units) but in index number, we average the variables
which have different units of measurements. Hence, it is a special type of average.
Methods of Construction of Index Number:
In constructing an index number, the following steps should be noted:
1. Purpose of the Index Number:
Before constructing an index number, the purpose for which it is needed should be decided.
An index number constructed for one category or purpose cannot be used for others. A cost of
living index of working classes cannot be used for farmers because the items entering into their
consumption will be different.
2. Selection of Commodities:
Commodities to be selected depend upon the purpose or objective of the index number to be
constructed. But the number of commodities should neither be too large nor too small.
Moreover, commodities to be selected must be broadly representative of the group of
commodities. They should also be comparable in the sense that standard or graded items should
be taken.
3. Selection of Prices:
The next step is to select the prices of these commodities. For this purpose, care should be
taken to select prices from representative persons, places or journals or other sources. But they
must be reliable.
4. Selection of an Average:
Since index numbers are averages, the problem is how to select an appropriate average. The
two important averages are the arithmetic mean and geometric mean. The arithmetic mean is
the simpler of the two. But geometric mean is more accurate. However, the average prices
should be reduced to price relatives (percentages) either on the basis of the fixed base method
or the chain base method.
5. Selection of Weights:
While constructing an index number due weightage or importance should be given to the
various commodities. Commodities which are more important in the consumption of
consumers should be given higher weightage than other commodities. The weights are
determined with reference to the relative amounts of income spent on commodities by
consumers. Weights may be given in terms of value or quantity.
6. Selection of Formula:
A number of formulas have been devised to construct an index number. But the selection of an
appropriate formula depends upon the availability of data and purpose of the index number.
No single formula may be used for all types of index numbers.
An unweighted index gives equal allocation to all securities within the index. A weighted
index gives more weight to certain securities, typically based on market capitalization.
One index type isn't necessarily better than another; they simply show the data in different
ways.
Kinds of Index numbers
Index numbers may be classified in terms of what they measure. In economics and business
the classifications are:
(1) price;
(2) quantity;
(3) value; and
(4) special purpose.
Tests of Adequacy of Index Number
1. Unit Test.
2. Time Reversal Test.
3. Factor Reversal Test.
4. Circular Test
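The time reversal test can be illustrated with Fisher's ideal index, the geometric mean of the Laspeyres and Paasche indices, which passes the test because P01 × P10 = 1 (the prices and quantities below are illustrative):

```python
# Time reversal test: an index formula passes if P01 * P10 = 1, i.e.
# swapping the two periods exactly inverts the index number.
import math

p0, q0 = [10, 8, 5], [30, 15, 20]   # base-year prices and quantities
p1, q1 = [12, 9, 6], [25, 18, 22]   # current-year prices and quantities

def fisher(pa, qa, pb, qb):
    # Laspeyres uses period-a quantities as weights, Paasche uses period-b's.
    laspeyres = sum(p * q for p, q in zip(pb, qa)) / sum(p * q for p, q in zip(pa, qa))
    paasche = sum(p * q for p, q in zip(pb, qb)) / sum(p * q for p, q in zip(pa, qb))
    return math.sqrt(laspeyres * paasche)

p01 = fisher(p0, q0, p1, q1)
p10 = fisher(p1, q1, p0, q0)
print(round(p01 * p10, 10))  # 1.0 -> Fisher's index passes the test
```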
Chain index number
A chain index is an index number in which the value of any given period is related to the value
of its immediately preceding period (resulting in an index for the given period expressed
against the preceding period = 100).
Bases shifting
For a variety of reasons, it frequently becomes necessary to change the reference base of
an index number series from one time to another without returning to the original raw data and
recomputing the entire series. This change of reference base period is usually referred to as
“shifting the base”.
Splicing
The process of combining two or more index numbers covering different bases into a single
series is called splicing.
Deflating
The process of adjusting a series of salaries, wages or incomes according to current price
changes, in order to find the level of real salaries, wages or incomes, is called deflating of
index numbers. It is necessary when the price level is increasing and the cost of living is also
increasing.
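A minimal Python sketch of deflating money wages by a consumer price index (all figures illustrative):

```python
# Deflating: real wage = (money wage / price index) * 100.
money_wages = [200, 230, 260]
price_index = [100, 115, 130]   # consumer price index, base year = 100

real_wages = [round(w / i * 100, 1) for w, i in zip(money_wages, price_index)]
print(real_wages)  # [200.0, 200.0, 200.0] -> money wages only kept pace with prices
```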
Problems in the construction of index numbers
There are many difficulties faced in the construction of index numbers. They are
discussed as under: