
BUSINESS STATISTICS

Definition
Horace Secrist defines statistics as “the aggregate of facts affected to a marked extent by multiplicity of causes, numerically expressed, enumerated or estimated according to a reasonable standard of accuracy, collected in a systematic manner for a predetermined purpose and placed in relation to each other.”
Meaning
In the Plural Sense: Statistics refers to information in terms of numbers or numerical data.
In the Singular Sense: “Statistics refers to the body of technique or methodology, which has
been developed for the collection, presentation and analysis of quantitative data and for the use
of such data in decision making.”
Characteristics of statistics
1. It consists of aggregates of facts:
In the plural sense, statistics refers to data, but to be called statistics the data must consist of an aggregate of facts.

2. It is affected by many causes:


It is not easy to study the effect of one factor alone while ignoring the effects of other factors. We have to consider the effects of all the factors on the phenomenon, separately as well as collectively, because the effects of these factors can change with a change of place, time or situation.

3. It should be numerically expressed:


Data, to be called statistics, should be numerically expressed so that counting or measurement becomes possible. This means that facts, to constitute statistics, must be capable of being expressed in some quantitative form.

4. It must be enumerated or estimated accurately:


As stated above, the statements should be precise and meaningful. To obtain a reasonable standard of accuracy, the field of enquiry should not be very large. If it is infinite or very large, even the enumeration of data is impossible and a reasonable standard of accuracy may not be achieved. In that case we have to make an estimate according to a reasonable standard of accuracy, depending upon the nature and purpose of the collection of data.

5. It should be collected in a systematic manner:


Another characteristic of statistics is that the data should be collected in a systematic manner.
Data collected in a haphazard manner will lead to difficulties in the process of analysis and to wrong conclusions.

6. It should be collected for a predetermined purpose:


Before we start collecting data, we must be clear about the purpose for which we are collecting them. If we are not clear about the purpose, we may not collect data according to our needs and may miss relevant data that the purpose requires.
7. It should be capable of being placed in relation to each other:
This is the last, but not the least important, of the characteristics of statistics. Data are generally collected with the motive of comparison. If the figures collected are not comparable, they lose a large part of their significance.

Limitations of statistics
1. Qualitative Aspect Ignored:
The statistical methods don’t study the nature of phenomenon which cannot be expressed in
quantitative terms. Such phenomena cannot be a part of the study of statistics. These include
health, riches, intelligence etc. It needs conversion of qualitative data into quantitative data.
2. It does not deal with individual items:
It is clear from the definition given by Prof. Horace Secrist, “By statistics we mean aggregates
of facts…. and placed in relation to each other”, that statistics deals with only aggregates of
facts or items and it does not recognize any individual item.
3. Results are true only on average:
Statistical results are interpolations or estimates, obtained using methods such as time series, regression or probability, and they are true only on an average; they are not absolutely true.
4. Too Many methods to study problems:
In this subject we use many different methods to arrive at a single result. Variation, for example, can be measured by quartile deviation, mean deviation or standard deviation, and the results vary in each case.
5. Statistical results are not always beyond doubt:
“Statistics deals only with measurable aspects of things and, therefore, can seldom give the complete solution to a problem. They provide a basis for judgement but not the whole judgement.”

Measures of central value


A measure of central tendency is a single value that attempts to describe a set of data by
identifying the central position within that set of data.

Average
An average is a single value that represents a group of values. It lies somewhere between the largest and the smallest items.
Characteristics of a good average:
1. Good Average should be based on all the observations: Only those averages that use all the data give the best result, whereas averages that use fewer data are not representative of the whole group.
2. Good Average should not be unduly affected by extreme value: No term should
affect the average too much. If one or two very small or very large items unduly affect
the average, then the average cannot be really typical of the entire group. Thus extreme
terms may distort the average and reduce its usefulness.
3. Good Average should be rigidly defined: There should be no confusion about the
meaning or description of an average. It must have a rigid or to the point definition.
4. Good Average should be easy to calculate and simple to understand: If the calculation of an average involves complicated mathematical processes, it will not be easily understood and its use will be confined to a limited number of persons; such an average cannot be popular. It should therefore be easy to understand.
5. Good Average should be capable of further algebraic treatment: Measures of
central tendency are used in many other techniques of statistical analysis like measures
of Dispersion, Correlation etc.
6. Good Average should be found by graphic methods also: That average is considered a
good average which can be found by arithmetic as well as by graphic method.
7. Good Average should not be affected by variations of sampling: A good average
will be least affected by sampling fluctuations. If a few samples are taken from the
same universe, the average should be such as has the least variation in values derived
in the individual samples. The results obtained will be considered to be the true
representative of the universe in this case.
8. Good Average should be popular: A popular average, which is known to common people, will be more useful, as an ordinary person will be able to understand it. Otherwise its use will be limited to the higher sections of people only.
9. Good average should have a Clear and Stable Definition: A good average should
have a clear and stable definition.
10. Good average should be Absolute Number: A good average should be absolute in
character.
11. Good average should be capable of being calculated for open-end class intervals: In many distributions the end classes are open, so a good average is one which can be calculated even for open-end class intervals.
Arithmetic Mean

The mean is the average of the numbers. It is easy to calculate: add up all the numbers, then
divide by how many numbers there are. In other words, it is the sum divided by the count.
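As a small illustration of this definition, the following Python sketch (with purely made-up numbers) computes the arithmetic mean directly from the sum and the count.

```python
# Arithmetic mean: the sum of the observations divided by their count.
values = [12, 15, 11, 18, 14]          # illustrative data

mean = sum(values) / len(values)
print(mean)                            # (12 + 15 + 11 + 18 + 14) / 5 = 70 / 5 = 14.0
```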

Merits of Arithmetic Mean


• The arithmetic mean is simple to understand and easy to calculate.
• It is influenced by the value of every item in the series.
• A.M is rigidly defined.
• It has the capability of further algebraic treatment.
• It is a measured value and not based on the position in the series.
Demerits of Arithmetic Mean
• It is affected by extreme items, i.e., very small and very large items.
• It can rarely be identified by inspection.
• In some cases, A.M. does not represent the original items. For example, the average number of patients admitted to a hospital may work out to 10.7 per day.
• The arithmetic mean is not suitable in extremely asymmetrical distributions.
Median

Median of a series is the size of that item of the series which occupies the central position of
the series when the items are arranged in the ascending or descending order of their magnitude.
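A minimal sketch of how the median is located, assuming an individual series of made-up numbers: the items are first arranged in order, then the middle item (or the average of the two middle items) is taken.

```python
def median(data):
    """Middle value of the sorted data, or the average of the two
    middle values when the number of items is even."""
    ordered = sorted(data)             # arrange in ascending order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                     # odd count: a single central item
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # even count: mean of the two central items

print(median([7, 3, 9, 1, 5]))         # sorted: 1 3 5 7 9 -> median 5
print(median([7, 3, 9, 1, 5, 11]))     # sorted: 1 3 5 7 9 11 -> (5 + 7) / 2 = 6.0
```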

Merits of Median:
1. It is simple to understand and easy to calculate, particularly in individual and discrete series.
2. It is not affected by the extreme items in the series.
3. It can be determined graphically.
4. For open-ended classes, median can be calculated.
5. It can be located by inspection, after arranging the data in order of magnitude.

Demerits of Median:
1. It does not take all the observations into account because it is a positional average.
2. The value of the median is affected more by sampling fluctuations.
3. It is not capable of further algebraic treatment. Like mean, combined median cannot be
calculated.
4. It cannot be computed precisely when it lies between two items.

Mode:
Mode is that value in a dataset which is repeated most often. In other words, mode is the value which is predominant in the series or is at the position of greatest density. Mode may or may not exist in a series; if it exists, it may not be unique, or its position may be somewhat uncertain.
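For an individual series, the mode can be located simply by counting how often each value occurs; the hedged sketch below (with invented observations) uses Python's collections.Counter for the counting.

```python
from collections import Counter

data = [4, 7, 4, 9, 7, 4, 2]           # illustrative observations
counts = Counter(data)                 # frequency of each distinct value
value, frequency = counts.most_common(1)[0]
print(value, frequency)                # 4 occurs 3 times, so the mode is 4

# Note: if two values tie for the highest frequency, the series is bimodal
# and a single call like this would report only one of them.
```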

Merits of Mode:
1. Mode is the most representative value of the distribution; it is useful, for example, in calculating the modal wage.
2. It is not affected by the extreme items in the series.
3. It can be determined graphically.
4. For open-ended classes, Mode can be calculated.
5. It can be located by inspection.

Demerits of Mode:
1. It is not based on all observations.
2. Mode cannot be calculated when frequency distribution is ill-defined
3. It is not capable of further algebraic treatment. Like mean, combined mode cannot be
calculated.
4. It is not a rigidly defined measure because several different formulae are used to calculate the mode.

Dispersion

Dispersion refers to the variability in the size of items. It means the extent to which numerical data are likely to vary about an average value. A measure of dispersion indicates the extent to which the values in a distribution differ from the average of the distribution.
Absolute & relative dispersion
Absolute and relative dispersion are two different ways of measuring the spread of a data set. They are used extensively in biological statistics, as biological phenomena almost always show some variation and spread. Absolute measures always have units, while relative measures do not.
Range
In statistics, the range of a set of data is the difference between the largest and smallest values.

Quartile Deviation
The Quartile Deviation can be defined as half of the difference between the third and first
quartiles.
Mean deviation
Mean deviation is a statistical measure of the average absolute deviation of the values from the mean (or another average) in a sample.
Standard Deviation
According to Spiegel, Standard deviation “is the square root of the mean of the squares of the
deviations of all the values of a series from their Arithmetic mean”
Coefficient of variation
The coefficient of variation represents the ratio of the standard deviation to the mean.
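To tie the five measures above together, here is a hedged Python sketch that computes each of them for one small, invented sample. The quartiles come from the statistics.quantiles helper and the standard deviation is the population form (deviations taken from the arithmetic mean), so other conventions would give slightly different figures.

```python
import statistics

data = [2, 4, 6, 8, 10, 12, 14, 16]            # illustrative sample
mean = sum(data) / len(data)                   # arithmetic mean = 9.0

# Range: difference between the largest and smallest values.
data_range = max(data) - min(data)             # 16 - 2 = 14

# Quartile deviation: half the difference between the third and first quartiles.
q1, _, q3 = statistics.quantiles(data, n=4)
quartile_deviation = (q3 - q1) / 2

# Mean deviation: average of the absolute deviations from the mean.
mean_deviation = sum(abs(x - mean) for x in data) / len(data)      # = 4.0

# Standard deviation: square root of the mean of the squared deviations.
std_dev = statistics.pstdev(data)              # population form, about 4.58

# Coefficient of variation: standard deviation as a percentage of the mean.
cv = std_dev / mean * 100                      # about 50.9

print(data_range, quartile_deviation, mean_deviation,
      round(std_dev, 2), round(cv, 1))
```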
MODULE II
CORRELATION ANALYSIS
Meaning
Correlation is a statistical measure that expresses the extent to which two variables are linearly
related.
Two variables are said to be correlated if the change in one variable results in a corresponding
change in the other variable. That is when two variables move together, we say they are
correlated. For example, when the price of a commodity rises the supply for that commodity
also rises. On the other hand, if price falls the supply also falls. So both variables move together
or they move in sympathy. Hence price and supply are correlated.

Definition

Correlation is defined as “the tendency of two or more groups or series of items to vary together
directly or inversely”. Boddington states that “whenever some definite connection exists
between two or more groups, classes or series of data, there is said to be correlation”.
Significance

(i) Correlation helps us in determining the degree of relationship between variables. It enables us to make decisions about the future course of action.
(ii) Correlation analysis helps us in understanding the nature and degree of relationship which
can be used for future planning and forecasting.

Correlation and causation


Correlation is a statistical measure that expresses the extent to which two variables are linearly
related.
Causation indicates that one event is the result of the occurrence of the other event; i.e. there is
a causal relationship between the two events. This is also referred to as cause and effect.
Types of correlation
1. Positive, Negative or Zero Correlation:
Positive correlation
When an increase in one variable (X) is followed by a corresponding increase in the other variable (Y), the correlation is said to be positive. Positive correlation ranges from 0 to +1; the upper limit, +1, is the perfect positive coefficient of correlation.

Perfect positive correlation specifies that, for every unit increase in one variable, there is a proportionate increase in the other.

Negative correlation

When an increase in one variable (X) results in a corresponding decrease in the other variable (Y), the correlation is said to be negative.
Negative correlation ranges from 0 to –1, the lower limit giving perfect negative correlation. Perfect negative correlation indicates that for every unit increase in one variable, there is a proportionate unit decrease in the other.

Zero correlation
Zero correlation means no relationship between the two variables X and Y; i.e. the change in
one variable (X) is not associated with the change in the other variable (Y). The zero correlation
is the mid-point of the range – 1 to + 1.
2. Linear or Curvilinear Correlation:
When the amount of change in one variable leads to a constant ratio of change in the other
variable, correlation is said to be linear.
Correlation is said to be non linear (curvilinear), when the amount of change in one variable is
not in constant ratio to the change in the other variable.

3. Simple, partial and multiple Correlation:

The correlation is said to be simple when only two variables are studied. The correlation is
either multiple or partial when three or more variables are studied. The correlation is said to
be Multiple when three variables are studied simultaneously. Partial correlation measures the
strength of a relationship between two variables, while controlling for the effect of one or more
other variables.

Methods of Studying Correlation


1. Scatter Diagram Method.
A scatter diagram is used to examine the relationship between two variables, one plotted on each axis (X and Y). If the variables are correlated, the points fall along a curve or line, so a scatter diagram or scatter plot gives an idea of the nature of the relationship.
In a scatter diagram, if all the points lie on one line, the correlation is perfect (unity). If the points are widely scattered about the line, the correlation is said to be low. If the points lie near a line or on a line, the correlation is said to be linear.
2. Coefficient of Correlation.
The correlation coefficient is a statistical measure of the strength of the relationship between
the relative movements of two variables. The values range between -1.0 and 1.0. A calculated
number greater than 1.0 or less than -1.0 means that there was an error in
the correlation measurement. Co efficient of correlation can be computed by applying the
methods given below.
1. Karl Pearson’s method
2. Spearman’s rank correlation coefficient
3. Concurrent deviation method: a very simple and casual method of finding correlation when we are not concerned with the magnitude of the two variables. The deviations in an x-value and the corresponding y-value are said to be concurrent if both deviations have the same sign.
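To illustrate the first two methods listed above, the sketch below computes Karl Pearson's coefficient from its definition (covariance divided by the product of the two standard deviations) and Spearman's rank coefficient from the ranks of the same invented, tie-free data.

```python
import math

x = [10, 20, 30, 40, 50]               # illustrative paired observations
y = [12, 19, 33, 38, 52]

def pearson(x, y):
    """Karl Pearson's coefficient: covariance / (std. dev. of x * std. dev. of y)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / n)
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rank coefficient: 1 - 6*sum(d^2) / (n*(n^2 - 1)).
    This simple form assumes there are no tied ranks, as is the case here."""
    rank = lambda v: [sorted(v).index(item) + 1 for item in v]
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank(x), rank(y)))
    n = len(x)
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

print(round(pearson(x, y), 3))         # close to +1: strong positive correlation
print(spearman(x, y))                  # the ranks agree exactly here, so 1.0
```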
MODULE III
REGRESSION ANALYSIS
In statistical modelling, regression analysis is a set of statistical processes for estimating the
relationships between a dependent variable (often called the 'outcome variable') and one or
more independent variables (often called 'predictors', 'covariates', or 'features'). Regression analysis is a statistical method that helps us to analyse and understand the relationship between two or more variables of interest. Regression analysis is used for prediction and forecasting.

Dependent Variable: This is the variable that we are trying to understand or forecast.

Independent Variable: These are factors that influence the analysis or target variable and
provide us with information regarding the relationship of the variables with the target variable.

Differences between correlation and regression

• Regression establishes how x causes y to change, and the results will change
if x and y are swapped. With correlation, x and y are variables that can be
interchanged and get the same result.
• Correlation is a single statistic, or data point, whereas regression is the entire equation
with all of the data points that are represented with a line.
• Correlation shows the relationship between the two variables, while regression allows
us to see how one affects the other.
• The data shown with regression establish a cause and effect: when one changes, so does the other, though not always in the same direction. With correlation, the variables simply move together.

Similarities between correlation and regression

• Both work to quantify the direction and strength of the relationship between two
numeric variables.
• Any time the correlation is negative, the regression slope (line within the graph) will
also be negative.
• Any time the correlation is positive, the regression slope (line within the graph) will be
positive.

Linear regression

In statistics, linear regression is a linear approach to modelling the relationship between a scalar response and one or more explanatory variables (also known as the dependent and independent variables).
Regression line

A regression line is a straight line that describes how a response variable y changes as an explanatory variable x changes. We often use a regression line to predict the value of y for a given value of x.

Standard error of the estimate

The standard error of the regression (S), also known as the standard error of the estimate,
represents the average distance that the observed values fall from the regression line.
Conveniently, it tells you how wrong the regression model is on average using the units
of the response variable.
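A minimal sketch, on invented data, of fitting the least-squares regression line y = a + b·x and computing the standard error of the estimate described above. The residual sum of squares is divided by n − 2 here, a common convention for simple regression; some texts divide by n instead.

```python
import math

x = [1, 2, 3, 4, 5]                    # illustrative explanatory variable
y = [2.1, 4.3, 5.9, 8.2, 9.8]          # illustrative response variable

n = len(x)
mx, my = sum(x) / n, sum(y) / n

# Slope b = sum((x - mx)(y - my)) / sum((x - mx)^2); intercept a = my - b*mx.
b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
     / sum((xi - mx) ** 2 for xi in x))
a = my - b * mx

# Standard error of the estimate: typical distance of observed y from the fitted line.
residual_ss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
std_error = math.sqrt(residual_ss / (n - 2))

print(round(a, 2), round(b, 2), round(std_error, 3))
print("predicted y at x = 6:", round(a + b * 6, 2))
```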
MODULE IV
TIME SERIES
A time series is a sequence of data points that occur in successive order over some period
of time. Time series analysis is a statistical technique that deals with time series data, or
trend analysis. Time series data means that data is in a series of particular time periods or
intervals.
Importance (utility) of Time series analysis
The analysis of time series is important not only to businessmen and economists, but also to scientists, biologists, etc., for the following reasons.
1) Time series analysis discloses changes in the values of a variable over time.
2) It helps in understanding past behaviour: By observing past data, one can
understand the past behaviour of the variable under study.
3) It helps in predicting.
4) It helps in evaluating current programmes
5) It facilitates comparison.
Components of time series.
1. Secular trend.:
The word trend means 'tendency'. So, secular trend is that component of the time series which gives the general tendency of the data over a long period. It is the smooth, regular and long-term movement of a series. E.g., the growth of population in a locality over decades.
2. Seasonal variations:
Seasonal variation is variation in a time series within one year that is repeated more or less
regularly. Seasonal variation may be caused by the temperature, rainfall, public holidays,
cycles of seasons or holidays.
3. Cyclical fluctuations.
The cyclical component of a time series refers to (regular or periodic) fluctuations around
the trend, excluding the irregular component, revealing a succession of phases of expansion
and contraction.
4. Irregular variations.
Irregular variations or random variations constitute one of four components of a time
series. They correspond to the movements that appear irregularly and generally during short
periods. Irregular variations do not follow a particular model and are not predictable.
Method of least square
Least squares is a method for finding the best fit to a set of data points. It minimizes the sum of the squared residuals of the points from the fitted curve. It gives the trend line of best fit for a time series.
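As a hedged illustration of the method, the sketch below fits the trend line Y = a + bX to a short, invented annual series, coding the time variable so that its deviations sum to zero; with that coding, a is simply the mean of the series and b is the slope.

```python
years = [2016, 2017, 2018, 2019, 2020]         # illustrative period
sales = [40, 44, 49, 55, 62]                   # illustrative annual values

n = len(years)
# Code time so that the deviations sum to zero (middle year = 0).
x = [t - years[n // 2] for t in years]         # [-2, -1, 0, 1, 2]

# With sum(x) = 0 the least-squares normal equations reduce to:
a = sum(sales) / n                                                        # a = mean of Y = 50.0
b = sum(xi * yi for xi, yi in zip(x, sales)) / sum(xi ** 2 for xi in x)   # b = 5.5

trend = [round(a + b * xi, 1) for xi in x]     # fitted trend values
print("trend:", trend)
print("forecast for 2021:", round(a + b * 3, 1))   # extend the trend one year ahead
```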
MODULE V
INDEX NUMBERS
Meaning
An index number is a statistical device for measuring changes in the magnitude of a group of
related variables. It represents the general trend of diverging ratios, from which it is calculated.
It is a measure of the average change in a group of related variables over two different
situations.
Definition
Index number in statistics is the measurement of change in a variable or variables across a
determined period. It will show general relative change and not a directly measurable figure.
An index number is expressed in percentage form.
Characteristics of index numbers
1. Index numbers are specialised averages.
2. Index numbers are expressed in percentages.
3. Index numbers measure the change in the level of a phenomenon in one
single figure.
4. Index numbers measure changes not capable of direct measurement.
5. Index numbers are meant for comparison. They study changes in the
level of a phenomenon for a period of time as compared with its level
for another period of time.
6. Index numbers measure the effect of change from one time to another,
from one place to another etc.

Uses of Index Numbers


The various uses of index numbers are:
Economic Parameters
Index numbers are one of the most useful devices for knowing the pulse of the economy. They are used as indicators of inflationary or deflationary tendencies.
Measures Trends
Index numbers are widely used for measuring relative changes over successive periods of time.
This enables us to determine the general tendency. For example, changes in the levels of prices, population, production, etc. over a period of time are analysed.

Useful for comparison

Index numbers are given in percentages, so they are useful for comparison, and the changes between two points of time are easy to understand.
Help in framing suitable policies
Index numbers are very useful in framing economic and business policies. For example, consumer price index numbers are useful in fixing the dearness allowance of employees.
Useful in deflating
Price index numbers are used for correcting the original data for changes in prices. The price index is used to determine the purchasing power of the monetary unit.
Compares standard of living
Cost of living index of different periods and of different places will help us to compare the
standard of living of the people. This enables the government to take suitable welfare measures.
Special type of average
All the basic ideas of averages are employed in the construction of index numbers. In averages, the data are homogeneous (in the same units), but in index numbers we average variables which have different units of measurement. Hence, an index number is a special type of average.
Methods of Construction of Index Number:
In constructing an index number, the following steps should be noted:
1. Purpose of the Index Number:
Before constructing an index number, the purpose for which it is needed should be decided.
An index number constructed for one category or purpose cannot be used for others. A cost of
living index of working classes cannot be used for farmers because the items entering into their
consumption will be different.

2. Selection of Commodities:
Commodities to be selected depend upon the purpose or objective of the index number to be
constructed. But the number of commodities should neither be too large nor too small.
Moreover, commodities to be selected must be broadly representative of the group of
commodities. They should also be comparable in the sense that standard or graded items should
be taken.

3. Selection of Prices:
The next step is to select the prices of these commodities. For this purpose, care should be
taken to select prices from representative persons, places or journals or other sources. But they
must be reliable.

4. Selection of an Average:
Since index numbers are averages, the problem is how to select an appropriate average. The
two important averages are the arithmetic mean and geometric mean. The arithmetic mean is
the simpler of the two. But geometric mean is more accurate. However, the average prices
should be reduced to price relatives (percentages) either on the basis of the fixed base method
or the chain base method.

5. Selection of Weights:
While constructing an index number due weightage or importance should be given to the
various commodities. Commodities which are more important in the consumption of
consumers should be given higher weightage than other commodities. The weights are
determined with reference to the relative amounts of income spent on commodities by
consumers. Weights may be given in terms of value or quantity.

6. Selection of the Base Period:


The selection of the base period is the most important step in the construction of an index
number. It is a period against which comparisons are made. The base period should be normal
and free from any unusual events such as war, famine, earthquake, drought, boom, etc. It should
not be either very recent or remote.

7. Selection of Formula:
A number of formulas have been devised to construct an index number. But the selection of an
appropriate formula depends upon the availability of data and purpose of the index number.
No single formula may be used for all types of index numbers.

Weighted and unweighted index numbers

An unweighted index gives equal allocation to all securities within the index. A weighted
index gives more weight to certain securities, typically based on market capitalization.
One index type isn't necessarily better than another; they simply show the data in different ways.
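The contrast can be made concrete with a small price-index sketch: a simple (unweighted) aggregative index treats every item alike, while a weighted aggregative index of the Laspeyres form weights each price by a base-period quantity. All prices and quantities below are invented.

```python
# Base-period prices (p0), current-period prices (p1) and base-period
# quantities (q0) for three commodities; the figures are illustrative.
p0 = [10, 20, 5]
p1 = [12, 22, 7]
q0 = [50, 10, 100]

# Simple (unweighted) aggregative index: every item counts equally.
unweighted = sum(p1) / sum(p0) * 100                 # (41 / 35) * 100 ≈ 117.1

# Weighted aggregative index (Laspeyres form): prices weighted by base quantities.
weighted = (sum(p * q for p, q in zip(p1, q0))
            / sum(p * q for p, q in zip(p0, q0)) * 100)   # 1520 / 1200 * 100 ≈ 126.7

print(round(unweighted, 1), round(weighted, 1))
```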
Kinds of Index numbers
Index numbers may be classified in terms of what they measure. In economics and business the classifications are:
(1) price;
(2) quantity;
(3) value; and
(4) special purpose.
Tests of Adequacy of Index Number
1. Unit Test.
2. Time Reversal Test.
3. Factor Reversal Test.
4. Circular Test
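As one hedged illustration of the time reversal test, Fisher's ideal index (the geometric mean of the Laspeyres and Paasche indices) satisfies it: the index from period 0 to 1 multiplied by the index from period 1 to 0 equals one when both are expressed as ratios. The data below are invented.

```python
import math

# Illustrative prices and quantities in period 0 and period 1.
p0, q0 = [10, 20, 5], [50, 10, 100]
p1, q1 = [12, 22, 7], [45, 12, 90]

def fisher(pa, qa, pb, qb):
    """Fisher's ideal index from period a to period b, as a ratio (not a %):
    the geometric mean of the Laspeyres and Paasche indices."""
    laspeyres = sum(p * q for p, q in zip(pb, qa)) / sum(p * q for p, q in zip(pa, qa))
    paasche = sum(p * q for p, q in zip(pb, qb)) / sum(p * q for p, q in zip(pa, qb))
    return math.sqrt(laspeyres * paasche)

p01 = fisher(p0, q0, p1, q1)           # index going forward in time
p10 = fisher(p1, q1, p0, q0)           # index going backward in time
print(round(p01 * p10, 10))            # time reversal test: the product is 1.0
```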
Chain index number
A chain index is an index number in which the value of any given period is related to the value
of its immediately preceding period (resulting in an index for the given period expressed
against the preceding period = 100).
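A brief sketch, on invented prices, of how a chain index is built: each period's price is first expressed as a percentage of the preceding period (the link relative), and the chain index for a period is its link relative multiplied by the previous chain index and divided by 100.

```python
prices = [100, 110, 121, 133.1]        # illustrative prices for successive periods

# Link relatives: each price as a percentage of the preceding period's price.
links = [100.0] + [prices[i] / prices[i - 1] * 100 for i in range(1, len(prices))]

# Chain indices: previous chain index times the current link, divided by 100.
chain = [100.0]
for link in links[1:]:
    chain.append(chain[-1] * link / 100)

print([round(l, 1) for l in links])    # [100.0, 110.0, 110.0, 110.0]
print([round(c, 1) for c in chain])    # [100.0, 110.0, 121.0, 133.1]
```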
Bases shifting
For a variety of reasons, it frequently becomes necessary to change the reference base of
an index number series from one time to another without returning to the original raw data and
recomputing the entire series. This change of reference base period is usually referred to as
“shifting the base”.
Splicing
The process of combining two or more index numbers covering different bases into a single
series is called splicing.
Deflating
The process of adjusting a series of salaries, wages or incomes for current price changes, to find out the level of real salaries, wages or incomes, is called deflating of index numbers. It is necessary when the price level and the cost of living are rising.
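The sketch below illustrates both operations on invented figures: shifting the base divides every index number in the series by the index of the new base period and multiplies by 100, while deflating divides each money wage by the corresponding price index and multiplies by 100 to obtain the real wage.

```python
# Illustrative price index with 2015 as the base year (= 100).
index = {2015: 100, 2016: 110, 2017: 125, 2018: 150}

# Base shifting: express the same series with 2017 as the new base period.
shifted = {year: value / index[2017] * 100 for year, value in index.items()}

# Deflating: real wage = money wage / price index * 100 (illustrative wages).
wages = {2015: 300, 2016: 320, 2017: 350, 2018: 380}
real_wages = {year: wages[year] / index[year] * 100 for year in wages}

print({y: round(v, 1) for y, v in shifted.items()})     # 2017 becomes 100.0
print({y: round(v, 1) for y, v in real_wages.items()})  # wages at constant 2015 prices
```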
Problems in the construction of index numbers
There are many difficulties faced in the construction of index numbers. They are
discussed as under:

1. Difficulties in the Selection of the Base Period


The first difficulty relates to the selection of the base year. The base year should be normal.
But it is difficult to determine a truly normal year. Moreover, what may be the normal year
today may become an abnormal year after some period. Therefore, it is not advisable to have
the same year as the base period for a number of years.
2. Difficulties in the Selection of Commodities:
The selection of representative commodities for the index number is another difficulty. The
choice of representative commodities is not an easy matter. They have to be selected from a
wide range of commodities which the majority of people consume. Again, what were
representative commodities some ten years ago may not be representative today. The
consumption pattern of consumers might change and thereby make the index number useless.
So the choice of representative commodities presents real difficulties.

3. Difficulties in the Collection of Prices:


Another difficulty is that of collecting adequate and accurate prices. It is often not possible to
get them from the same source or place. Further, the problem of choice between wholesale and
retail prices arises. There are many variations in the retail prices. Therefore, index numbers are
based on wholesale prices.

4. Arbitrary Assigning of Weights:


In calculating weighted price index, a number of difficulties arise. The problem is to give
different weights to commodities. The selection of higher weight for one commodity and a
lower weight for another is simply arbitrary. There is no set rule and it entirely depends on the
investigator. Moreover, the same commodity may have different importance for different
consumers. The importance of commodities also changes with the change in the tastes and
incomes of consumers and also with the passage of time. Therefore, weights are to be revised
from time to time and not fixed arbitrarily.
5. Difficulty of Selecting the Method of Averaging:
Another difficulty is to select an appropriate method of calculating averages. There are a
number of methods which can be used for this purpose. But all methods give different results
from one another. It is, therefore, difficult to decide which method to choose.

6. Difficulties Arising from Changes Over Time:


In the present times, changes in the nature of commodities are taking place continuously over time due to technological changes. As a result, new commodities are introduced and people
start consuming them in place of the old ones. Moreover, prices of commodities might also
change with technical changes. They may fall. But new commodities are not entered into the
list of commodities in preparing the index numbers. This makes the index numbers based on
old commodities unreal.

7. Not All-Purpose:


An index number constructed for a particular purpose cannot be used for some other purpose.
For instance, a cost of living index number for industrial workers cannot be used to measure
the cost of living of agricultural workers. Thus there are no all-purpose index numbers.

8. International Comparisons not Possible:


International price comparisons are not possible with index numbers. The commodities
consumed and included in the construction of an index number differ from country to country.
For instance, meat, eggs, cars, and electrical appliances are included in the price index of
advanced countries whereas they are not included in that of backward countries. Similarly,
weights assigned to commodities are also different. Thus, international comparisons of index
numbers are not possible.

9. Comparisons of Different Places not Possible:


Even if different places within a country are taken, it is not possible to apply the same index
number to them. This is because of differences in the consumption habits of people. People
living in the northern region consume different commodities than those consumed by the
people in the south of India. It is, therefore, not right to apply the same index number to both.

10. Not Applicable to an Individual:


An index number is not applicable to an individual belonging to a group for which it is
constructed. If an index number shows a rise in the price level, an individual may not be
affected by it. This is because an index number reflects averages.
