
BUSINESS STATISTICS THEORY-SEMESTER 2

CA PANKAJ GOEL

CA PANKAJ GOEL-9811860116

CLASSES FOR BCOM/CA/CPT-ACCOUNTANCY/MATHEMATICS/STATISTICS/AUDITING


Measures of Central Tendency
Ques 1. Which type of average should be used in the following cases:
i) Size of shoes sold in large numbers in a shop.
ii) Marks obtained in an exam.
iii) Average change in cost of living of workers.
iv) When a distribution has open-ended classes and wide variations. (B.Com (P) 99)
v) When quantities are in ratio.
vi) A stockist of readymade garments. (B.Com (P) 2001)
vii) When the average depreciation rate is to be calculated and depreciation is charged as per W.D.V. (B.Com (P) 99)
viii) When speed is changing but distance is constant and average speed is to be calculated. (B.Com (P) 99)
Ans. i) Mode ii) Mean iii) Geometric Mean iv) Mode or Median
v) Geometric Mean vi) Mode vii) Geometric Mean viii) Harmonic Mean
Ques 4. Fill in the blanks
a) Sum of deviations from the mean is __________________
b) Absolute sum of deviations is minimum from _______________
c) The Geometric Mean of a set of values lies between the Arithmetic Mean and __________________
d) Median is the same as the ________________ quartile.
e) Median & Mode are called ______________ averages.
f) A distribution may have many __________.
Ans. a) Zero b) Median c) Harmonic Mean, i.e. $\bar{X} \ge GM \ge HM$
d) Second quartile (Q2), because 50% of the items lie to its right and 50% to its left.
e) Positional averages f) Modes (as in a bimodal series)
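The relation between the three mathematical averages, and the use of the harmonic mean for average speed over equal distances, can be checked numerically. The sketch below uses Python's standard `statistics` module; the data values and the two speeds (40 and 60 km/h) are made-up figures for illustration only.

```python
from statistics import mean, geometric_mean, harmonic_mean

# Hypothetical positive values, chosen only for illustration
values = [2, 4, 8]

am = mean(values)            # arithmetic mean
gm = geometric_mean(values)  # nth root of the product of n items
hm = harmonic_mean(values)   # reciprocal of the mean of reciprocals

# Sum of deviations from the arithmetic mean (should be zero up to rounding)
sum_of_deviations = sum(x - am for x in values)

# Same distance covered at 40 km/h and at 60 km/h (assumed speeds):
# the average speed is the harmonic mean, not the arithmetic mean
average_speed = harmonic_mean([40, 60])

print(am, gm, hm)
print(sum_of_deviations)
print(average_speed)
```

For any set of positive values that are not all equal, the printed averages should appear in the order AM > GM > HM.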
Comparison Between Different Types of Averages
Mathematical averages: Mean, GM, HM. Positional averages: Median, Mode.
1) Meaning
   Mean: It is the figure obtained by dividing the total of the values of the various items by their number.
   GM: It is the nth root of the product of the n items of a series.
   HM: It is the reciprocal of the arithmetic average of the reciprocals of the values of its various items.
   Median: It is the value of the middle item of a series arranged in ascending or descending order of magnitude.
   Mode: It is the value which has the greatest frequency density in its immediate neighbourhood.
2) Algebraic manipulation
   Mean: It is capable of algebraic manipulation, i.e. we can calculate a combined mean.
   GM: Same as mean.
   HM: Not capable of algebraic manipulation.
   Median: Not capable of algebraic manipulation.
   Mode: No.
3) Weighted average
   Mean: Weighted AM = $\frac{\Sigma WX}{\Sigma W}$
   GM: Weighted GM = $AL\left(\frac{\Sigma W \log X}{\Sigma W}\right)$, where AL denotes antilog.
   HM: Weighted HM = $\frac{\Sigma W}{\Sigma (W/X)}$
   Median: No.
   Mode: No.
4) Properties
   Mean: 1) The sum of deviations from the mean is zero, i.e. $\Sigma (X - \bar{X}) = 0$.
         2) The sum of squares of deviations from the mean is minimum, i.e. $\Sigma (X - \bar{X})^2$ is minimum.
   GM: 1) The product of the items remains unchanged if each item is replaced by the GM.
       2) The product of the corresponding ratios on either side of the GM is always equal.
   HM: ___________
   Median: The absolute sum of deviations from the median is minimum.
   Mode: __________
5) Applicable to open-ended classes
   Mean: Indeterminable.
   GM: Indeterminable.
   HM: Indeterminable.
   Median: Determinable.
   Mode: Determinable.
6) Effect of extreme items
   Mean: Yes (greatest).
   GM: Yes (less than AM).
   HM: Yes (less than GM).
   Median: No.
   Mode: No.
Measures of Dispersion
Ques 1. Fill in the blanks
a) The income of a person in a particular week is Rs 20 per day. The mean deviation of his income for 7 days would be __________ (B.Com (H) 89)
b) Absolute sum of deviations is minimum from _____________ (B.Com (H) 95)
c) In any distribution, standard deviation is always ______________ the mean deviation.
d) All measures of relative dispersion are ____________ from the unit employed.
e) The _________ is the measure of dispersion most affected by extreme items.
f) Under the Normal Curve, $\bar{X} \pm 3\sigma$ covers _________ area.
g) The Quartile deviation includes ___________ of items.
h) In a normal distribution QD is equal to ___________ SD & MD is equal to ___________ SD.
i) The standard deviation is an ____________ measure of dispersion.
Answers
a) Zero. As the income for each day is Rs 20, the mean $\bar{X}$ is also Rs 20, so every deviation from $\bar{X}$ over the 7 days is zero and the mean deviation is also zero.
b) Median, i.e. $\Sigma |X - \text{Median}|$ is minimum.
c) greater than or equal to, i.e. ≥
d) free
e) Range
f) 99.73%
g) Central 50%
h) QD = 2/3 σ & MD = 4/5 σ (more precisely, QD = 0.6745 σ & MD = 0.7979 σ)
i) absolute
Ques 4. State which of the following statements are correct.
i) Mean, standard deviation and variance have the same unit.
ii) For calculating standard deviation, deviations can be taken from the median also.
iii) Coefficient of variation is expressed in the same units as the original data.
iv) Since $\Sigma (X - \bar{X}) = 0$, $\Sigma (X - \bar{X})^2$ is also zero.
v) The standard deviation of a set of 50 items is 6.5. If every item is increased by 5, the standard deviation is 11.5 (CA (P.E. -1) 98 N, 99 N)
vi) If the standard deviation of a set of 50 items is 8, then the standard deviation will be 16 if each item is multiplied by 2 (CA (P.E. -1) 96M)
Answer
i) Incorrect. Mean & standard deviation have the same unit, but variance, being the square of standard deviation, has a different unit.
ii) Incorrect. For calculating SD, deviations can be taken only from the mean.
iii) Incorrect. C.V. is always expressed as a percentage.
iv) Incorrect. $\Sigma (X - \bar{X})^2 = N\sigma^2$, & it is zero only when all X's are equal.
v) Incorrect. Standard deviation remains unaffected by adding or subtracting a constant, so it stays 6.5.
vi) Correct.
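Answers (iv), (v) and (vi) can be verified with a few lines of Python. This is only a sketch on a made-up data set; `pstdev` and `pvariance` from the standard library divide by N, which matches the formula used in these notes.

```python
from statistics import mean, pstdev, pvariance

# Hypothetical items, used only to illustrate the rules
items = [4, 8, 10, 14, 19]

sd = pstdev(items)                           # population SD (divides by N)
sd_shifted = pstdev([x + 5 for x in items])  # every item increased by 5
sd_scaled = pstdev([2 * x for x in items])   # every item multiplied by 2

# Sum of squared deviations from the mean equals N * variance
x_bar = mean(items)
sum_sq_dev = sum((x - x_bar) ** 2 for x in items)
n_sigma_sq = len(items) * pvariance(items)

print(sd, sd_shifted, sd_scaled)
print(sum_sq_dev, n_sigma_sq)
```

Adding a constant leaves the SD unchanged, while multiplying by 2 doubles it.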
Short Answer Questions
Ques 1. What is the difference between Mean deviation (Average Deviation) & Standard deviation? (B.Com (H) 89, 93, 2001, B.Com (P) 2000)
Ans. The difference between mean deviation & standard deviation can be discussed as follows:
a) Meaning
   Mean deviation: It is the arithmetic mean of the absolute deviations of individual values from an average of the given data.
   Standard deviation: It is the square root of the arithmetic mean of the squares of deviations from the arithmetic mean.
b) Basis of calculation
   Mean deviation: It can be calculated from the mean, median or mode, i.e. any one of the three averages can be the basis of its calculation.
   Standard deviation: It can be calculated only from the arithmetic mean, so the mean is the basis of its computation.
c) Algebraic sign
   Mean deviation: It ignores algebraic signs by taking the modulus.
   Standard deviation: It doesn't ignore the algebraic signs, as here deviations are squared.
d) Algebraic manipulation
   Mean deviation: It is not capable of algebraic manipulation, i.e. we cannot calculate combined MD.
   Standard deviation: It is capable of algebraic manipulation, as we can calculate combined SD.
Ques 2. Why is standard deviation considered the best measure of dispersion?
(B.Com (H) 80, 88, 89, 93, B.Com (P) 95, 98)
Ans. Standard deviation is the square root of the arithmetic mean of the squares of deviations from the arithmetic mean, i.e.
$$\sigma = \sqrt{\frac{\Sigma (X - \bar{X})^2}{N}}$$
Standard deviation is the only measure of dispersion which satisfies most of the properties of a good measure of dispersion, & due to this it is considered the best measure of dispersion.
It rectifies the shortcomings of the other measures of dispersion, & to prove this it is compared with the other measures below.
a) Range & quartile deviation are not based on each and every item in the series, because range only takes into account the largest & smallest items and quartile deviation is concerned only with the first & third quartiles, whereas standard deviation is based on each & every item of the series.
b) Mean deviation ignores (+) & (-) signs, which is not sound mathematically. In standard deviation we instead square the deviations from the mean, & in this way it uses the important property of the mean that the sum of squares of deviations from the mean is minimum.
Besides the above, it also satisfies several other properties, which are discussed in Ques 3.
Ques 3. Explain important properties of standard deviation.
(B.Com (H) 2002, B.Com (P) 2000, C.A. (P.E.-1) 97 M)
Ans. Important properties of standard deviation can be discussed as follows:
i) The value of standard deviation is independent of change of origin but not of change of scale, i.e. its value remains the same if each item of the series is increased or decreased by a constant K, but it is multiplied or divided by K if each item is multiplied or divided by a constant K.
ii) It is capable of algebraic manipulation, i.e. we can calculate combined standard deviation & variance.
Variance = $\sigma^2$
Combined standard deviation:
$$\sigma_{12} = \sqrt{\frac{n_1\sigma_1^2 + n_2\sigma_2^2 + n_1 d_1^2 + n_2 d_2^2}{n_1 + n_2}}$$
where $d_1 = \bar{X}_1 - \bar{X}_{12}$ and $d_2 = \bar{X}_2 - \bar{X}_{12}$
iii) For a symmetrical (normal) distribution:
$\bar{X} \pm 1\sigma$ covers 68.27% of items
$\bar{X} \pm 2\sigma$ covers 95.45% of items
$\bar{X} \pm 3\sigma$ covers 99.73% of items
iv) The standard deviation of the first 'n' natural numbers will be
$$\sigma = \sqrt{\frac{1}{12}\left(N^2 - 1\right)}$$
Where N = no. of items.
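Properties (ii) and (iv) can be checked numerically. The sketch below assumes two small made-up groups; the helper name `combined_sd` is hypothetical and simply applies the combined standard deviation formula, which is then compared with the SD of the pooled data.

```python
from math import sqrt
from statistics import mean, pstdev

def combined_sd(n1, mean1, sd1, n2, mean2, sd2):
    """Combined SD of two groups from their sizes, means and SDs."""
    mean12 = (n1 * mean1 + n2 * mean2) / (n1 + n2)  # combined mean
    d1 = mean1 - mean12
    d2 = mean2 - mean12
    return sqrt((n1 * sd1**2 + n2 * sd2**2 + n1 * d1**2 + n2 * d2**2)
                / (n1 + n2))

# Hypothetical groups, for illustration only
group1 = [2, 4, 6, 8]
group2 = [10, 20, 30]

sigma12 = combined_sd(len(group1), mean(group1), pstdev(group1),
                      len(group2), mean(group2), pstdev(group2))
pooled_sd = pstdev(group1 + group2)  # SD computed directly on all items

# SD of the first N natural numbers versus sqrt((N^2 - 1) / 12)
N = 10
sd_natural = pstdev(list(range(1, N + 1)))
sd_formula = sqrt((N**2 - 1) / 12)

print(sigma12, pooled_sd)
print(sd_natural, sd_formula)
```

Both pairs of printed values should agree up to rounding error.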


SKEWNESS
Ques 1. Identify the Correct Statements
a) Both Skewness & Kurtosis are indicative of the shape of the distribution.
b) For any symmetrical distribution
1/3 (Mean – Mode) = Mean – Median
c) In a highly skewed distribution, value of mean varies a great deal from that of
median.
d) Two distributions with the same mean, standard deviation & coefficient of
Skewness, have same peakedness.
e) β2 must always be positive.
f) For symmetrical distribution, coefficient Skewness is zero.
g) Variance is equal to second Central Moment.
h) Arithmetic Mean is equal to first moment about origin.
i) Series representing U-shape curve is symmetrical. (B.Com (P) 2001)
Answer a) Correct
b) Incorrect, in a symmetrical distribution mode = mean = median.
c) Correct
d) Incorrect because peakedness concept is related to Kurtosis which is
measured by value of β2.
e) Correct, because β2 = μ4/μ2² is a ratio of non-negative quantities; it is γ2 = β2 – 3 that becomes negative in case of a platykurtic distribution.
f) Correct
g) Correct
h) Correct
i) Incorrect because in such a series mean ≠ mode ≠ median
Ques 3. Fill in the blanks.
1) If mean & mode of a given distribution are equal, then its coefficient of Skewness is ________________.
2) Skewness is positive when mean is _______________ mode.
3) In a moderately asymmetrical distribution, the distance between the _________ & the __________ is about ______________ of the distance between the ____________ and the __________.
4) In case of symmetrical distribution, β1 = _______________.
5) In case of symmetrical distribution, quartiles are at equal distance from __________.
Answer 1) zero 2) greater than 3) Mean, Median, 1/3, Mean, Mode
4) zero 5) Median, i.e. Q3 – Median = Median – Q1
Short Answer Question
Ques 1. Differentiate between Dispersion & Skewness?
(B.Com. (H) 81, 2000, B.Com. (P) 2000)
Ans. The difference between Dispersion & Skewness can be summed up as follows:
Difference between Dispersion & Skewness
1) Meaning
   Dispersion: It is a measure of the extent of variation in the individual items, e.g.
      Series X    Series Y
      6           4
      2           4
      4           4
      $\bar{X}$ = 4    $\bar{X}$ = 4
   Both series have the same mean but different variation in data, i.e. series Y is more consistent as compared to series X.
   Skewness: When a distribution is not symmetrical, it is said to be skewed, i.e. absence of symmetry denotes presence of Skewness. The following three cases give an idea of an absence or presence of Skewness.
      a) Perfectly symmetrical curve (lack of Skewness): $\bar{X}$ = Median = Mode
      b) Asymmetrical curve
         i) Right tail (positively skewed): Mode < Median < $\bar{X}$
         ii) Left tail (negatively skewed): $\bar{X}$ < Median < Mode
2) Purpose
   Dispersion: Its purpose is to identify the amount of variation.
   Skewness: It identifies the direction of the variation, i.e. the extent to which the values depart from symmetry.
3) What it deals with
   Dispersion: It deals with variability in general & the spread of values around the central value.
   Skewness: It deals with the symmetry of the distribution about the central value & with the nature of variation on either side of the central value.
4) Methods of measuring
   Dispersion: It can be measured by Range, QD, MD, SD.
   Skewness: It can be measured by
      → Bowley's Coefficient of Skewness
      → Kelly's Coefficient of Skewness
      → Moments
      → Karl Pearson's Method.
Ques 2. What do you mean by Kurtosis? What purpose does it serve?
(B.Com (H) 91, 95)
Ans. Kurtosis is one of the measures which tells us about the form of a distribution. It tells us whether the distribution, if plotted on graph paper, would give us a normal curve or a curve more peaked or more flat than the normal curve.
The word "kurtosis" is a Greek term which means "bulginess".
Measure of Kurtosis
Kurtosis is measured by the coefficient $\beta_2$ or its derivative $\gamma_2$:
$$\beta_2 = \frac{\mu_4}{\mu_2^2} \quad \& \quad \gamma_2 = \beta_2 - 3$$
If β2 > 3, & γ2 is positive, the distribution is Leptokurtic.
If β2 = 3, & γ2 = 0, the distribution is Mesokurtic.
If β2 < 3, & γ2 is negative, the distribution is Platykurtic.


Ques 3. Differentiate between Skewness & Kurtosis? (B.Com (H) 99)
1) Meaning
   Skewness: When a distribution is not symmetric, it is said to be skewed, i.e. absence of symmetry denotes the presence of Skewness. The following three figures would give an idea of an absence or presence of Skewness.
   Kurtosis: It refers to the degree of peakedness of the hump of the distribution.
2) Purpose
   Skewness: It reveals whether a distribution is symmetric or not, i.e. it reveals the absence or presence of symmetry.
   Kurtosis: When we compare two or more symmetrical distributions, the difference in the heights of the symmetrical curves is called Kurtosis.

Note: Besides the above points, you can also write the formulae; for formulae see the end of the chapter.
Ques 4. Explain in brief Sheppard's Correction method applied to moments?
(B.Com (H) 2002)
Ans. As per W.F. Sheppard, the effect due to grouping at the mid-points of intervals can be corrected by the formulae given below:
$$\mu_2 \text{ (Corrected)} = \mu_2 \text{ (Uncorrected)} - \frac{h^2}{12}$$
$$\mu_3 \text{ (Corrected)} = \mu_3 \text{ (Uncorrected)}$$
$$\mu_4 \text{ (Corrected)} = \mu_4 \text{ (Uncorrected)} - \frac{1}{2} h^2 \mu_2 \text{ (Uncorrected)} + \frac{7}{240} h^4$$
Where h is the width of the class interval.
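As a rough illustration, Sheppard's corrections can be written as a small Python function. The function name and the uncorrected moment values below are hypothetical; μ2 in the fourth-moment correction is taken to be the uncorrected second moment, which is the usual convention.

```python
def sheppard_corrected(mu2, mu3, mu4, h):
    """Apply Sheppard's corrections to uncorrected central moments.

    mu2, mu3, mu4 are the uncorrected moments of grouped data and
    h is the class-interval width.
    """
    mu2_c = mu2 - h**2 / 12
    mu3_c = mu3                                      # third moment needs no correction
    mu4_c = mu4 - 0.5 * h**2 * mu2 + 7 * h**4 / 240  # uses uncorrected mu2
    return mu2_c, mu3_c, mu4_c

# Hypothetical uncorrected moments with class width h = 2
mu2_c, mu3_c, mu4_c = sheppard_corrected(mu2=10.0, mu3=4.0, mu4=300.0, h=2.0)
print(mu2_c, mu3_c, mu4_c)
```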
Purpose of Moments
i) The first central moment is always zero, i.e. $\mu_1 = 0$.
ii) The second central moment (about the mean) indicates variance, i.e. $\mu_2 = \sigma^2$.
iii) The first moment about origin (zero) indicates the Arithmetic Mean, i.e. $\nu_1 = A + \mu_1'$, where $\mu_1'$ is the first moment about an arbitrary origin A.
iv) The third & second central moments are used to measure Skewness:
$$\beta_1 = \frac{\mu_3^2}{\mu_2^3}$$
or as per Fisher
$$\gamma_1 = \sqrt{\beta_1} = \frac{\mu_3}{\mu_2\sqrt{\mu_2}}$$
v) The fourth & second central moments are used to measure Kurtosis:
$$\beta_2 = \frac{\mu_4}{\mu_2^2}$$
or as per Fisher, $\gamma_2 = \beta_2 - 3$
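The moment-based measures above can be computed directly from raw data. The following is a minimal sketch on a made-up ungrouped series; `central_moment` is a hypothetical helper that divides by N, as the formulae in these notes do.

```python
def central_moment(data, k):
    """k-th moment about the arithmetic mean (divides by N)."""
    n = len(data)
    x_bar = sum(data) / n
    return sum((x - x_bar) ** k for x in data) / n

# Hypothetical series with a long right tail
data = [1, 2, 3, 4, 10]

mu1 = central_moment(data, 1)  # always zero
mu2 = central_moment(data, 2)  # variance
mu3 = central_moment(data, 3)
mu4 = central_moment(data, 4)

beta1 = mu3**2 / mu2**3        # skewness (Karl Pearson's beta-one)
gamma1 = mu3 / mu2**1.5        # Fisher's gamma-one, keeps the sign of mu3
beta2 = mu4 / mu2**2           # kurtosis
gamma2 = beta2 - 3             # excess over the normal curve

print(mu1, mu2, mu3, mu4)
print(beta1, gamma1, beta2, gamma2)
```

Here μ3 is positive, so the series is positively skewed, and the sign of γ2 tells whether it is leptokurtic or platykurtic.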
Statistical Decision Theory
Ques 1. What do you mean by Statistical decision theory. (B.Com (H) 99)
Ans. In simple words, statistical decision theory can be defined as theory which deals with
decision-making using statistical tools & such theory can be applied only when there are several
alternatives for a particular objective.
Ques 2. Fill in the blanks (CA (P.E. -1) 93 Dec.)
1) EMV & EOL criteria _____________ the knowledge of probabilities of states of nature.
2) Maximax & Minimax criteria _____________ the knowledge of probabilities of states of nature.
3) While making a decision using the EMV criterion, out of the several EMVs for several actions, the EMV with _______________ value is taken.
4) While making a decision using the EOL criterion, out of the several EOLs for several actions, the EOL with ____________ value is taken.
5) Under the Laplace criterion ________________ probabilities are assigned to the various states of nature.
6) Events beyond the control of the decision maker are called _____________ or _____________ of nature.
7) The maximum amount that a retailer will be willing to pay for a perfect predictor is called the _____________.
8) There are two types of losses in a stocking operation: ___________ losses and _________ losses.
9) The pleasure or displeasure one receives from certain outcomes is one's _____________.
Answer
1) require 2) do not require 3) maximum
4) minimum 5) equal 6) outcomes, states
7) EVPI 8) obsolescence, opportunity 9) utility
Ques 3. State which of the following statements are correct or incorrect.
i) A person can have one utility for one situation and quite a different one for the next situation.
ii) It is always difficult to make use of other people's knowledge about a situation without explaining statistical techniques to them.
iii) With perfect information, a retailer would consistently make the maximum profit possible.
iv) One advantage of using decision trees is that every outcome, desirable or undesirable, must be investigated.
v) On a decision tree, a circle represents a decision point.
vi) If a retailer can earn $100 per day with perfect information, then EVPI = $100.
vii) A businessman with a linear utility curve can effectively use expected monetary value as his decision criterion.
viii) A decision that maximizes expected profits will also minimize expected losses.
Answer
i) Correct ii) Incorrect iii) Correct iv) Correct
v) Incorrect vi) Incorrect vii) Correct viii) Correct
Short Answer Questions
Ques 1. Define following terms (B.Com (H) 2000) (CA (P.E. -1) 94 J, 97 N)
a) EVPI
b) EOL
c) EPPI
d) EMV
e) Actions
f) States of nature
Ans. a) EVPI: → The expected value of perfect information is the maximum amount of money a decision maker should be willing to spend to get additional (perfect) information about the states of nature.
EVPI = EPPI – EMV (where EMV is that of the best action)
b) EOL: → The difference between the profit actually derived from a certain decision and the profit that would have been derived if the decision had been the best one for the event that actually occurred is known as opportunity loss.
The expected opportunity loss is the expected loss incurred because of failure to take the best action, & it is derived from the loss table.
Minimum EOL = EVPI
c) EPPI: → The expected pay off of perfect information is the maximum expected profit a decision maker can make if a perfect predictor is available & thus all the information about the states of nature is available.
d) EMV: → The expected monetary value reveals the expected profit a decision maker can hope to make on the basis of available information about the states of nature.
EMV = $\Sigma p_i X_i$
$X_i$ → Pay off of the action under each of the several states of nature.
$p_i$ → Probabilities of the several states of nature.
e) Actions: → To make any decision, several alternatives are available to a decision maker; all such relevant alternatives are termed actions in statistical decision theory.
f) States of Nature: → These are the possible events which are uncertain but vital for the choice of any one of the alternative courses of action, & therefore such events are considered while making a decision.
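The relations between EMV, EPPI, EOL and EVPI can be seen on a small example. The pay off table below is entirely made up (three stocking actions, three demand states with assumed probabilities); it is a sketch of the calculations, not a prescribed layout.

```python
# Hypothetical probabilities of three states of nature (low, medium, high demand)
probs = [0.3, 0.5, 0.2]

# Hypothetical pay off table: action -> pay off under each state of nature
payoffs = {
    "stock 10": [100, 100, 100],
    "stock 20": [60, 200, 200],
    "stock 30": [20, 160, 300],
}

# EMV of each action = sum of probability x pay off
emv = {a: sum(p * x for p, x in zip(probs, row)) for a, row in payoffs.items()}

# Best pay off under each state, as chosen with a perfect predictor
best_per_state = [max(row[j] for row in payoffs.values())
                  for j in range(len(probs))]
eppi = sum(p * b for p, b in zip(probs, best_per_state))

# Loss (regret) table and EOL of each action
regret = {a: [b - x for b, x in zip(best_per_state, row)]
          for a, row in payoffs.items()}
eol = {a: sum(p * r for p, r in zip(probs, row)) for a, row in regret.items()}

evpi = eppi - max(emv.values())
best_by_emv = max(emv, key=emv.get)
best_by_eol = min(eol, key=eol.get)

print(emv, eppi, evpi)
print(eol, best_by_emv, best_by_eol)
```

The action with the maximum EMV is also the one with the minimum EOL, and that minimum EOL equals EVPI.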
Ques 2. Differentiate between Pay off Table & Regret (Loss) Table?
(B.Com (H) 96, 97) (CA (P.E. -1) 94 N)
Ans. The difference between a pay off table & a loss table can be discussed as follows.
(1) Meaning
   Pay off Table: It is a table which reveals the values of the actual pay off, i.e. the value of a consequence expressed in terms of gain, which is expressed in money terms.
   Loss Table: It is a table which reveals the difference between the profit actually derived from a certain decision & the profit that would have been derived if the decision had been the best one for the event that actually occurred.
(2) Criterion
   Pay off Table: Here the EMV criterion is used for the purpose of selecting the best course of action.
   Loss Table: Here EOL is the criterion used for the purpose of selecting the best course of action.
Correlation
Ques 1. Fill in the blanks
i) The value of the Correlation Coefficient lies between ___________ & ______________
ii) The value of the Correlation Coefficient is independent of ____________ & ___________
iii) Positive co-variation implies that two variables vary in the ______________ direction.
iv) When r = 1, correlation is ____________; when r = -1, correlation is ____________; & when r = 0, there is _____________
v) The Coefficient of Concurrent Deviation can have both ___________ & ____________ values.
vi) Rank Correlation can be applied to ______________ data.
Answer
i) – 1 & 1, i.e. –1 ≤ r ≤ 1
ii) Change of Scale & Change of Origin
iii) Same
iv) Perfectly positive, perfectly negative, no (linear) correlation.
v) Positive & negative.
vi) Qualitative.
Ques 2. Identify the correct statements.
i) Correlation always reveals a cause and effect relationship.
ii) Coefficient of correlation is a relative measure of relation between two or more variables.
iii) The coefficient of correlation can have both positive & negative values.
iv) In a Scatter diagram the independent variable is shown on the X-axis & the dependent variable on the Y-axis.
v) If $r^2 = 0$, this implies there is no association between the variables.
vi) Coefficient of correlation must be in the same units as the original data.
Answer
i) Incorrect. Not always, because we may have chance correlation [See Long Answer Question].
ii) Correct iii) Correct iv) Correct v) Correct
vi) Incorrect, it has no units because it is a relative measure.
Ques 3. State nature of the following Correlation. (B.Com (H) 2000)
i) Sale of Woolen garments & change in temperature
ii) Rainfall & Crop Yield
iii) Colour of Shirt & Weight of person wearing it
iv) Production of wheat & use of fertilizers
v) Age of applicant for Life insurance & premium of insurance
Answer
i) Negative ii) Positive iii) No iv) Positive v) Positive (the premium rises with the age of the applicant)
Explain in brief the properties of the Coefficient of Correlation.
Ans. → The properties of the Coefficient of Correlation can be discussed as follows:
i) Its value lies between – 1 & 1, i.e. – 1 ≤ r ≤ 1
ii) It is independent of change of scale & change of origin, i.e. its value remains unaffected even if each value of the data is increased, decreased, multiplied or divided by the same (positive) number.
iii) It is a pure number & is independent of the unit of measurement.
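Properties (i) and (ii) can be illustrated with a short Python sketch. The helper `pearson_r` and the data are assumptions made for this example; the change of scale uses positive constants, since multiplying one variable by a negative constant would reverse the sign of r.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Karl Pearson's coefficient of correlation for paired data."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical paired observations
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)

# Change of origin and scale: x -> 10x + 3, y -> (y - 7) / 2
r_transformed = pearson_r([10 * v + 3 for v in x], [(v - 7) / 2 for v in y])

print(r, r_transformed)
```

Both calls should give the same value of r, lying between -1 and 1.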
Example. Comment on the following statements:
(a) "If the coefficient of correlation between two variables is + .5, it means 50% of the variations are explained."
(b) "If the coefficient of correlation between two variables of the 1st series is + .2 and between two variables of the other series is + .4, it means the degree of relationship in the second series is twice as compared to that of the 1st series."
Solution:
(a) The coefficient of determination explains the degree of relationship between two variables. Therefore, if r = .5 then the explained variation is $r^2$, i.e. $(.5)^2$ or 25%. Therefore, the explained variation is not 50%, but 25%.
(b) The explained variation in the 1st series is $(.2)^2$ = 4% and the explained variation in the second series is $(.4)^2$ = 16%. Therefore, the degree of relationship is 'Not Twice' but 'Four Times' in the second series.
Probable Error (P.E.) → It should be used for interpretation only when N is very large, otherwise it may give misleading conclusions.
$$P.E. = 0.6745 \, \frac{1 - r^2}{\sqrt{n}}$$
Where: r = coefficient of correlation
n = number of items.
Interpretation of r on the basis of probable error can be expressed as follows:
(a) if $r > 6\,PE_r$, then r is significant
(b) if $r < 6\,PE_r$, then r is insignificant.
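A small sketch of the probable error test follows. The function names and the sample values of r and n are hypothetical; taking the absolute value of r, so that negative coefficients are handled as well, is an assumption not stated in the notes.

```python
from math import sqrt

def probable_error(r, n):
    """Probable error of the coefficient of correlation."""
    return 0.6745 * (1 - r**2) / sqrt(n)

def is_significant(r, n):
    """Rule used in these notes: r is significant if it exceeds 6 x P.E."""
    return abs(r) > 6 * probable_error(r, n)  # abs() is an assumption

# Hypothetical cases with n = 25 pairs of observations
pe_strong = probable_error(0.8, 25)
pe_weak = probable_error(0.2, 25)

print(pe_strong, is_significant(0.8, 25))
print(pe_weak, is_significant(0.2, 25))
```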


Ques 1. What do you mean by Correlation & what are its various types? Does it always imply a cause and effect relationship?
Ans. "Correlation analysis attempts to determine the degree of relationship between variables" (Ya Lun Chow)
→ Thus the following important elements of Correlation can be identified on the basis of the above definition.
i) There should be two or more variables.
ii) There should be some relationship between them.
iii) The change in the value of one may affect another also.
TYPES OF CORRELATION
Correlation can be:
(i) Positive or Negative;
(ii) Simple, Multiple or Partial;
(iii) Linear or Non-linear.
(i) Positive and Negative Correlation
Correlation can be either positive or negative. When the values of two variables move in the same direction, i.e. when an increase in the value of one variable is associated with an increase in the value of the other variable, and a decrease in the value of one variable is associated with a decrease in the value of the other variable, correlation is said to be positive.
If, on the other hand, the values of two variables move in opposite directions, so that with an increase in the values of one variable the values of the other variable decrease, and with a decrease in the values of one variable the values of the other variable increase, correlation is said to be negative.
Thus generally price and supply are positively correlated. When prices go up supply also increases, and with a fall in prices supply also decreases. The correlation between price and demand is generally negative. With an increase in price the demand goes down, and with a decrease in price the demand generally goes up. The demand curve is downward sloping whereas the supply curve is upward sloping.
Negative Correlation        Positive Correlation
Price    Demand             Price    Supply
10       100                10       10
12       80                 18       20
(ii) Simple, Multiple and Partial Correlation
In simple correlation we study only two variables – say price and demand. In multiple correlation
we study together the relationship between three or more factors like production, rainfall and use
of fertilizers. In partial correlation though more than two factors are involved but correlation is
studied only between two factors and the other factors are assumed to be constant.
(iii) Linear and Non-linear (Curvi-linear) Correlation (C.A. (P.E. -1) 92 N)
The correlation between two variables is said to be linear if, corresponding to a unit change in the value of one variable, there is a constant change in the value of the other variable, i.e. in case of linear correlation, the relation between the variables X and Y is of the type
y = a + bx.
In such cases, the values of the variables are in constant ratio. The correlation between two variables is said to be non-linear or curvilinear if, corresponding to a unit change in the value of one variable, the other variable does not change at a constant rate, e.g.
Linear correlation
X     Y     Ratio between X and Y
10    5     2:1
20    10    2:1
30    15    2:1

Non-linear correlation
X     Y     Ratio between X and Y
10    4     10:4 or 5:2
20    10    20:10 or 2:1
30    12    30:12 or 5:2
CORRELATION – CAUSE AND EFFECT RELATIONSHIP:
Though the word correlation is used in the sense of mutual dependence of two or more variables
yet it is not at all necessary that it should always be so. Even a very high degree of correlation
between two variables does not necessarily indicate a cause and effect relationship
between them. There can be correlation between two variables due to any one or more of the
following reasons:
(1) Both the correlated variables are being affected by a third variable or by more than one variable. For example, we may find a high degree of correlation between two variables, but in reality it may be found that this is due to a common factor such as good fertilizer etc.
(2) Related variables might be mutually affecting each other, so that neither of them could be designated as a cause or an effect. This situation particularly holds good in the field of economics and business. For example, the demand for a commodity may go down as a result of a rise in prices. One would normally presume that price is the cause and demand is the effect. However, it may be that the demand for the commodity has gone up due to an anticipated shortage in future and has resulted in the price rise. Now demand would be the cause and price would be the effect.
(3) The correlation may be due to random or chance factors. Many times correlation is noticed
between two variables without any real relationship between them. It may happen due to
chance. This generally happens when a very small sample is chosen from a large universe.
(B.Com (p) 97)
Income (Rs) Weight (Kg.)
200 40 Kg.
300 50 Kg.
400 60 Kg.
The above points make it clear that correlation is only a mathematical relationship and it does not
necessarily signify a cause and effect relationship between the variables.
Regression
Ques 1. Define Regression
Ans. Regression analysis attempts to establish the nature of the relationship between variables
i.e. to study the functional relationship between the variables & there by help in prediction.
Ques 2. What do you mean by Standard Error of Estimate?
Ans. Standard Error of Estimate helps in finding the likely error in the estimated values of Y or X.
$$S_y = \sqrt{\frac{\Sigma (Y - Y_c)^2}{N}}$$
$$S_x = \sqrt{\frac{\Sigma (X - X_c)^2}{N}}$$
Where Sy → Standard error of the estimate of y values
Sx → Standard error of the estimate of x values
Xc → estimated values of x
Yc → estimated values of y
Y → original values of y
X → original values of x
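The standard error of estimate of Y can be worked out once a regression line of Y on X has been fitted. The sketch below fits the line by least squares on made-up data; `fit_line` and `standard_error` are hypothetical helper names, and the divisor N follows the formula given above.

```python
from math import sqrt

def fit_line(xs, ys):
    """Least-squares regression line of y on x: returns (a, b) in y = a + b x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    return a, b

def standard_error(actual, estimated):
    """Square root of the mean squared difference (divides by N)."""
    n = len(actual)
    return sqrt(sum((v - e) ** 2 for v, e in zip(actual, estimated)) / n)

# Hypothetical paired observations
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b = fit_line(x, y)
y_c = [a + b * v for v in x]          # estimated values of y
s_y = standard_error(y, y_c)
residual_sum = sum(v - e for v, e in zip(y, y_c))

print(a, b, s_y, residual_sum)
```

The positive and negative errors from the least-squares line sum to zero, which is why they are squared before averaging.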
Ques 3. Fill in the blanks
a) The regression lines cut each other at the point of ____________
b) The regression line of Y on X ____________ the total of squares of vertical deviations.
c) If r = +1 or -1, there will be only ______________ regression line.
d) The greater the distance at which regression lines cut each other, i.e. the greater the angle formed at their point of intersection, the degree of correlation will be _____________
e) When r = 0, the regression lines cut each other at an angle of _____________
f) When one of the regression coefficients is greater than one, the other will be ____________ than one.
g) When one regression coefficient is negative, the other would be _________
h) Lines of regression are ____________ if r = 0 & they are ____________ if r = ± 1
i) The purpose of regression is to study ___________ between variables.
j) The sign of the regression coefficient is ____________ as that of the correlation coefficient.
k) Regression coefficient is independent of change of ___________ but not of change of __________
l) An association between two variables that is described by a curved line is a ___________ one.
m) Every straight line has a _____________, which represents how much each change of the independent variable changes the dependent variable.
n) The extent to which observed values differ from their predicted values on the regression line is measured by the _________________.
o) ________________ is a measure of the proportion of variation in the dependent variable that is explained by the regression line.
Answers
a) Averages of X & Y b) Minimizes c) One d) Less
e) 90° f) Less g) Negative h) Separate (perpendicular), Same (coincident)
i) Dependence j) Same k) Origin, scale l) Curvilinear
m) Slope n) Standard Error of Estimate
o) Coefficient of determination
Ques 4. State which of the following statements are 'Correct' or 'Incorrect'.
1. Regression analysis is used to describe how well an estimating equation describes the relationship being studied.
2. Given that the equation for a line is Y = 26 – 24X, we may say that the relationship of Y to X is direct linear.
3. An $r^2$ value close to 0 indicates a strong correlation between X and Y.
4. Regression and correlation analyses are used to determine cause and effect relationships.
5. The sample coefficient of correlation, r, is nothing more than $\sqrt{r^2}$, and we cannot interpret its meaning directly as a percentage of some kind.
6. The standard error of estimate measures the variability of the observed values around the regression equation.
7. The regression line is derived from a sample and not the entire population.
8. We may interpret the sample coefficient of determination as the amount of the variation in Y that is explained by the regression line.
9. Lines drawn on either side of the regression line at ± 1, ± 2 and ± 3 times the value of the standard error of estimate are called confidence lines.
10. The estimating equation is valid over only the same range as that given by the original sample data upon which it was developed.
11. In the equation y = a + bx for dependent variable y and independent variable x, the Y-intercept is b.
12. If a line is fitted to a set of points by the method of least squares, the individual positive and negative errors from the line sum to zero.
13. If $S_e = 0$ for an estimating equation, it must perfectly estimate the dependent variable at the observed points.
14. Suppose the slope of an estimating equation is positive. Then the value of r must be the positive square root of $r^2$.
15. If r = .8, then the regression equation explains 80 percent of the total variation in the dependent variable.
16. The coefficient of correlation explains the percentage of the total variation of the dependent variable.
17. The standard error of estimate is measured perpendicularly from the regression line rather than on the Y-axis.
Answers
1) Incorrect 2) Incorrect 3) Incorrect 4) Incorrect 5) Correct
6) Correct 7) Correct 8) Correct 9) Incorrect 10) Correct
11) Incorrect 12) Correct 13) Correct 14) Correct 15) Incorrect
16) Incorrect 17) Correct
Difference between Correlation & Regression?
Ans.
1) Meaning
   Correlation: "Correlation analysis attempts to determine the degree of relationship between variables" (Ya Lun Chow). Thus the following elements of Correlation can be identified on the basis of the above definition.
      i) There should be two or more variables.
      ii) There should be some relationship between them.
      iii) The change in the value of one may affect another also.
   Regression: Regression analysis attempts to establish the nature of the relationship between variables, i.e. to study the functional relationship between the variables & thereby help in prediction.
2) Functional relationship
   Correlation: Here y = f(x) or x = f(y) is irrelevant, i.e. here the cause and effect relationship cannot be studied.
   Regression: Here y = f(x) and x = f(y) are not the same, because regression analysis establishes the functional relationship between the variables.
3) Change of scale & origin
   Correlation: The correlation coefficient is independent of change of scale and of change of origin.
   Regression: Regression coefficients are independent of change of origin but not of change of scale.
4) Nature
   Correlation: It is a relative measure showing association between variables.
   Regression: It is an absolute measure of relationship.
Ques 4. Explain important properties of Regression Coefficient?
(B.Com (H) 94, 98, C.A. (P.E. -1) 97 N, 99 N)
Ans. Important properties of Regression Coefficients are as follows:
i) Regression coefficients are independent of change of origin but not of change of scale.
ii) The coefficient of correlation is the GM of the two regression coefficients, i.e.
r = ± √(byx · bxy)
iii) The regression lines intersect at the average values of X & Y.
iv) Both regression coefficients will have the same algebraic sign, i.e. either both of them would be
positive or both of them would be negative.
v) If one of the regression coefficients is more than 1, the other has to be less than 1, because the
value of the coefficient of correlation cannot exceed one, i.e.
-1 ≤ r ≤ 1
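The properties above can be checked numerically. The following is a minimal sketch in Python; the data values and the helper name regression_coefficients are hypothetical and chosen only for illustration.

```python
from math import sqrt

def regression_coefficients(x, y):
    """Return (byx, bxy, r) computed from deviations about the means."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    byx = sxy / sxx            # regression coefficient of Y on X
    bxy = sxy / syy            # regression coefficient of X on Y
    r = sxy / sqrt(sxx * syy)  # coefficient of correlation
    return byx, bxy, r

# Hypothetical paired observations.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

byx, bxy, r = regression_coefficients(x, y)

# Change of origin: subtract a constant from every X and every Y.
byx_o, bxy_o, r_o = regression_coefficients([xi - 3 for xi in x],
                                            [yi - 4 for yi in y])

# Change of scale: multiply every X by 10.
byx_s, bxy_s, r_s = regression_coefficients([10 * xi for xi in x], y)

print("byx =", byx, "bxy =", bxy, "r =", round(r, 4))
print("r squared equals byx * bxy:", abs(r * r - byx * bxy) < 1e-12)
print("after change of origin:", byx_o, bxy_o)
print("after change of scale: ", byx_s, bxy_s)
```

In this sketch the change of origin leaves both regression coefficients unchanged, while the change of scale alters both of them but leaves r unchanged.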

Probability
Ques 1. Define the following terms.
a) Mutually exclusive events
b) Independent events
c) Dependent events
d) Simple events
e) Compound events
f) Exhaustive cases
g) Equally likely cases
h) Joint probability
i) Marginal probability
Ans. a) Mutually Exclusive Events. Two or more events are said to be mutually exclusive if the
happening of any one of them excludes the happening of all others in a single (i.e. same)
experiment. Thus in the throw of a single die the events 5 and 6 are mutually exclusive because
if the event 5 happens no other event is possible in the same experiment. Here one and only one
of the events can take place at a time, excluding the others.
b) Independent Events. Events are said to be independent when the occurrence of one does
not affect the occurrence of the others.
c) Dependent Events. Events are said to be dependent when the occurrence of one affects
the occurrence of the others.
d) Simple Events. An event is called simple if it corresponds to a single possible
outcome. Thus in tossing a die, the chance of getting 3 is a simple event (because 3 occurs on
the die only once).
e) Compound Events. An event is called compound if it corresponds to more than one possible
outcome. Thus the chance of getting an odd number is a compound event
(because odd numbers are more than one, i.e. 1, 3 and 5).
f) Exhaustive Cases. All possible outcomes of an experiment are known as exhaustive cases. In
the throw of a single die the exhaustive cases are 6, as the die has only six faces, each marked
with a different number.
g) Equally Likely Cases. Two or more events are said to be equally likely if the chance of
their happening is equal, i.e. there is no preference for any one event over the other. Thus in a
throw of an unbiased die, the coming up of 1, 2, 3, 4, 5 or 6 is equally likely. In the throw of an
unbiased coin the coming up of head or tail is equally likely.
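h) Joint Probability. The probability of two or more events occurring together. It is arrived at by
multiplying two or more probabilities, depending on the number of events involved.
i) Marginal Probability. The probability of a single event, arrived at by summing the joint
probabilities of the two or more joint events in which it occurs.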
Ques 1. Explain in brief (B. Com (H) 98, CA (P.E. – I) 94N, 95M, 95 N, 97M)
a) Addition Theorem of Probability
b) Multiplication Theorem of Probability
Ans. a) Addition Theorem
If A & B are two events, then the probability that at least one of them occurs is denoted by
P(A ∪ B) & is given by
P(A ∪ B) = P(A) + P(B) – P(A ∩ B)
Where P(A) → Probability of the occurrence of event A
P(B) → Probability of the occurrence of event B
P(A ∩ B) → Probability of the simultaneous occurrence of events A & B.
If the events are mutually exclusive, then
P(A ∪ B) = P(A) + P(B),
because P(A ∩ B) = 0,
& in the case of a finite number, say n, of mutually exclusive events
P(A1 ∪ A2 ∪ … ∪ An) = P(A1) + P(A2) + … + P(An)
[Venn diagrams: a) where events are not mutually exclusive; b) when events are mutually exclusive]
b) Multiplication Theorem
The probability of the simultaneous occurrence of the events A & B is denoted by P(AB)
or P(A ∩ B) & is given by
a) If events are independent
P(A ∩ B) = P(A) · P(B)
b) If events are dependent
P(A ∩ B) = P(A) · P(B/A)
= P(B) · P(A/B)
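Both theorems can be verified on the throw of a single unbiased die. The following is a minimal sketch in Python using exact fractions; the events chosen and the helper name prob are assumptions made only for illustration.

```python
from fractions import Fraction

# Exhaustive cases for one throw of an unbiased die.
sample_space = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Classical probability: favourable cases / exhaustive cases."""
    return Fraction(len(event & sample_space), len(sample_space))

A = {2, 4, 6}   # an even number turns up
B = {4, 5, 6}   # a number greater than 3 turns up

# Addition theorem (events not mutually exclusive).
p_union = prob(A) + prob(B) - prob(A & B)

# Addition theorem (mutually exclusive events: 5 and 6 cannot occur together).
C, D = {5}, {6}
p_union_exclusive = prob(C) + prob(D)

# Multiplication theorem (dependent events): P(A ∩ B) = P(A) · P(B/A).
p_b_given_a = Fraction(len(A & B), len(A))
p_joint_dependent = prob(A) * p_b_given_a

# Multiplication theorem (independent events): P(E ∩ F) = P(E) · P(F).
E = {2, 4, 6}   # even number
F = {1, 2}      # number not exceeding 2
p_joint_independent = prob(E) * prob(F)

print("P(A ∪ B) =", p_union)
print("P(C ∪ D) =", p_union_exclusive)
print("P(A ∩ B) =", p_joint_dependent)
print("P(E ∩ F) =", p_joint_independent)
```

Each result can be compared with a direct count of favourable cases, e.g. prob(A | B) for the union and prob(A & B) for the intersection.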
Ques 2. Explain in brief Conditional probability. (B.Com (H) 2000)
Ans. → Two events A & B are said to be dependent when B can occur only when A is known to
have occurred or vice versa. The probabilities associated with such events are called Conditional
Probabilities.
P(A/B) = P(A ∩ B) / P(B)
Where P(A/B) → Probability of occurrence of event A when B has occurred
P(B/A) = P(B ∩ A) / P(A)
Where P(B/A) → Probability of occurrence of event B when A has occurred
Ques 5. State & illustrate Bayes’ Theorem. (B.Com (H) 2001)
Ans. Bayes’ Theorem
This theorem is based on the revision of prior probabilities; it is basically an extension of conditional
probability.
Imagine a situation where two uncertain events (A) and (not A) are possible. Suppose we know
their probabilities, i.e. we know the probability of A’s happening and also the probability of A’s not
happening. These probabilities are prior probabilities because they are probabilities before any
further information is available. Suppose an investigation is conducted. The investigation may
have several outcomes which would be dependent on event A. For any particular outcome (which
may be called B) the conditional probabilities P(B/A) and P(B/not A) are available.
The result itself serves to revise the probabilities for event (A) and event (not A). The resulting
values would be the posterior probabilities since they have been obtained after the results of the
investigation.
Thus according to Bayes’ theorem the posterior probability of event (A) for a particular result of an
investigation (B) may be found from
P(A/B) = [P(A) · P(B/A)] / [P(A) · P(B/A) + P(not A) · P(B/not A)]
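As an illustration of the formula, the following minimal Python sketch revises a prior probability into a posterior probability. The three input probabilities are hypothetical figures assumed only for the example; they do not come from any real investigation.

```python
from fractions import Fraction

# Hypothetical prior probabilities (before the investigation).
p_a = Fraction(3, 10)          # P(A)
p_not_a = 1 - p_a              # P(not A)

# Hypothetical conditional probabilities of the outcome B.
p_b_given_a = Fraction(8, 10)      # P(B/A)
p_b_given_not_a = Fraction(1, 10)  # P(B/not A)

# Total probability of observing the outcome B.
p_b = p_a * p_b_given_a + p_not_a * p_b_given_not_a

# Bayes' theorem: posterior probability of A after observing B.
p_a_given_b = (p_a * p_b_given_a) / p_b

print("P(B)   =", p_b)
print("P(A/B) =", p_a_given_b, "≈", round(float(p_a_given_b), 4))
```

Here the outcome B is much more likely under A than under (not A), so observing B raises the probability of A from the prior 3/10 to the posterior 24/31.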
Theoretical Distribution
Properties of Normal Distribution
1. It is perfectly symmetrical about the mean (μ) and is bell shaped. This means that if we
fold the curve along the vertical line at the centre, the two halves of the curve would coincide.
2. Mean = Median = Mode.
3. It has one mode, it is unimodal.
4. The ordinate at the mean of distribution divides the total area under the normal curve into
two equal parts.
5. The following are the descriptive measures of the normal distribution:
Mean = X̄ or μ (Standard form: X̄ = 0)
Standard deviation = σ (Standard form: σ = 1)
Variance or μ2 = σ²
Third central moment, μ3 = 0
Fourth central moment, μ4 = 3σ⁴ = 3μ2²
Moment coefficient of skewness, β1 = μ3² / μ2³ = 0
Moment coefficient of kurtosis, β2 = μ4 / μ2² = 3
Hence, it is a mesokurtic curve.
6. The normal curve is concave near the mean value, while near ± 3σ it is convex to the
horizontal axis. The points of inflexion, i.e., the points where the change in curvature occurs are ±
σ.
7. The quartiles Q1 and Q3 are equidistant from the median.
8. The mean deviation about the mean is 0.7979σ, i.e. approximately 4/5 σ.
9. The Standard deviation distributes the area under the normal curve as given below:
(i) Mean ± 1σ covers 68.268% area, 34.134% area will lie on either side of the mean.
(ii) Mean ± 2σ covers 95.45% area, 47.725% area will lie on either side of the mean.
(iii) Mean ± 3σ covers 99.73% area, 49.865% area will lie on either side of the mean. Thus
mean ± 3σ covers nearly the whole area, leaving only 0.27% area outside.
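The area percentages in property 9 and the mean deviation in property 8 can be reproduced with Python's standard library. This is a minimal sketch; the helper name area_within is hypothetical, and it relies on the standard result that the area within mean ± kσ equals erf(k/√2).

```python
from math import erf, sqrt, pi

def area_within(k):
    """Proportion of the area under a normal curve lying within mean ± k·σ."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    total = area_within(k) * 100
    print(f"Mean ± {k}σ covers {total:.3f}% of the area, "
          f"{total / 2:.3f}% on either side of the mean")

# Mean deviation about the mean as a multiple of σ: √(2/π).
mean_deviation_factor = sqrt(2 / pi)
print("Mean deviation about mean =", round(mean_deviation_factor, 4), "σ")
```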
Ques 1. What do you mean by Time series?
Ans. A time series consists of data arranged chronologically.
Components of time series →
Long term: Trend
Short term: a) Seasonal variation
b) Cyclical variation
c) Irregular variation
EXAMPLES
(i) Recording of daily temperature for a month or a year
(ii) Recording of weekly sales of a shop taken over a year
(iii) Recording of bi-monthly telephone bills of a consumer taken over a period, say 5 years
(iv) Recording of imports at the end of each year for a period of 10 years, etc.
These are all examples of time series.
In fact, time series occur in every walk of life, be it finance, commerce, industry,
agriculture, medicine, education or even the domestic life of an individual.
TIME SERIES ANALYSIS By a time series analysis, we mean
(i) To study the behavior of the phenomenon over a period of time, and
(ii) To determine the various forces or influences which produce the variations in time series.
It is done mainly for the purpose of making forecasts for the future & also for the purpose of
evaluating past performance.
Ques 2. With which component of a time series would you mainly associate the following:
(B.Com (P) 2001)
a) Heavy Sales on the occasion of Deepawali
b) Price hike in petroleum products due to Arab-Israel war
c) Increase in garment sales in October
d) Decline in sale of ice-cream during winter season
e) An era of prosperity
f) A fire in factory delaying production for 3 weeks
g) The annual stock taking in a departmental store
h) General increase in the demand for TV sets.
Answer
a) Seasonal, b) Irregular, c) Seasonal, d) Seasonal,
e) Cyclic, f) Irregular, g) Seasonal, h) Trend
Ques 3. What is the difference between the Additive model & the Multiplicative model used for
decomposition of time series?
Ans. The difference between the Additive model & the Multiplicative model can be discussed as follows:
1) Meaning
Additive model: Under this model, decomposition of time series is done on the assumption that the effects of the various components are additive in nature.
Multiplicative model: Under this model, decomposition of time series is done on the assumption that the effects of the various components are multiplicative in nature.
2) Formula
Additive model: Y = T + S + C + I
Multiplicative model: Y = T * S * C * I
Y → Time series value, T → Trend, S → Seasonal variation, C → Cyclical variation, I → Irregular variation
3) Nature of S, C & I
Additive model: Here S, C & I are quantitative deviations from the trend.
Multiplicative model: Here S, C & I are expressed as rates or percentages.
4) Assumption
Additive model: This model assumes that the four components of a time series are independent of each other & none has any effect on the remaining three components.
Multiplicative model: This model assumes that the four components of a time series are not necessarily independent of each other, i.e. their effects are interdependent.
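The two formulas can be compared with a small numerical illustration. The following Python sketch uses hypothetical component values assumed only for the example: in the additive model S, C & I are absolute deviations from the trend, while in the multiplicative model they are rates.

```python
# Hypothetical trend value for one period.
trend = 100.0

# Additive model: S, C & I are quantitative deviations from the trend.
s_add, c_add, i_add = 20.0, -5.0, 2.0
y_additive = trend + s_add + c_add + i_add

# Multiplicative model: S, C & I are rates (1.20 means 120% of trend).
s_mul, c_mul, i_mul = 1.20, 0.95, 1.02
y_multiplicative = trend * s_mul * c_mul * i_mul

print("Additive model value:      ", y_additive)
print("Multiplicative model value:", round(y_multiplicative, 2))
```

Given Y and the other components, any single component can be isolated by subtraction in the additive model and by division in the multiplicative model.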
Index Numbers
Ques 1. What do you mean by Index Numbers. (B.Com (P) 82)
Ans. An index number is a statistical measure used to measure changes over time in
magnitudes which are not capable of direct measurement.
Ques 2. Comment on the following statements:
i) During a certain period, the cost of living index increased from 110 to 200 & the salary of a
worker increased from Rs 325 to Rs 500. Does the worker really gain?
ii) The average salary paid to workers in the year 2000 is double that of 1990. So workers
enjoy a 100% higher standard of living in 2000 as compared to 1990.
iii) “For constructing index numbers, the best method on theoretical grounds is not
the best method from the practical point of view.”
iv) “Laspeyres’ method is best as compared to Paasche’s.”
Answers
i) The amount the worker should have got in the current year is 325 * 200/110 = Rs 591 (approx.), but he is
getting only Rs 500. So the worker is not fully compensated and does not really gain.
ii) No, because an increase in money wages is not the right criterion for the standard of living;
rather it is real wages which should be the criterion for judging the standard of living
of workers.
iii) For the construction of index numbers, the best average on theoretical grounds is the Geometric Mean
because of the following reasons:
i) The Geometric Mean is best to measure ratios or percentages.
ii) Fisher’s ideal index number is nothing but the Geometric Mean of Laspeyres’ & Paasche’s index
numbers.
But the Geometric Mean, because of the time involved & the complex calculations, is not preferred from
the practical point of view.
iv) It is true to some extent, as in Laspeyres’ method the weights are constant while in Paasche’s method the
weights have to be determined every time the index number is constructed. However, Laspeyres’
method has an upward bias whereas Paasche’s index number has a downward bias.
Short Answer Question
Ques 1. Define the following terms.
i) Base Shifting
ii) Splicing
iii) Deflating
Ans. i) Base Shifting –
It means changing the given base year of an index number & forming a new series based
on some recent new base year. Generally base shifting is resorted to because of the following
reasons:
Reasons for Base Shifting
1) Distant Base Period
When the base year is too old to be of any use for meaningful comparison, e.g. if prices of the year
2000 are compared with prices of 1950, the comparison will be useless because of changed
conditions.
2) Comparison
When two series of index numbers with different bases are to be compared, such a
comparison will be useful only if they are converted so as to have a common base.
Formula for Base Shifting
New Index of a Year = (Old Index of the year / Old Index of the new base year) * 100
e.g.
Year   Price   Index (Year 1 base)   Index (Year 2 base)
1      20      100                   50
2      40      200                   100
3      20      100                   50
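The base shifting formula can be applied to the example table as follows. This is a minimal Python sketch; the function name shift_base is hypothetical.

```python
def shift_base(old_index, new_base_year):
    """Re-express an index series on a new base year.

    New index of a year = old index of the year / old index of new base year * 100
    """
    base_value = old_index[new_base_year]
    return {year: value * 100 / base_value for year, value in old_index.items()}

# Index with Year 1 as base, taken from the example table.
index_year1_base = {1: 100, 2: 200, 3: 100}

# Shift the base to Year 2.
index_year2_base = shift_base(index_year1_base, new_base_year=2)
print(index_year2_base)
```

The result matches the last column of the table: the new base year takes the value 100 and every other year is scaled in the same proportion.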
ii) Splicing
It means combining two or more series of overlapping index numbers to obtain a single index
number series on a common base.
For splicing of index numbers the important condition is that the index numbers are constructed with
the same items & have an overlapping year.
Reason for Splicing of index numbers:
Splicing is done when an old index number with an old base is being discontinued & a new
index with a new base is being started. To have continuity of comparison, the new index
number is spliced to the old index number in the overlapping year or vice versa.
e.g.
Year   Price   Index A (Year 1 as base)   Index B (Year 3 as base)
1      20      100                        -
2      15      75                         -
3      25      125                        100
4      30      -                          120
5      60      -                          240
6      40      -                          160
Index B spliced to Index A (i.e. to make A a continuous series)
Year   Spliced index
4      150
5      300
6      200
Index A spliced to Index B (i.e. to make B a continuous series)
Year   Spliced index
1      80
2      60
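The spliced figures in the example follow from scaling one series by the ratio of the two indices in the overlapping year. The following Python sketch reproduces them; the function name splice is hypothetical, while the data are those of the table (overlapping year 3).

```python
def splice(series, own_overlap_value, other_overlap_value):
    """Scale a series so that it agrees with the other series in the overlapping year."""
    return {year: value * other_overlap_value / own_overlap_value
            for year, value in series.items()}

index_a = {1: 100, 2: 75, 3: 125}            # Year 1 as base
index_b = {3: 100, 4: 120, 5: 240, 6: 160}   # Year 3 as base
overlap_year = 3

# Index B spliced to Index A: continue series A for years 4 to 6.
b_to_a = splice({y: v for y, v in index_b.items() if y > overlap_year},
                index_b[overlap_year], index_a[overlap_year])

# Index A spliced to Index B: carry series B back to years 1 and 2.
a_to_b = splice({y: v for y, v in index_a.items() if y < overlap_year},
                index_a[overlap_year], index_b[overlap_year])

print("B spliced to A:", b_to_a)
print("A spliced to B:", a_to_b)
```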
iii) Deflating
In simple words, it means making allowance for change in purchasing power of money
due to change in general price level.
e.g.
Year   Price of rice (Rs per kg)
1980   20
1990   40
Assume the income of Mr. X is Rs 1,000 in both 1980 & 1990. Then the quantities
of rice that can be purchased are:
1980   50 kg
1990   25 kg
So, purchasing power has decreased by 50%.
Purchasing Power = 1 / Price Index
Let us illustrate this:
e.g. Price index = 2 (i.e. a 100% rise in price)
Purchasing Power = 1/2 = 50%
Because of the change in the purchasing power of money due to the change in the
general price level, one is more interested in real wages than in money
wages.
Real wage = (Money wage / Price index) * 100
Real wage index = (Real wage of Current Year / Real wage of Previous Year) * 100
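The deflating formulas can be sketched in Python as below. The function names are hypothetical; the figures are the rice example above and the salary and cost of living figures (Rs 325 to Rs 500, index 110 to 200) quoted earlier in these notes.

```python
def real_wage(money_wage, price_index):
    """Real wage = money wage / price index * 100."""
    return money_wage / price_index * 100

def purchasing_power(price_index):
    """Purchasing power of money = 1 / price index (index expressed with base 100)."""
    return 100 / price_index

# Rice example: price doubles, so the price index of 1990 on base 1980 is 200.
rice_index_1990 = 40 * 100 / 20
income = 1000
print("Purchasing power in 1990:", purchasing_power(rice_index_1990))
print("Real income in 1990: Rs", real_wage(income, rice_index_1990))

# Salary example: index rises from 110 to 200, salary from Rs 325 to Rs 500.
real_before = real_wage(325, 110)
real_after = real_wage(500, 200)
real_wage_index = real_after / real_before * 100
salary_needed = 325 * 200 / 110

print("Real wage before:", round(real_before, 2))
print("Real wage after: ", round(real_after, 2))
print("Real wage index: ", round(real_wage_index, 2))
print("Salary needed to keep pace: Rs", round(salary_needed, 2))
```

A real wage index below 100 indicates that the worker is worse off in real terms even though the money wage has risen.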
Ques 2. Differentiate between the Chain Base Index method and the Fixed Base method for
constructing index numbers. (B.Com (P) 93)
Ans. Fixed Base Method Vs Chain Base Method
(1) Meaning
Chain Base Method: Under this method, the base period immediately precedes the period for which the index is sought.
Fixed Base Method: Under this method, any base period is arbitrarily chosen & it is kept fixed.
(2) Adjustment of weights
Chain Base Method: Weights can be adjusted as frequently as possible.
Fixed Base Method: Weights cannot be adjusted so frequently.
(3) Suitability
Chain Base Method: It is suitable for short periods only.
Fixed Base Method: It is more suitable for long periods.
(4) Formula
Chain Base Method: CBI of CY = (CBI of PY * LR of CY) / 100
Fixed Base Method: FBI of CY = (FBI of PY * CBI of CY) / 100
(CBI = Chain Base Index, FBI = Fixed Base Index, LR = Link Relative, CY = Current Year, PY = Previous Year)
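The chain base formula can be illustrated for a single commodity. In the Python sketch below, the prices are hypothetical and the link relative of a year is taken as (price of current year / price of previous year) * 100, which is the usual definition but is not stated in these notes.

```python
# Hypothetical prices of one commodity in four consecutive years.
prices = [50, 60, 75, 90]

# Link relatives: each year's price as a percentage of the previous year's price.
link_relatives = [100.0] + [cur * 100 / prev
                            for prev, cur in zip(prices, prices[1:])]

# Chain base indices: CBI of CY = CBI of PY * LR of CY / 100.
chain_indices = [100.0]
for lr in link_relatives[1:]:
    chain_indices.append(chain_indices[-1] * lr / 100)

# Fixed base indices with the first year as base, for comparison.
fixed_indices = [p * 100 / prices[0] for p in prices]

print("Link relatives:", link_relatives)
print("Chain indices: ", chain_indices)
print("Fixed indices: ", fixed_indices)
```

For a single commodity the chained indices coincide with the fixed base indices; with several commodities and changing weights the two methods generally give different results.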
Ques 2. What are the various tests of adequacy of index numbers? Explain in brief the limitations of
index numbers.
(B.Com (H) 95, 98, 2001, B.Com (P) 86, 99, C.A. (P.E. – I) 95 M, 95 N, 97 M, 97 N, 99 M)
Ans. From a statistical point of view, the system of calculation used for current year index
numbers should be such that it satisfies certain mathematical tests. A number of tests have been
developed for this purpose. These include (a) the unit test, (b) the time reversal test, (c) the factor
reversal test and (d) the circular test.
The credit for proposing the time reversal test and the factor reversal test goes to Professor Irving
Fisher.
(a) Unit Test. This requires the index numbers to be independent of the units in which prices
and quantities of various commodities are quoted. This test is satisfied by all the formulae
except the simple aggregative index.
(b) Time Reversal Test. According to Prof. Fisher, “the formula for calculating an index
number should be such that it gives the same ratio between one point of comparison and the
other, no matter which of the two is taken as the base or putting it another way, the index number
reckoned forward should be reciprocal of the one reckoned backward.”
In symbols
P01 * P10 = 1 (Omitting the factor 100 from each index)
Where P01 denotes the index for current period 1 based on the base period 0 and P10 that is for
period 0 based on the base period 1.
Time reversal test is based on the following analogy: If the price of a commodity increased from Rs 4
per unit in 1980 to Rs 6 in 1990, the price in 1990 is 150%, i.e. 1.5 times the price in 1980, and the
price in 1980 is 66.67%, i.e. 0.67 times the price in 1990. The product of these two price ratios is
1.5 * 0.67 = 1 (approx.; exactly 3/2 * 2/3 = 1).
(c) Factor Reversal Test. In the words of Fisher, “Just as our formula should permit the
interchange of two times without giving inconsistent results, so it ought to permit interchanging
the prices and quantities without giving inconsistent results – i.e., the two results multiplied
together should give the true value ratio, except for a constant of proportionality.”
Analytically, if
P01 is the price index for the current year with reference to the base year, and
Q01 is the quantity index for the current year with reference to the base year for the same
coverage of commodities, then
P01 * Q01 = Σp1q1 / Σp0q0
Factor reversal test is based on the following analogy: If the price per unit of a commodity
increases from Rs 4 in 1980 to Rs 6 in 1990 and the quantity of consumption changes from 100
units to 140 units during the same period, then the price and quantity in 1990 are 150% and
140% respectively of the corresponding figures in 1980. The values (price * quantity) of
consumption were Rs 400 in 1980 and Rs 840 in 1990, so that the value ratio is
Value ratio = 840 / 400 = 2.1
Thus we find that the product of the price ratio and the quantity ratio equals the value ratio:
1.5 * 1.4 = 2.1. This is true for each commodity.
(d) Circular Test. This is another test for the adequacy of an index number. This test was first
suggested by Westergaard and highly favoured by C.M. Walsh, who gave it the name ‘circular
test’. It is based on the shiftability of the base and is merely an extension of the time reversal test.
According to this test, the index should work in a circular fashion, i.e., if an index number is
computed for period 1 on the base period 0, another for period 2 on the base period 1, and
another for period 0 on the base period 2, then the product of the three should be equal to 1.
Symbolically
P01 * P12 * P20 = 1, and in general
P01 * P12 * P23 * … * Pn-1,n * Pn,0 = 1
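The time reversal and factor reversal tests can be checked numerically. The Python sketch below uses hypothetical prices and quantities for two commodities. The Laspeyres (Σp1q0 / Σp0q0) and Paasche (Σp1q1 / Σp0q1) formulas are the standard textbook ones and are an assumption here, since these notes state only that Fisher's index is their geometric mean.

```python
from math import sqrt, isclose

def aggregate(p, q):
    """Σ p·q over all commodities."""
    return sum(pi * qi for pi, qi in zip(p, q))

def laspeyres(p_base, q_base, p_cur, q_cur):
    return aggregate(p_cur, q_base) / aggregate(p_base, q_base)

def paasche(p_base, q_base, p_cur, q_cur):
    return aggregate(p_cur, q_cur) / aggregate(p_base, q_cur)

def fisher(p_base, q_base, p_cur, q_cur):
    """Geometric mean of the Laspeyres and Paasche indices."""
    return sqrt(laspeyres(p_base, q_base, p_cur, q_cur)
                * paasche(p_base, q_base, p_cur, q_cur))

# Hypothetical data for two commodities (factor 100 omitted from each index).
p0, q0 = [4, 10], [100, 20]   # base year prices and quantities
p1, q1 = [6, 12], [140, 25]   # current year prices and quantities

# Time reversal test: P01 * P10 should equal 1.
fisher_time = fisher(p0, q0, p1, q1) * fisher(p1, q1, p0, q0)
laspeyres_time = laspeyres(p0, q0, p1, q1) * laspeyres(p1, q1, p0, q0)

# Factor reversal test: P01 * Q01 should equal Σp1q1 / Σp0q0.
# The quantity index is obtained by interchanging prices and quantities.
fisher_factor = fisher(p0, q0, p1, q1) * fisher(q0, p0, q1, p1)
value_ratio = aggregate(p1, q1) / aggregate(p0, q0)

print("Fisher    P01 * P10 =", round(fisher_time, 6))
print("Laspeyres P01 * P10 =", round(laspeyres_time, 6))
print("Fisher    P01 * Q01 =", round(fisher_factor, 6),
      " value ratio =", round(value_ratio, 6))
```

With these figures Fisher's index satisfies both tests, while the Laspeyres index gives a product slightly different from 1 in the time reversal test.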
Limitation of Index Numbers
The limitation of index number can be discussed as follows:
(1) Limited Use: - Indices constructed for one purpose cannot be used for another
purpose. Every index number is constructed by a technique which is appropriate for the objective
with which the index is constructed. It cannot be used to serve a different objective. A wholesale
price index number cannot measure cost of living.
(2) Likely to be misused if based on wrong data: - Index numbers are liable to be
misused. If a wrong base has been chosen, or if weights are not assigned rationally or if the
appropriate formula for the construction of the indices has not been chosen the results could be
highly misleading and mischievous. Index Numbers use only limited number of items in their
calculation and may not reflect the true picture of the problem under study, if the items chosen
are not representative of the universe.
