
BUSINESS STATISTICS & ANALYTICS

KMBN104
Dr. Mani Tyagi
UNIT 1
DESCRIPTIVE STATISTICS
STATISTICS IS A SCIENCE DEALING WITH COLLECTION,
ANALYSIS, INTERPRETATION, AND PRESENTATION OF
NUMERICAL DATA.
“AGGREGATES OF FACTS AFFECTED TO A
MARKED EXTENT BY MULTIPLICITY OF
CAUSES, NUMERICALLY EXPRESSED,
ENUMERATED, OR ESTIMATED ACCORDING TO
REASONABLE STANDARDS OF ACCURACY,
COLLECTED IN A SYSTEMATIC MANNER FOR A
PREDETERMINED PURPOSE AND PLACED IN
RELATION TO EACH OTHER”
HORACE SECRIST
IMPORTANT CHARACTERISTICS

• STATISTICS ARE AGGREGATES OF FACTS
• STATISTICS ARE AFFECTED TO A MARKED EXTENT BY MULTIPLICITY OF CAUSES
• STATISTICS ARE NUMERICALLY EXPRESSED
• STATISTICS ARE COLLECTED IN A SYSTEMATIC MANNER
Statistics in Business
• Accounting — auditing and cost estimation
• Economics — regional, national, and international
economic performance
• Finance — investments and portfolio management
• Management — human resources, compensation, and
quality management
• Management Information Systems — performance of
systems which gather, summarize, and disseminate
information to various managerial levels
• Marketing — market analysis and consumer research
• International Business — market and demographic analysis
Population Versus Sample
• Population — the whole
– a collection of persons, objects, or items under
study
• Census — gathering data from the entire
population
• Sample — a portion of the whole
– a subset of the population
A population is a collection of all possible individuals, objects, or
measurements of interest.

A sample is a portion, or part, of the population of interest


Descriptive vs. Inferential Statistics

• Descriptive Statistics - using data gathered on a group to describe or reach conclusions about that same group only

• Inferential Statistics - using sample data to reach conclusions about the population from which the sample was taken
Descriptive Statistics

• Collect data
– e.g., Survey

• Present data
– e.g., Tables and graphs

• Characterize data
– e.g., Sample mean: $\bar{X} = \frac{\sum X_i}{n}$
Methods of descriptive statistics

• Graphic method
• Numeric method

Graphic method:
• Bar Charts
• Line Graphs
• Pie Charts

Numeric method:
• Measures of Central Tendency
• Dispersion
• Kurtosis
• Skewness
Inferential Statistics
• Estimation
– e.g., Estimate the population
mean weight using the sample
mean weight
• Hypothesis testing
– e.g., Test the claim that the
population mean weight is 120
pounds

Drawing conclusions about a population based on sample results.


Types of Variables
A. Qualitative or Attribute variable - the characteristic being studied is nonnumeric.
EXAMPLES: Gender, religious affiliation, type of automobile owned, state of birth, eye color.

B. Quantitative variable - information is reported numerically.
EXAMPLES: Balance in your checking account, minutes remaining in class, or number of children in a family.

Summary of Types of Variables

FUNCTIONS OF STATISTICS

• Collection of Data

• Tabulation of Data

• Analysis of Data

• Interpretation of Data
Data Collection Methods
Collecting Data
• Primary (data collection)
– Observation
– Survey
– Experimentation
• Secondary (data compilation)
– Print or Electronic
Primary & Secondary
Primary:
• Direct personal interview.
• Indirect personal interview.
• Questionnaire.
• Enumerators.
Secondary:
• Government publications.
• International publications.
• Semi-official publications.
• Private publications.
• Unpublished sources.
Classification & Tabulation of Data
THREE TYPES OF SERIES

• INDIVIDUAL SERIES
• DISCRETE SERIES
• CONTINUOUS SERIES
INDIVIDUAL SERIES:

WHERE FREQUENCIES ARE NOT GIVEN.

DISCRETE SERIES:

WHERE BOTH FREQUENCIES AND VARIABLES ARE GIVEN.

CONTINUOUS SERIES :

WHERE VALUES OF THE VARIABLE ARE GROUPED INTO CLASS INTERVALS.


Different Types of classes
Exclusive classes Inclusive classes
• 20-30 • 20-29
• 30-40 • 30-39
• 40-50 • 40-49
• 50-60 • 50-59
• 60-70 • 60-69
• 70-80 • 70-79
• 80-90 • 80-89
• 90-100 • 90-99
Descriptive Statistics
Measures of Central Tendency:
Ungrouped Data
• Measures of central tendency yield
information about “particular places or
locations in a group of numbers.”
• Common Measures of Central Tendency
– Mode
– Median
– Mean
Definition
• A la mode – the most
popular or that which is in
fashion.

Baseball caps are a la mode today.


Definition
• Mode – the number that
appears most frequently in a
set of numbers.

1, 1, 3, 7, 10, 13
Mode = 1
MODE
“The mode or the modal value is that in a series of
observations which occurs with the greatest frequency.”

Calculation of mode:
Individual Series:

• Count the number of times each value occurs.
• If two or more values share the same maximum frequency, the mode is ill defined.
How to Find the Mode
in a Group of Numbers
• Step 1 – Arrange the numbers
in order from least to greatest.
21, 18, 24, 19, 18
18, 18, 19, 21, 24
How to Find the Mode
in a Group of Numbers
• Step 2 – Find the number that
is repeated the most.
21, 18, 24, 19, 18
18, 18, 19, 21, 24
Mode = 18
• Bimodal -- Data sets that have two modes
• Multimodal -- Data sets that contain more
than two modes
Mode -- Example
35 41 44 45
37 41 44 46
37 43 44 46
39 43 44 46
40 43 44 46
40 43 45 48
• The mode is 44.
• There are more 44s than any other value.
Calculation of Mode

Discrete Series:

Preparation of Grouping and Analysis table

Grouping Table:

• Column 1, the maximum frequency is put in a circle.
• Column 2, frequencies are grouped in twos.
• Column 3, leave the first and then group the remaining in twos.
• Column 4, frequencies are grouped in threes.
• Column 5, leave the first and group the remaining in threes.
• Column 6, leave the first two and group the remaining in threes.
• In each column take the maximum total and put it in a circle.
Analysis Table:

• Put the column numbers on the left-hand side.
• Put the various probable modes (values of the variable) on the right-hand side.
• For each column, mark the values whose frequencies make up the circled maximum in the grouping table.
• The value marked the greatest number of times is the mode.

Example:

Calculate the value of mode for the following data:

Marks : 10 15 20 25 30 35 40
Numbers: 08 12 36 25 28 18 09
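The grouping and analysis tables can be sketched in code. The following is a minimal illustration, assuming the six-column scheme listed above; the function name `grouping_mode` and the vote-counting representation of the analysis table are choices made here, not part of the slides.

```python
def grouping_mode(values, freqs):
    """Mode of a discrete series via the grouping and analysis tables."""
    n = len(values)
    votes = [0] * n  # analysis table: times each value falls in a maximum group
    # (group size, leading frequencies skipped) for columns 1 to 6
    columns = [(1, 0), (2, 0), (2, 1), (3, 0), (3, 1), (3, 2)]
    for size, skip in columns:
        groups = [(sum(freqs[s:s + size]), s)
                  for s in range(skip, n - size + 1, size)]
        if not groups:
            continue
        best = max(total for total, _ in groups)
        for total, start in groups:
            if total == best:
                for i in range(start, start + size):
                    votes[i] += 1
    top = max(votes)
    return [v for v, c in zip(values, votes) if c == top], votes

marks = [10, 15, 20, 25, 30, 35, 40]
numbers = [8, 12, 36, 25, 28, 18, 9]
modes, votes = grouping_mode(marks, numbers)
print(modes, votes)  # [25] [0, 1, 4, 5, 3, 1, 0]
```

For this data the single highest frequency (36) belongs to 20, but the grouping method selects 25 because it lies in the maximum group of more columns.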
Calculation of mode
Continuous Series:

$Mo = L + \frac{\Delta_1}{\Delta_1 + \Delta_2} \times i$

Where:
L = the lower limit of the modal class
$\Delta_1$ = the difference between the frequency of the modal class and the pre-modal class (ignoring signs)
$\Delta_2$ = the difference between the frequency of the modal class and the post-modal class (ignoring signs)
i = the class interval of the modal class.


Example:

Marks No. of Students


0-10 03
10-20 05
20-30 07
30-40 10
40-50 12
50-60 15
60-70 12
70-80 06
80-90 02
90-100 08
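As a rough sketch, the continuous-series formula can be applied to the marks table above. The function name `mode_continuous` is an illustrative choice, and the code assumes equal class widths and a single class with the highest frequency.

```python
def mode_continuous(lower_limits, freqs, width):
    """Mo = L + d1 / (d1 + d2) * i for a frequency distribution."""
    k = max(range(len(freqs)), key=freqs.__getitem__)  # index of the modal class
    f_prev = freqs[k - 1] if k > 0 else 0
    f_next = freqs[k + 1] if k < len(freqs) - 1 else 0
    d1 = abs(freqs[k] - f_prev)  # modal vs pre-modal frequency
    d2 = abs(freqs[k] - f_next)  # modal vs post-modal frequency
    return lower_limits[k] + d1 / (d1 + d2) * width

lowers = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
students = [3, 5, 7, 10, 12, 15, 12, 6, 2, 8]
mode_marks = mode_continuous(lowers, students, 10)
print(mode_marks)  # 55.0
```

The modal class is 50-60 (frequency 15), both differences equal 3, so the mode sits at the middle of that class.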
Definition
• Median – the middle number
in a set of ordered numbers.

1, 3, 7, 10, 13
Median = 7
Median
• Middle value in an ordered array of numbers.
• Applicable for ordinal, interval, and ratio data
• Not applicable for nominal data
• Least affected by extreme values.
Median: Computational
Procedure
• First Procedure
– Arrange the observations in an ordered array.
– If there is an odd number of terms, the median is
the middle term of the ordered array.
– If there is an even number of terms, the median is
the average of the middle two terms.
• Second Procedure
– The median’s position in an ordered array is given
by (n+1)/2.
How to Find the Median in
a Group of Numbers
• Step 1 – Arrange the numbers
in order from least to greatest.
21, 18, 24, 19, 27
18, 19, 21, 24, 27
How to Find the Median in
a Group of Numbers
• Step 2 – Find the middle
number.

18, 19, 21, 24, 27


This is your median number.
How to Find the Median in
a Group of Numbers
• Step 3 – If there are two middle
numbers, find the mean of these
two numbers.

18, 19, 21, 25, 27, 28
Median = (21 + 25) / 2 = 23


Median: Example
with an Odd Number of Terms
Ordered Array
3 4 5 7 8 9 11 14 15 16 16 17 19 19 20 21 22

• There are 17 terms in the ordered array.


• Position of median = (n+1)/2 = (17+1)/2 = 9
• The median is the 9th term, 15.
• If the 22 is replaced by 100, the median is 15.
• If the 3 is replaced by -103, the median is 15.
Median: Example
with an Even Number of Terms
Ordered Array
3 4 5 7 8 9 11 14 15 16 16 17 19 19 20 21

• There are 16 terms in the ordered array.


• Position of median = (n+1)/2 = (16+1)/2 =
8.5
• The median is between the 8th and 9th
terms, 14.5.
• If the 21 is replaced by 100, the median is
14.5.
• If the 3 is replaced by -88, the median is
14.5.
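The two procedures can be checked with a short sketch. The function below is an illustrative implementation of the ordered-array rule; the name `median_of` is an assumption made here.

```python
def median_of(values):
    """Middle term for odd n, average of the two middle terms for even n."""
    data = sorted(values)
    n = len(data)
    mid = n // 2
    if n % 2 == 1:
        return data[mid]
    return (data[mid - 1] + data[mid]) / 2

odd_array = [3, 4, 5, 7, 8, 9, 11, 14, 15, 16, 16, 17, 19, 19, 20, 21, 22]
even_array = odd_array[:-1]  # the same array without the final 22

print(median_of(odd_array))               # 15
print(median_of(even_array))              # 14.5
print(median_of(odd_array[:-1] + [100]))  # 15, unchanged by the extreme value
```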
MEDIAN (DISCRETE SERIES)

Steps:
• Arrange data in ascending/descending order.
• Find out cumulative frequency.
• Apply formula: Median = size of the (N+1)/2 th item.
• Now look at the cumulative frequency column and find the value that is either equal to or next higher than (N+1)/2.
• Determine the value of the variable corresponding to it.
• This value is the median.
EXAMPLE:

From the following data find the value of the median:

Income No. of persons

1000 24
1500 26
1501 16
2000 20
2500 06
1800 30
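The steps for a discrete series can be sketched as follows, using the income data above (note that the incomes must be sorted first, since 1800 is listed out of order). The function name `median_discrete` is illustrative.

```python
def median_discrete(values, freqs):
    """Value whose cumulative frequency first reaches (N + 1) / 2."""
    pairs = sorted(zip(values, freqs))  # arrange in ascending order
    target = (sum(freqs) + 1) / 2
    cumulative = 0
    for value, f in pairs:
        cumulative += f
        if cumulative >= target:
            return value

incomes = [1000, 1500, 1501, 2000, 2500, 1800]
persons = [24, 26, 16, 20, 6, 30]
median_income = median_discrete(incomes, persons)
print(median_income)  # 1501
```

Here N = 122, so the median is the 61.5th item; the cumulative frequencies run 24, 50, 66, and 66 is the first to reach 61.5.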
MEDIAN (CONTINUOUS SERIES)

$\text{MEDIAN} = L + \frac{N/2 - c.f.}{f} \times i$

Where:
L = Lower limit of the median class
c.f. = Cumulative frequency of the class preceding the
median class
f = Simple frequency of the median class
i = the class interval of the median class
EXAMPLE:

Calculate the median for the following frequency distribution.

Marks No. of students

5-10 07
10-15 15
15-20 24
20-25 31
25-30 42
30-35 30
35-40 26
40-45 15
45-50 10
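A minimal sketch of the continuous-series median for the distribution above, assuming equal class widths; `median_continuous` is a name chosen for illustration.

```python
def median_continuous(lower_limits, freqs, width):
    """Median = L + (N/2 - c.f.) / f * i."""
    half = sum(freqs) / 2
    cf = 0  # cumulative frequency of the classes before the current one
    for lower, f in zip(lower_limits, freqs):
        if cf + f >= half:  # this is the median class
            return lower + (half - cf) / f * width
        cf += f

lowers = [5, 10, 15, 20, 25, 30, 35, 40, 45]
students = [7, 15, 24, 31, 42, 30, 26, 15, 10]
median_marks = median_continuous(lowers, students, 5)
print(round(median_marks, 2))  # 27.74
```

With N = 200, the median class is 25-30 (c.f. = 77, f = 42), giving 25 + (23/42) × 5.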
ARITHMETIC MEAN

Definition
• Mean – the average of a
group of numbers.

2, 5, 2, 1, 5
Mean = 3
ARITHMETIC MEAN
“ Sum of observed values of a set divided by the
number of observations in the set is called a mean
or an average.”
ARITHMETIC MEAN – INDIVIDUAL SERIES

If X1, X2, ..., XN are N observed values, the mean or average is given as:

$A \text{ or } \bar{X} = \frac{X_1 + X_2 + \dots + X_N}{N} = \frac{1}{N}\sum X_i$
EXAMPLE :

1. The following table gives the monthly income of 10 employees in an office:

Income (Rs): 1780, 1760, 1690, 1750, 1840, 1920, 1100, 1810, 1050, 1950
Calculate the arithmetic mean of incomes.

SHORT-CUT METHOD:

$\bar{X} = A + \frac{\sum d}{N}$
Where
A = Assumed mean
d = deviation of items from assumed mean (d = X − A)
N = number of observations
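Both methods can be compared on the income data above. This is a small sketch; the assumed mean of 1700 is an arbitrary choice made here, and any other value gives the same result.

```python
incomes = [1780, 1760, 1690, 1750, 1840, 1920, 1100, 1810, 1050, 1950]

def mean_direct(values):
    """Sum of the observations divided by their number."""
    return sum(values) / len(values)

def mean_shortcut(values, assumed):
    """Assumed mean plus the average deviation from it."""
    deviations = [x - assumed for x in values]
    return assumed + sum(deviations) / len(values)

print(mean_direct(incomes))          # 1665.0
print(mean_shortcut(incomes, 1700))  # 1665.0
```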
ARITHMETIC MEAN – DISCRETE SERIES

i) DIRECT METHOD

ii) SHORT-CUT METHOD

i) DIRECT METHOD :

$\bar{X} = \frac{\sum fX}{N}$, with $\sum f = N$
Where
f = frequency
X = variable in question
N = total number of observations
EXAMPLE :

The following data gives the marks obtained by 60 students


in a class. Calculate the arithmetic mean.

Marks No. of Students


20 08
30 12
40 20
50 10
60 06
70 04
ARITHMETIC MEAN – CONTINUOUS SERIES

DIRECT METHOD:

$\bar{X} = \frac{\sum fm}{N}$
Where
m = mid point of each class
f = the frequency of each interval
N = the total frequency
EXAMPLE:

From the following data compute the arithmetic mean by


Direct method.

Marks No. of students

0-10 05
10-20 10
20-30 25
30-40 30
40-50 20
50-60 10
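A brief sketch of the direct method for the marks distribution above: each class is represented by its mid point, weighted by its frequency. The function name is illustrative.

```python
def mean_continuous(class_limits, freqs):
    """Frequency-weighted mean of the class mid points."""
    midpoints = [(lo + hi) / 2 for lo, hi in class_limits]
    return sum(f * m for f, m in zip(freqs, midpoints)) / sum(freqs)

classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50), (50, 60)]
students = [5, 10, 25, 30, 20, 10]
mean_marks = mean_continuous(classes, students)
print(mean_marks)  # 33.0
```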
MEASURES OF
DISPERSION
What is dispersion?
Dispersion means variation in the size of the data.
Dispersion literally means scatteredness: it shows whether a frequency
distribution is homogeneous or heterogeneous.
Objectives related to the measurement of Dispersion:
1. To estimate the average distance of items from the average of the series.
2. To know the construction or formation of the series.
3. To know the limits within which item values vary.
4. To make a comparative study of the variability of two series.
5. To judge how well the average represents the series.
Absolute and Relative Measures of Dispersion:
Absolute Measure:
When the variation or scatter of a series is measured in the original
units of the series, it is called an absolute measure of dispersion.
Ex: dispersion of income, height, weight, or age stated in its own units
is an absolute measure.
Relative Measure :
For comparative study, the absolute measure is converted into a
relative measure, a ratio or percentage, by dividing it by the average.

Relative dispersion = (Absolute measure / Average) × 100
Ex: In two factories the average wages of labour are Rs 250 and Rs 300
respectively, and the absolute dispersion in both factories is Rs 40. It
would be wrong to say that dispersion is the same in both factories. For
comparative study, the relative dispersion in the two factories is as follows:
Relative dispersion in first factory: (40 / 250) × 100 = 16%
Relative dispersion in second factory: (40 / 300) × 100 = 13.33%
Thus it is clear that the dispersion of wages is greater in the first factory.
Methods Of Measuring Dispersion:

1. Method of Limits:
a) Range
b) Inter Quartile Range
c) Percentile Range
2. Method of Averaging Deviations:
a) Quartile Deviation
b) Mean Deviation
c) Standard Deviation
3. Graphic Method:
Lorenz Curve
Dispersion refers to the spread or variability in the
data.

RANGE
The difference between largest value and smallest value of a
data series is called a range.

Range = Largest value - Smallest value

Co-efficient of Range
For comparative study of dispersion, it is necessary to know
the coefficient of range.
Co-efficient of Range = (L − S) / (L + S)
where L = largest value and S = smallest value.
The following represents the current year’s Return on
Equity of the 25 companies in an investor’s portfolio.

-8.1 3.2 5.9 8.1 12.3


-5.1 4.1 6.3 9.2 13.3
-3.1 4.6 7.9 9.5 14.0
-1.4 4.8 7.9 9.7 15.0
1.2 5.7 8.0 10.3 22.1

Highest value: 22.1 Lowest value: -8.1

Range = Highest value – lowest value


= 22.1-(-8.1)
= 30.2
Ex : The following are the marks of students in
Statistics and Economics. Compare the range in two
subjects.
Statistics: 20 42 30 35 25 10 16 40 18 32
Economics: 31 29 33 42 20 12 15 38 20 35
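The comparison asked for above can be sketched in a few lines. The helper name `range_and_coefficient` is an illustrative choice.

```python
def range_and_coefficient(values):
    """Return (L - S, (L - S) / (L + S)) for largest L and smallest S."""
    largest, smallest = max(values), min(values)
    spread = largest - smallest
    return spread, spread / (largest + smallest)

statistics_marks = [20, 42, 30, 35, 25, 10, 16, 40, 18, 32]
economics_marks = [31, 29, 33, 42, 20, 12, 15, 38, 20, 35]

stat_range, stat_coef = range_and_coefficient(statistics_marks)
econ_range, econ_coef = range_and_coefficient(economics_marks)
print(stat_range, round(stat_coef, 3))  # range 32
print(econ_range, round(econ_coef, 3))  # range 30
```

Both the range and its coefficient come out larger for Statistics, so those marks are the more widely spread by this measure.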
Ex: Calculate the range and its coefficient:
Class: 0-10 10-20 20-30 30-40 40-50
F: 2 8 15 35 10

Ex: Calculate the range and its coefficient:
Age: 15-19 20-24 25-29 30-34 35-39
F: 8 10 17 8 7
Example 1: Find the median and quartiles for the data below.
12, 6, 4, 9, 8, 4, 9, 8, 5, 9, 8, 10
Order the data:
4, 4, 5, 6, 8, 8, 8, 9, 9, 9, 10, 12
Lower Quartile (Q1) = 5½
Median (Q2) = 8
Upper Quartile (Q3) = 9
Example 2: Find the median and quartiles for the data below.
6, 3, 9, 8, 4, 10, 8, 4, 15, 8, 10
Order the data:
3, 4, 4, 6, 8, 8, 8, 9, 10, 10, 15
Lower Quartile (Q1) = 4
Median (Q2) = 8
Upper Quartile (Q3) = 10
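The answers in Examples 1 and 2 are consistent with taking the quartiles as the medians of the lower and upper halves of the ordered data (leaving out the middle value when the count is odd). The sketch below assumes that convention; other conventions, such as interpolating at the (N+1)/4 th item, can give slightly different quartiles.

```python
from statistics import median

def quartiles_by_halves(values):
    """Q1, Q2, Q3 as medians of the lower half, whole set and upper half.

    Assumes at least two observations.
    """
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]   # excludes the middle value when the count is odd
    upper = data[-half:]
    return median(lower), median(data), median(upper)

example_1 = [12, 6, 4, 9, 8, 4, 9, 8, 5, 9, 8, 10]
example_2 = [6, 3, 9, 8, 4, 10, 8, 4, 15, 8, 10]
print(quartiles_by_halves(example_1))  # (5.5, 8.0, 9.0)
print(quartiles_by_halves(example_2))  # (4, 8, 10)
```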
Inter Quartile Range:

The difference between the third quartile (Q3) and the first quartile (Q1)
of a series is called the Inter Quartile Range.

Interquartile range = Q3 - Q1

The interquartile range is the distance between the third quartile Q3 and
the first quartile Q1. This distance will include the middle 50 percent of
the observations.

For a set of observations the third quartile is 24 and the first quartile
is 10. What is the interquartile range?
The interquartile range is 24 - 10 = 14.
Methods of Averaging Deviations:
Quartile Deviation:
This measure of dispersion is also based on Q1 and Q3.
Quartile Deviation = (Q3 – Q1) / 2
Coefficient Of Quartile Deviation

Coefficient of Q.D = (Q3 – Q1) / (Q3 + Q1)
Important Formulas (For Individual and
Discrete Series)

• Q1 (1st Quartile) = (N+1)/4 th item
• Q2 (2nd Quartile) = Median
• Q3 (3rd Quartile) = 3(N+1)/4 th item
• D4 (4th Decile) = 4(N+1)/10 th item
• D7 (7th Decile) = 7(N+1)/10 th item
• P45 (45th Percentile) = 45(N+1)/100 th item
• P85 (85th Percentile) = 85(N+1)/100 th item
Important Formulas (For Continuous
Series)
• Q1 (1st Quartile) = N/4 th item
• Q2 (2nd Quartile) = Median
• Q3 (3rd Quartile) = 3(N/4) th item
• D4 (4th Decile) = 4(N/10) th item
• D7 (7th Decile) = 7(N/10) th item
• P45 (45th Percentile) = 45(N/100) th item
• P85 (85th Percentile) = 85(N/100) th item

Value = $L + \frac{\text{desired item} - c.f.}{f} \times i$ (same form as the median formula)
Ex Calculate Q.D and its co-efficient.
Height(inch) Weight(pounds)
58 117
56 112
62 127
61 123
63 125
64 130
65 106
59 119
62 121
65 132
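For an individual series, the (N+1)/4 th and 3(N+1)/4 th item formulas can be sketched as below. Linear interpolation between neighbouring items for fractional positions is an assumption made here, as are the helper names.

```python
def item_at(sorted_data, position):
    """Value at a 1-based, possibly fractional, position (linear interpolation)."""
    whole = int(position)
    if whole < 1:
        return sorted_data[0]
    if whole >= len(sorted_data):
        return sorted_data[-1]
    fraction = position - whole
    return sorted_data[whole - 1] + fraction * (sorted_data[whole] - sorted_data[whole - 1])

def quartile_deviation(values):
    """Return (Q.D., coefficient of Q.D.) using the (N+1)/4 positions."""
    data = sorted(values)
    n = len(data)
    q1 = item_at(data, (n + 1) / 4)
    q3 = item_at(data, 3 * (n + 1) / 4)
    return (q3 - q1) / 2, (q3 - q1) / (q3 + q1)

heights = [58, 56, 62, 61, 63, 64, 65, 59, 62, 65]
weights = [117, 112, 127, 123, 125, 130, 106, 119, 121, 132]
qd_height, coef_height = quartile_deviation(heights)
qd_weight, coef_weight = quartile_deviation(weights)
print(qd_height, qd_weight)  # 2.75 6.0
```

For the heights, Q1 is the 2.75th item (58.75) and Q3 the 8.25th item (64.25), so Q.D. = 2.75 inches.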
Ex 6 Calculate Q.D and its co-efficient.
Wages (Rs) No. of persons
20 8
21 10
22 11
23 16
24 20
25 25
26 15
27 9
28 6
Ex 7 Calculate the co-efficient of Q.D from the following data:
Marks No. of students
0-10 7
10-20 10
20-30 13
30-40 26
40-50 16
50-60 8
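For a continuous series such as Ex 7, the "L + (desired − c.f.)/f × i" rule can be sketched as a single helper that serves quartiles, deciles and percentiles alike. The name `class_quantile` is illustrative, and equal class widths are assumed.

```python
def class_quantile(lower_limits, freqs, width, desired):
    """Value of the desired-th item: L + (desired - c.f.) / f * i."""
    cf = 0  # cumulative frequency before the current class
    for lower, f in zip(lower_limits, freqs):
        if cf + f >= desired:
            return lower + (desired - cf) / f * width
        cf += f

lowers = [0, 10, 20, 30, 40, 50]
students = [7, 10, 13, 26, 16, 8]
n = sum(students)  # 80

q1 = class_quantile(lowers, students, 10, n / 4)      # falls in class 20-30
q3 = class_quantile(lowers, students, 10, 3 * n / 4)  # falls in class 40-50
coef_qd = (q3 - q1) / (q3 + q1)
print(round(q1, 2), q3, round(coef_qd, 3))
```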
Mean Deviation:

Mean deviation is the arithmetic average of the deviations of all the
values taken from a statistical average (mean, median, or mode) of the
series. In taking the deviations, the algebraic signs + and - are not
taken into consideration.

Calculation Of Mean Deviation:

1. Choosing the average: In principle, M.D. should be calculated from
the median because it is more stable. The arithmetic mean is also used,
but the mode should be avoided because its value is very uncertain.
2. Sum of deviations: While calculating mean deviation, + and – signs
are not considered, i.e. all the deviations are assumed to be positive
and their sum is taken as $\sum |d|$.
Computation of Mean Deviation – Individual Series:

M.D. = $\frac{\sum |D|}{N}$,
where |D| = |X – mean or median|
Coefficient of M.D. = Mean Deviation / Mean or Median

Example:
Calculate the mean deviation from median of the
following series.

4000
4200
4400
4600
4800
Calculate mean deviation and its co-efficient from the
following data from arithmetic mean and median.
Price (Rs) 47,50,45,40,52,55,58,53,60,65,69
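The individual-series formula can be sketched on the price data above, taking deviations once from the mean and once from the median. The function name `mean_deviation` is an illustrative choice.

```python
from statistics import mean, median

def mean_deviation(values, centre):
    """Average of the absolute deviations from the chosen centre."""
    return sum(abs(x - centre) for x in values) / len(values)

prices = [47, 50, 45, 40, 52, 55, 58, 53, 60, 65, 69]

md_from_mean = mean_deviation(prices, mean(prices))        # centre = 54
md_from_median = mean_deviation(prices, median(prices))    # centre = 53
coef_from_mean = md_from_mean / mean(prices)
coef_from_median = md_from_median / median(prices)
print(round(md_from_mean, 2), round(md_from_median, 2))    # 6.73 6.64
```

The deviations sum to 74 about the mean and 73 about the median; the sum of absolute deviations is smallest when taken from the median.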
Discrete And Continuous Series:

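I. From Mean:
Mean deviation: $\delta_{\bar{x}} = \frac{\sum f|d_{\bar{x}}|}{N}$; coefficient of mean deviation $= \frac{\delta_{\bar{x}}}{\bar{x}}$
II. From Median:
Mean deviation: $\delta_{m} = \frac{\sum f|d_{m}|}{N}$; coefficient of mean deviation $= \frac{\delta_{m}}{m}$
III. From Mode:
Mean deviation: $\delta_{z} = \frac{\sum f|d_{z}|}{N}$; coefficient of mean deviation $= \frac{\delta_{z}}{z}$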
Ex 9 Calculate the mean deviation from the data given below.
Size F
10 2
12 1
14 3
16 6
18 4
20 3
22 1
Ex: Calculate the mean deviation and its coefficient:
Class: 0-10 10-20 20-30 30-40 40-50
F: 2 8 15 35 10

Ex: Calculate the mean deviation and its coefficient:
Age: 15-19 20-24 25-29 30-34 35-39
F: 8 10 17 8 7
“STANDARD DEVIATION”
Standard Deviation:
• Standard deviation is the most satisfactory and scientific measure of dispersion.
• It is the most widely used measure in statistical analysis.
• Standard deviation is the measure of dispersion in which deviations are taken from the mean and algebraic signs are kept in mind while taking deviations.
• Also known as root mean square deviation.
• The greater the standard deviation, the greater the magnitude of the deviations of the values from their mean.
• A small standard deviation means a high degree of uniformity of the observations as well as homogeneity of the series.
• The standard deviation is useful in judging how representative the mean is.
Standard Deviation Individual Series:

= x2 Also
N s.d = X2 - X 2
where X= variables
Where N N
x=X-X

If Assumed mean is taken


2 2
 =  d -  d
N N

Where
d = X - A (Assumed mean)
Example:

Blood serum cholesterol levels of 10 persons are as under:

240, 260, 290, 245, 255, 288, 272, 263, 277, 251

Calculate the standard deviation with the help of an assumed mean.
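A minimal sketch of the assumed-mean method for the cholesterol data. The assumed mean A = 264 is an arbitrary choice made here; the result does not depend on it.

```python
from math import sqrt

def sd_assumed_mean(values, assumed):
    """sigma = sqrt(sum(d^2)/N - (sum(d)/N)^2) with d = X - A."""
    n = len(values)
    d = [x - assumed for x in values]
    return sqrt(sum(di * di for di in d) / n - (sum(d) / n) ** 2)

levels = [240, 260, 290, 245, 255, 288, 272, 263, 277, 251]
sd_levels = sd_assumed_mean(levels, 264)
print(round(sd_levels, 2))  # 16.4
```

With A = 264, the deviations sum to 1 and their squares to 2689, so the variance is 268.9 − 0.01 = 268.89.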
Standard Deviation (Discrete series)/ Continuous Series

• Actual mean method


• Assumed mean method
• Step Deviation method

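Actual mean method:

$\sigma = \sqrt{\frac{\sum f x^2}{N}}$
Where x = X – $\bar{X}$ (mean)

Assumed Mean Method:

$\sigma = \sqrt{\frac{\sum f d^2}{N} - \left(\frac{\sum f d}{N}\right)^2}$
Where d = X – A

Step Deviation Method:

$\sigma = \sqrt{\frac{\sum f d^2}{N} - \left(\frac{\sum f d}{N}\right)^2} \times i$
Here d = (X – A) / i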
Example:
The annual salaries of a group of employees are given
in the following table:

Salaries(in 000) Number of persons


45 03
50 05
55 08
60 07
65 09
70 07
75 04
80 07

Calculate the standard deviation of the salaries.
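The step deviation method can be sketched on the salary table above. The assumed mean A = 60 and step i = 5 are choices made here for illustration; the function name is likewise hypothetical.

```python
from math import sqrt

def sd_step_deviation(values, freqs, assumed, step):
    """sigma = sqrt(sum(f d^2)/N - (sum(f d)/N)^2) * i with d = (X - A) / i."""
    n = sum(freqs)
    d = [(x - assumed) / step for x in values]
    sum_fd = sum(f * di for f, di in zip(freqs, d))
    sum_fd2 = sum(f * di * di for f, di in zip(freqs, d))
    return sqrt(sum_fd2 / n - (sum_fd / n) ** 2) * step

salaries = [45, 50, 55, 60, 65, 70, 75, 80]  # in thousands
persons = [3, 5, 8, 7, 9, 7, 4, 7]
sd_salary = sd_step_deviation(salaries, persons, assumed=60, step=5)
print(round(sd_salary, 2))  # 10.35
```

Here N = 50, Σfd = 36 and Σfd² = 240, so σ = √(4.8 − 0.5184) × 5.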


Example:

Calculate the mean and standard deviation of the following
frequency distribution of marks:

Marks No. of Students


0-10 05
10-20 12
20-30 30
30-40 45
40-50 50
50-60 37
60-70 21
Coefficient of Variations:

C.V. = $\frac{\sigma}{\bar{X}} \times 100$

Example: The following table shows the monthly expenditures
of 80 students of a university on morning breakfast:

Expenditure No. of students


78-82 02
73-77 06
68-72 07
63-67 12
58-62 18
53-57 13
48-52 09
43-47 07
38-42 04
33-37 02
Calculate standard deviation and coefficient of variation of above data
Example:

From the prices of shares of X and Y below, find out
which is more stable in value.

X Y
35 108
54 107
52 105
53 105
56 106
58 107
52 104
50 103
51 104
49 101
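Stability can be compared through the coefficient of variation: the share with the smaller C.V. is the more stable. The sketch below uses the population standard deviation (`statistics.pstdev`), which matches the divide-by-N formulas used in these slides.

```python
from statistics import mean, pstdev

def coefficient_of_variation(values):
    """C.V. = sigma / mean * 100, with sigma computed by dividing by N."""
    return pstdev(values) / mean(values) * 100

share_x = [35, 54, 52, 53, 56, 58, 52, 50, 51, 49]
share_y = [108, 107, 105, 105, 106, 107, 104, 103, 104, 101]

cv_x = coefficient_of_variation(share_x)
cv_y = coefficient_of_variation(share_y)
print(round(cv_x, 2), round(cv_y, 2))  # 11.6 1.9
```

Share Y has the much smaller coefficient of variation, so it is the more stable in value.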
Variance: It is the square of the standard deviation,

i.e. Variance = $\sigma^2$
Example:
The number of employees, wages per employee and the variance
of the wages per employee for two factories is given below

Factory A Factory B
No. of employees 100 150
Average wage 3200 2800
Variance of wage 625 729
(a) In which factory is there a greater variation in the distribution
of wages per employee?
(b) Suppose in factory B the wages of an employee were
wrongly noted as Rs. 3050 instead of Rs. 3650. What
would be the correct variance for factory B?
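One way to work this example is sketched below. Part (a) compares coefficients of variation; part (b) rebuilds ΣX and ΣX² from the mean and variance, swaps the wrong entry for the right one, and recomputes. The function names are illustrative.

```python
from math import sqrt

def cv_from_summary(mean, variance):
    """Coefficient of variation (percent) from a mean and a variance."""
    return sqrt(variance) / mean * 100

def corrected_variance(n, mean, variance, wrong, right):
    """Variance after replacing one wrongly recorded value."""
    total = n * mean - wrong + right                             # corrected sum of X
    total_sq = n * (variance + mean ** 2) - wrong ** 2 + right ** 2  # corrected sum of X^2
    new_mean = total / n
    return total_sq / n - new_mean ** 2

cv_a = cv_from_summary(3200, 625)  # sigma = 25
cv_b = cv_from_summary(2800, 729)  # sigma = 27
print(cv_a < cv_b)  # True: factory B shows the greater relative variation

fixed_variance_b = corrected_variance(150, 2800, 729, wrong=3050, right=3650)
print(fixed_variance_b)  # 5113.0
```

The corrected mean for factory B is 2804, and the single 600-rupee correction raises the variance from 729 to 5113.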
Example:
The mean of 5 observations is 4.4 and the variance is 8.24.
If the three of the five observations are 1, 2 and 6, find the other two.

Example:
The following table gives the marks obtained by a group of
80 students in an examination. Calculate the variance.
Marks obtained No. of Students
10-14 02
14-18 04
18-22 04
22-26 08
26-30 12
30-34 16
34-38 10
38-42 08
42-46 04
46-50 06
50-54 02
54-58 04
Skewness:
“It refers to the asymmetry or lack of symmetry
in the shape of a frequency distribution.”
Just as a statistical average is calculated to study central tendency
and dispersion is measured to study the scatter of values, skewness is
calculated to study whether a series is symmetrical or asymmetrical.
Types of frequency distribution:

1. Normal frequency distribution

2. Asymmetrical Distribution
Normal frequency distribution
One main feature of a normal distribution is that the mean, median
and mode are equal. In such a distribution the frequencies
gradually increase, reach a maximum at the center, and then
decrease. When this distribution is plotted on a graph it gives a
bell-shaped curve, also called the normal curve.

Zero skewness: Mean = Median = Mode


Asymmetrical Distribution: In an asymmetrical distribution the rate of
increase and decrease of frequencies is not the same, and the mean,
median and mode are not equal. It is of two types:

(i) Positive Skewness: When the mean is more than the median
and the median is more than the mode, skewness is positive, i.e. the
peak of the curve lies towards the left and the longer tail extends to the right.

(ii) Negative Skewness: When the mean is less than the median
and the median is less than the mode, skewness is negative, i.e. the
peak of the curve lies towards the right and the longer tail extends to the left.
Positively skewed: Mean and median are to the right of the mode.

Mean > Median > Mode

Negatively skewed: Mean and median are to the left of the mode.

Mean < Median < Mode
Karl Pearson’s Coefficient of Skewness:

Sp = Mean – Mode/ Standard Deviation

Bowley’s Coefficient of Skewness

SB = Q3 + Q1 – 2 Median/Q3 – Q1
