
Business Statistics

Descriptive Statistics


Mathematical Averages

• Arithmetic mean
  • Simple
  • Weighted
• Geometric mean
• Harmonic mean

Positional Averages

• Median
• Quartiles
• Deciles
• Percentiles
• Mode


Measures of Central Tendency


• Central tendency refers to the "middle" value of the data.

• Three types of measures:

• Mean

• Median

• Mode


Definition (Mean)
• The mean (or arithmetic mean or average) is the most common
measure of central tendency.
• The mean of a set of observations is the average of the
observations, which provides a measure of the central location
of the data.
• The mean is equal to the sum of all observations divided by the
number of observations in the set.
• For a sample of size n, the sample mean x̄ is
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$


Mean (Contd...)
• For a population of size N, the population mean µ is
$$\mu = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{x_1 + x_2 + \cdots + x_N}{N}$$
• Applicable for interval and ratio data.
• Not applicable for nominal or ordinal data.
• Affected by each value in the data set, including extreme
values (also known as outliers).


Example

$$\text{Mean} = \frac{1 + 2 + 3 + 4 + 5}{5} = 3$$

$$\text{Mean} = \frac{1 + 2 + 3 + 4 + 10}{5} = 4$$
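As a minimal sketch, the two means in this example can be checked with Python's standard-library `statistics` module. The variable names are illustrative and not part of the slides.

```python
from statistics import mean

# The two data sets from the example: the second replaces 5 with the outlier 10.
without_outlier = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 10]

mean_without = mean(without_outlier)  # (1 + 2 + 3 + 4 + 5) / 5
mean_with = mean(with_outlier)        # (1 + 2 + 3 + 4 + 10) / 5

print(mean_without, mean_with)
```

Changing a single value to an outlier is enough to move the mean, which is the sensitivity described on the previous slide.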


Definition (Weighted Mean)


• Considers the importance of each value.
• The weighted mean x̄w is
$$\bar{x}_w = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i},$$
where wi is the weight for observation i.
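The weighted-mean formula can be sketched as a short Python function. The function name and the price and quantity figures below are hypothetical, chosen only to illustrate the calculation.

```python
def weighted_mean(values, weights):
    """Return sum(w_i * x_i) / sum(w_i) for paired values and weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# Hypothetical data: unit prices weighted by the quantity bought at each price.
prices = [10, 20, 30]
quantities = [5, 3, 2]

average_price = weighted_mean(prices, quantities)  # (50 + 60 + 60) / 10
print(average_price)
```

With equal weights the function reduces to the simple arithmetic mean.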


Definition (Median)
• The median is the observation in the center of the data set when
the data are arranged in ascending order.
• 50% of the data lie above the median and 50% lie below it.

$$\text{Median position} = \frac{n+1}{2} \text{ (position in the ordered data)}$$

Remark
(n + 1)/2 is not the value of the median, only the position of the
median in the ordered data.


Median (Contd...)
• Procedure to find the median:
  • Arrange the observations in ascending order (smallest to
    largest value).
  • If there is an odd number of terms (observations), the median
    is the middle term of the ordered array.
  • If there is an even number of terms, the median is the average
    of the middle two terms.
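The procedure above can be sketched in Python as follows. The function name is illustrative, the first two data sets are the ones used in the mean example earlier in these slides, and the third is a hypothetical even-sized set.

```python
def median(values):
    """Median by the slide's procedure: sort, then take the middle term(s)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    middle = n // 2
    if n % 2 == 1:
        # Odd n: the term at 1-based position (n + 1) / 2.
        return ordered[middle]
    # Even n: average of the two middle terms.
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median([1, 2, 3, 4, 5]))   # odd number of terms
print(median([1, 2, 3, 4, 10]))  # same middle term despite the outlier
print(median([4, 1, 3, 2]))      # even number of terms
```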


Median (Contd...)
• Median is applicable for ordinal, interval, and ratio data.
• Not applicable for nominal data.
• Unaffected by extremely large and extremely small values.

Example

[Figure: two example data sets, each with Median = 3]


Definition (Mode)
• The most frequently occurring value in a data set.
• Applicable to all levels of data measurement (nominal,
ordinal, interval, and ratio).
• Not affected by extreme values.
• There may be no mode, single mode (uni-modal), two
modes (bi-modal) or several modes (multi-modal).

Example

[Figure: data plotted on a number line from 0 to 14; Mode = 9]
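A minimal sketch of finding modes with `statistics.multimode` (Python 3.8 or later). All data values below are hypothetical; they are not the values behind the figure.

```python
from statistics import multimode

# Hypothetical data sets illustrating the cases listed in the definition.
uni_modal = [3, 9, 9, 5, 9, 12]
bi_modal = [1, 1, 2, 2, 3]
nominal = ["red", "blue", "red", "green"]  # the mode also works for nominal data

print(multimode(uni_modal))
print(multimode(bi_modal))
print(multimode(nominal))

# When every value occurs equally often, multimode returns all of them;
# in the slide's terms such a data set has no mode.
print(multimode([1, 2, 3]))
```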


Definition (Quartiles)
• Quartiles split the ranked data into four segments with an
equal number of values per segment.

• The first quartile, Q1, is the value for which 25% of the
observations are smaller and 75% are larger.
• Q2 is the same as the median (50% are smaller, 50% are
larger).
• Only 25% of the observations are greater than the third
quartile, Q3.

Quartiles (Contd...)
Procedure to find the quartiles
• Arrange the data set in ascending order.
• First quartile position:
$$Q_1 \text{ position} = \frac{n+1}{4}$$
• Second quartile (median) position:
$$Q_2 \text{ position} = \frac{n+1}{2}$$
• Third quartile position:
$$Q_3 \text{ position} = \frac{3(n+1)}{4}$$
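The quartile positions can be sketched in Python as below. The slides do not say what to do when a position is fractional, so this sketch assumes linear interpolation between the two neighbouring ordered values. The function names and data are hypothetical.

```python
def value_at_position(ordered, position):
    """Value at a 1-based, possibly fractional, position in sorted data.

    Assumption: fractional positions are resolved by linear interpolation.
    """
    lower = int(position)          # whole-number part of the position
    fraction = position - lower
    if lower < 1:
        return ordered[0]
    if lower >= len(ordered):
        return ordered[-1]
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])


def quartiles(values):
    """Return (Q1, Q2, Q3) using positions (n+1)/4, (n+1)/2 and 3(n+1)/4."""
    ordered = sorted(values)
    n = len(ordered)
    return tuple(value_at_position(ordered, k * (n + 1) / 4) for k in (1, 2, 3))


# Hypothetical data with n = 11, so the positions are 3, 6 and 9.
data = [7, 1, 3, 9, 5, 11, 13, 15, 17, 19, 21]
print(quartiles(data))
```

Python's `statistics.quantiles(data, n=4)` uses the same (n + 1)-based positions with its default "exclusive" method.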

Definition (Percentiles)
• Measures of position that divide a group of ordered data
into 100 parts.
• At least P% of the data lie below the Pth percentile, and at
most (100 − P)% of the data lie above the Pth percentile.

Example
The 90th percentile indicates that at least 90% of the data lie
below it, and at most 10% of the data lie above it.


Percentiles (Cont...)

Procedure to find the percentiles

• Arrange the data set in ascending order.
• Calculate the percentile location i:
$$i = \frac{P}{100}\, n,$$
where P is the percentile of interest and n is the number of
observations.
• Determine the percentile's location and its value.
  • If i is a whole number, the percentile is the average of the
    values at positions i and i + 1.
  • If i is not a whole number, the percentile is the value at the
    position given by the whole-number part of i, plus 1, in the
    ordered array.
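The percentile procedure can be sketched in Python as follows. The function name and the eight data values are illustrative. The sketch assumes an integer P with 0 < P < 100 and uses integer arithmetic so that the whole-number test on i is not affected by floating-point rounding.

```python
def percentile(values, p):
    """Pth percentile by the location rule i = (P / 100) * n.

    Assumes p is an integer with 0 < p < 100.
    """
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(values)
    n = len(ordered)
    whole, remainder = divmod(p * n, 100)  # i = whole + remainder / 100
    if remainder == 0:
        # i is a whole number: average the values at positions i and i + 1.
        return (ordered[whole - 1] + ordered[whole]) / 2
    # i is not a whole number: take the value at position floor(i) + 1.
    return ordered[whole]


# Hypothetical data with n = 8.
data = [14, 12, 19, 23, 5, 13, 28, 17]

print(percentile(data, 30))  # i = 2.4, so the 3rd ordered value
print(percentile(data, 50))  # i = 4, so the average of the 4th and 5th values
```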


Percentiles (Cont...)
• Applicable for ordinal, interval, and ratio data.
• Not applicable for nominal data.

Remark
• 25th percentile = Q1 (first quartile)
• 50th percentile = median = Q2 (second quartile)
• 75th percentile = Q3 (third quartile)


Definition (Measures of Variability)


• Measures of variability describe the spread or the
dispersion of a set of data.
• Common Measures of Variability are:
• Range
• Inter-quartile Range
• Variance
• Standard Deviation
• Coefficient of Variation


Definition (Range)
• Range is the simplest measure of variation.

• The range is the difference between the largest and the
smallest values in a set of data.
$$\text{Range} = x_{\text{largest}} - x_{\text{smallest}}$$

Example

[Figure: data plotted on a number line from 0 to 14]

Range = 14 − 1 = 13

19 / 67
Business Statistics
Descriptive Statistics

Range (Contd...)
Disadvantages
• Ignores the way in which data are distributed

• Sensitive to outliers
Ex 1: 1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5
Range = 5 − 1 = 4
Ex 2: 1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120
Range = 120 − 1 = 119
20 / 67
Business Statistics
Descriptive Statistics

Definition (Interquartile Range)


• Some outlier problems can be eliminated by using the
interquartile range.
• It eliminates the high- and low-valued observations and
calculates the range from the remaining values.
• Interquartile range = 3rd quartile − 1st quartile
$$\text{IQR} = Q_3 - Q_1$$
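As a rough illustration, the sketch below compares the range and the interquartile range on the two data sets (Ex 1 and Ex 2) from the range slide. It assumes quartiles are located with the (n + 1)-based positions, which is what `statistics.quantiles` does with its default "exclusive" method. The function names are illustrative.

```python
from statistics import quantiles


def data_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)


def interquartile_range(values):
    """Q3 - Q1, with quartiles from statistics.quantiles (default method)."""
    q1, _, q3 = quantiles(values, n=4)
    return q3 - q1


# Ex 1 and Ex 2 from the range slide: identical except for the last value.
ex1 = [1] * 11 + [2] * 8 + [3] * 4 + [4, 5]
ex2 = [1] * 11 + [2] * 8 + [3] * 4 + [4, 120]

for name, data in (("Ex 1", ex1), ("Ex 2", ex2)):
    print(name, "range =", data_range(data), "IQR =", interquartile_range(data))
```

The single outlier changes the range from 4 to 119 but leaves the interquartile range unchanged.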

21 / 67
Business Statistics
Descriptive Statistics

Interquartile Range (Contd...)

Example

22 / 67
Business Statistics
Descriptive Statistics

Definition (Variance)
• Variance is based on the difference between the value of
each observation (xi) and the mean; this difference is
called the deviation about the mean.
• The variance of a set of observations is the average
squared deviation of the data points from their mean.
• Sample variance:
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$$
• Population variance:
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$

23 / 67
Business Statistics
Descriptive Statistics

Definition (Standard Deviation)


• The standard deviation of a set of observations is the
positive square root of the variance of the set.
• Sample standard deviation:
$$s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$$
• Population standard deviation:
$$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$
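A minimal sketch of the sample and population formulas using Python's `statistics` module. The data set 1, 2, 3, 4, 5 is reused from the mean example earlier in these slides; the variable names are illustrative.

```python
from statistics import pstdev, pvariance, stdev, variance

data = [1, 2, 3, 4, 5]  # mean = 3, squared deviations sum to 10

sample_variance = variance(data)       # divides by n - 1: 10 / 4
population_variance = pvariance(data)  # divides by N:     10 / 5

sample_sd = stdev(data)        # square root of the sample variance
population_sd = pstdev(data)   # square root of the population variance

print(sample_variance, population_variance)
print(round(sample_sd, 4), round(population_sd, 4))
```

The only difference between the two pairs of functions is the divisor, n − 1 for a sample and N for a population.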

24 / 67
Business Statistics
Descriptive Statistics

Definition (Coefficient of Variation)


• It indicates how large the standard deviation is relative to
the mean, so it measures relative variation.
• Always expressed as a percentage (%).
• Can be used to compare two or more sets of data measured
in different units.
$$CV = \left(\frac{s}{\bar{x}}\right) \times 100\%$$

25 / 67
Business Statistics
Descriptive Statistics

Comparing Coefficient of Variation


• Stock A:
  • Average price last year = $50
  • Standard deviation = $7
$$CV_A = \frac{s}{\bar{x}} \times 100\% = \frac{\$7}{\$50} \times 100\% = 14\%$$
• Stock B:
  • Average price last year = $100
  • Standard deviation = $10
$$CV_B = \frac{s}{\bar{x}} \times 100\% = \frac{\$10}{\$100} \times 100\% = 10\%$$

Remark
Stock B has the larger standard deviation ($10 versus $7), but it is
less variable than stock A relative to its average price.
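The stock comparison can be reproduced with a short Python sketch. The function name is illustrative; the figures are the ones given on this slide.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation as a percentage of the mean."""
    if mean == 0:
        raise ValueError("the coefficient of variation is undefined for a zero mean")
    return 100 * std_dev / mean


cv_a = coefficient_of_variation(7, 50)    # Stock A
cv_b = coefficient_of_variation(10, 100)  # Stock B

print(f"CV_A = {cv_a:.0f}%, CV_B = {cv_b:.0f}%")
```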
26 / 67
Business Statistics
Descriptive Statistics

Relative location of values within a data set


The relative location helps us determine how far a particular value
is from the mean.

Definition (Z Scores)
• A measure of distance from the mean (for example, a Z-score
of 2.0 means that a value is 2.0 standard deviations
from the mean).
• The difference between a value (xi) and the mean (x̄),
divided by the standard deviation (s):
$$Z_i = \frac{x_i - \bar{x}}{s}$$

27 / 67
Business Statistics
Descriptive Statistics

Z Scores (Cont...)

Example
• If the mean is 14.0 and the standard deviation is 3.0, what
is the Z score for the value 18.5?
$$Z = \frac{x - \bar{x}}{s} = \frac{18.5 - 14.0}{3.0} = 1.5$$
• The value 18.5 is 1.5 standard deviations above the mean.

Remark
A negative Z-score would mean that a value is less than the
mean.
28 / 67
Business Statistics
Descriptive Statistics

Example
Firm A is chosen from an industry (group of firms that produce
the same, or similar, products) where the mean rate of return of
firms is 10%, the standard deviation being 5%. Firm B is chosen
from another industry where the mean rate of return of firms is
12%, the standard deviation being 6%. If Firm A’s rate of return
is 16% and Firm B’s rate of return is 18%, which of the two is
more profitable compared to its industry ?
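One way to approach this comparison is to convert each firm's return to a Z-score within its own industry. The sketch below does that; the function and variable names are illustrative, and the first call repeats the 18.5 example from the previous slide.

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations by which value lies above the mean."""
    return (value - mean) / std_dev


z_check = z_score(18.5, 14.0, 3.0)  # the earlier example

# Rates of return in percent, each measured against its own industry.
z_firm_a = z_score(16, 10, 5)
z_firm_b = z_score(18, 12, 6)

print(z_check, z_firm_a, z_firm_b)
```

Both firms earn 6 percentage points more than their industry mean, but Firm A's industry is less dispersed, so Firm A's return stands out more relative to its industry.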

29 / 67
Business Statistics
Descriptive Statistics

Example
The top six high-risk small cap mutual funds with positive 1-year
returns are given in the following table. Find the Z-score of
each fund's return.

Small cap mutual funds with positive returns and high risk

Fund Name                          1-Year Return (in %)
Escorts High Yield Equity Plan      9.3
Axis Small Cap Fund                 0.9
SBI Small Cap Fund                  0.4
Principal Small Cap Fund            5.0
Union Small Cap Fund                0.4
BOI AXA Small Cap Fund             10.0

Source: https://groww.in/mutual-funds/category/best-small-cap-mutual-funds
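A possible way to carry out this exercise in Python is sketched below. It assumes the six funds are treated as a sample, so the sample standard deviation s (divisor n − 1) is used, in line with the Z-score definition above. The variable names are illustrative.

```python
from statistics import mean, stdev

# 1-year returns (in %) from the table.
returns = {
    "Escorts High Yield Equity Plan": 9.3,
    "Axis Small Cap Fund": 0.9,
    "SBI Small Cap Fund": 0.4,
    "Principal Small Cap Fund": 5.0,
    "Union Small Cap Fund": 0.4,
    "BOI AXA Small Cap Fund": 10.0,
}

values = list(returns.values())
x_bar = mean(values)
s = stdev(values)  # assumption: sample standard deviation

z_scores = {fund: (r - x_bar) / s for fund, r in returns.items()}

for fund, z in z_scores.items():
    print(f"{fund}: {z:+.2f}")
```

Funds with a positive Z-score returned more than the group average, and funds with a negative Z-score returned less.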

30 / 67
Business Statistics
Descriptive Statistics

Definition (Chebyshev Rule)


• It makes a statement about the proportion of data values
that must be within a specified number of standard
deviations of the mean.
• Regardless of how the data are distributed, at least
(1 − 1/k²) × 100% of the values will fall within k standard
deviations of the mean (for k > 1).

Example
At least                       within
(1 − 1/2²) × 100% = 75%        µ ± 2σ (k = 2)
(1 − 1/3²) × 100% ≈ 89%        µ ± 3σ (k = 3)
(1 − 1/4²) × 100% ≈ 94%        µ ± 4σ (k = 4)
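The bound in the Chebyshev rule is simple to tabulate. The sketch below uses an illustrative function name and reproduces the k = 2, 3, 4 cases from the example.

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the Chebyshev rule requires k > 1")
    return 1 - 1 / k**2


for k in (2, 3, 4):
    print(f"k = {k}: at least {chebyshev_lower_bound(k):.2%}")
```

The rule gives a lower bound that holds for any distribution, so the actual proportion within k standard deviations is often higher.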

31 / 67
Business Statistics
Descriptive Statistics

Definition (The Empirical Rule)


• If the data distribution is approximately bell-shaped, then the
interval:
• µ ± 1σ contains about 68% of the values in the population
or the sample.

32 / 67
Business Statistics
Descriptive Statistics

The Empirical Rule (Contd...)


• µ ± 2σ contains about 95% of the values in the population
or the sample.
• µ ± 3σ contains about 99.7% of the values in the population
or the sample.

33 / 67
Business Statistics
Descriptive Statistics

Example
• A cold drink bottling plant fills bottles of 500 ml capacity
with a mean of 500 ml and a standard deviation of 5 ml. At
least what percentage of bottles would contain between
490 and 510 ml of cold drink?
• Suppose the time between applying for a credit card and
receiving it is approximately bell-shaped, with an estimated
mean of 8 days and a standard deviation of about 2 days.
Approximately what fraction of people receive the credit
card within 4 days of applying?
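The two questions call for different rules. A rough worked sketch in Python follows, under two assumptions: the bottling question gives no distribution shape, so the Chebyshev rule is applied, and "within 4 days" is read as a waiting time of 4 days or less. Variable names are illustrative.

```python
# Bottling plant: shape unknown, so use the Chebyshev rule.
mean_fill, sd_fill = 500, 5
k = (510 - mean_fill) / sd_fill  # 490 and 510 are both k standard deviations away
at_least_within = 1 - 1 / k**2   # lower bound on the proportion in [490, 510]

# Credit card: approximately bell-shaped, so use the empirical rule.
mean_days, sd_days = 8, 2
z = (4 - mean_days) / sd_days    # 4 days is two standard deviations below the mean
about_within_2_sd = 0.95         # empirical rule for mean +/- 2 standard deviations
fraction_by_day_4 = (1 - about_within_2_sd) / 2  # lower tail only, by symmetry

print(f"Bottles between 490 and 510 ml: at least {at_least_within:.0%}")
print(f"Cards received within 4 days: about {fraction_by_day_4:.1%}")
```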

34 / 67
Business Statistics
Descriptive Statistics

Measures of Shape
• Skewness
  • Absence of symmetry
  • Extreme values on one side of a distribution
• Kurtosis
  • Peakedness of a distribution
  • Leptokurtic: high and thin
  • Mesokurtic: normal shape
  • Platykurtic: flat and spread out
• Box and Whisker Plots
  • Graphic display of a distribution
  • Reveals skewness and outliers

35 / 67
Business Statistics
Descriptive Statistics

Skewness

36 / 67
Business Statistics
Descriptive Statistics

Example
• The mean of some price quote data is 5.5056 and the
median is 3.92. From this information, what can you deduce
about the symmetry or skewness of the distribution?
• Suppose a frequency distribution is skewed with a median
of $75.00 and a mode of $80.00. Which of the following is
a possible value for the mean of distribution? (a) $64.00 (b)
$78.00 (c) $90.00
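Both questions can be approached with the usual rule of thumb that the mean is pulled toward the longer tail, giving mean > median > mode under positive skew and mean < median < mode under negative skew. The sketch below encodes that rule; it is a heuristic for unimodal distributions rather than a guarantee, and the function name is illustrative.

```python
def skew_direction(mean, median):
    """Classify skewness from the mean-median comparison (rule of thumb)."""
    if mean > median:
        return "positively skewed (long right tail)"
    if mean < median:
        return "negatively skewed (long left tail)"
    return "approximately symmetric"


# First question: mean 5.5056, median 3.92.
print(skew_direction(5.5056, 3.92))

# Second question: the mode exceeds the median, which suggests negative skew,
# so under the rule of thumb the mean should lie below the median.
median, mode = 75.00, 80.00
candidates = [64.00, 78.00, 90.00]
plausible_means = [m for m in candidates if m < median < mode]
print(plausible_means)
```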

37 / 67
Business Statistics
Descriptive Statistics

Definition (Kurtosis)
• Peakedness of a distribution
• Leptokurtic: high and thin
• Mesokurtic: normal in shape
• Platykurtic: flat and spread out

38 / 67
Business Statistics
Descriptive Statistics

Box and Whisker Plot

• Five summary measures are used:
  • Median, Q2
  • First quartile, Q1
  • Third quartile, Q3
  • Minimum value in the data set
  • Maximum value in the data set
• Outer fences
  • Lower outer fence = Q1 − 3.0 IQR
  • Upper outer fence = Q3 + 3.0 IQR

39 / 67
Business Statistics
Descriptive Statistics

• The Box
  • Median (vertical line across the box)
  • First quartile
  • Third quartile
• The Whiskers
  • Lower inner fence = Q1 − 1.5 IQR; the lower whisker ends at
    the smallest observation within this fence
  • Upper inner fence = Q3 + 1.5 IQR; the upper whisker ends at
    the largest observation within this fence
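The quantities behind a box and whisker plot can be sketched in Python as follows. The data set is hypothetical, and the quartiles come from `statistics.quantiles`, whose default method uses (n + 1)-based positions. Treating points beyond the inner fences as outliers is a common convention assumed here rather than something stated on the slides.

```python
from statistics import quantiles

# Hypothetical data with one very large value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 50]

q1, q2, q3 = quantiles(data, n=4)
iqr = q3 - q1

inner_fences = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
outer_fences = (q1 - 3.0 * iqr, q3 + 3.0 * iqr)

# Whiskers end at the most extreme observations inside the inner fences.
lower_whisker = min(x for x in data if x >= inner_fences[0])
upper_whisker = max(x for x in data if x <= inner_fences[1])

# Assumed convention: points outside the inner fences are flagged as outliers.
outliers = [x for x in data if x < inner_fences[0] or x > inner_fences[1]]
beyond_outer = [x for x in data if x < outer_fences[0] or x > outer_fences[1]]

print("five-number summary:", min(data), q1, q2, q3, max(data))
print("inner fences:", inner_fences, "outer fences:", outer_fences)
print("whiskers:", lower_whisker, upper_whisker)
print("outliers:", outliers, "beyond outer fences:", beyond_outer)
```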

40 / 67
Business Statistics
Descriptive Statistics

Example of Raw Data

Sample of daily production (in yards) of 30 carpet looms


16.2 15.8 15.8 15.8 16.3 15.6
15.7 16.0 16.2 16.1 16.8 16.0
16.4 15.2 15.9 15.9 15.9 16.8
15.4 15.7 15.9 16.0 16.3 16.0
16.4 16.6 15.6 15.6 16.9 16.3

41 / 67
Business Statistics
Descriptive Statistics

Organizing Data
• Data array: A sequence of data in ascending or descending
order.
• Frequency distribution: grouping data into some defined
classes.
• Cumulative distribution: how many observations lie above
or below a certain value?

42 / 67
Business Statistics
Descriptive Statistics

Presenting Data in Array

Sample of daily production (in yards) of 30 carpet looms,
in ascending order (read down each column)


15.2 15.7 15.9 16.0 16.2 16.4
15.4 15.7 15.9 16.0 16.3 16.6
15.6 15.8 15.9 16.0 16.3 16.8
15.6 15.8 15.9 16.1 16.3 16.8
15.6 15.8 16.0 16.2 16.4 16.9

43 / 67
Business Statistics
Descriptive Statistics

Definition (Frequency Distribution)


• A frequency distribution is a list or a table containing
class groupings (ranges within which the data fall) and the
corresponding frequencies with which data fall within each
grouping or category.
• A relative frequency distribution presents frequencies in
terms of fractions or percentages.
• The classes in the frequency distribution are all-inclusive
and mutually exclusive.

44 / 67
Business Statistics
Descriptive Statistics

Frequency Distribution Example

Data array of average inventory (in days) for 20 convenience stores
2.0 3.8 4.1 4.7 5.5
3.4 4.0 4.2 4.8 5.5
3.4 4.1 4.3 4.9 5.5
3.8 4.1 4.7 4.9 5.5

45 / 67
Business Statistics
Descriptive Statistics

Frequency Distribution Example

Frequency distribution of average inventory (in days) for 20
convenience stores (6 classes)

Class      Frequency
2.0-2.5        1
2.6-3.1        0
3.2-3.7        2
3.8-4.3        8
4.4-4.9        5
5.0-5.5        4

46 / 67
Business Statistics
Descriptive Statistics

Frequency Distribution
• Each class grouping has the same width
• Determine the width of each interval by
$$\text{Width of interval} = \frac{x_{\max} - x_{\min}}{k},$$
where
x_max = next unit value after the largest value in the data,
x_min = smallest value in the data,
k = total number of class intervals.
• Usually at least 5 but no more than 15 groupings
• Class boundaries never overlap

47 / 67
Business Statistics
Descriptive Statistics

Frequency Distribution (Cont...)

A step by step example


16.2 15.8 15.8 15.8 16.3 15.6
15.7 16.0 16.2 16.1 16.8 16.0
16.4 15.2 15.9 15.9 15.9 16.8
15.4 15.7 15.9 16.0 16.3 16.0
16.4 16.6 15.6 15.6 16.9 16.3

48 / 67
Business Statistics
Descriptive Statistics

Frequency Distribution (Cont...)


• Step 1: Select the number of classes
  Number of classes = 6
• Step 2: Determine the width of a class interval
  Width of a class interval = (17.0 − 15.2)/6 = 0.3, where 17.0 is
  the next unit value after the largest observation, 16.9
• Step 3: Generate the class boundaries
  Class boundaries: 15.2, 15.5, 15.8, 16.1, 16.4, 16.7, 17.0
• Step 4: Count the observations and assign them to classes
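The four steps can be sketched in Python using the carpet-loom sample. To avoid floating-point rounding at the class boundaries, this sketch works in integer tenths of a yard, which is an implementation choice and not part of the slides. The variable names are illustrative.

```python
from collections import Counter

production = [
    16.2, 15.8, 15.8, 15.8, 16.3, 15.6,
    15.7, 16.0, 16.2, 16.1, 16.8, 16.0,
    16.4, 15.2, 15.9, 15.9, 15.9, 16.8,
    15.4, 15.7, 15.9, 16.0, 16.3, 16.0,
    16.4, 16.6, 15.6, 15.6, 16.9, 16.3,
]

# Step 1: number of classes.
num_classes = 6

# Step 2: class width, computed in tenths of a yard.
lowest = round(min(production) * 10)         # 15.2 yards
next_unit = round(max(production) * 10) + 1  # next unit value after 16.9
width = (next_unit - lowest) // num_classes  # 3 tenths, i.e. 0.3 yards

# Steps 3 and 4: assign each observation to a class index and count.
counts = Counter((round(x * 10) - lowest) // width for x in production)
frequencies = [counts[i] for i in range(num_classes)]

for i, frequency in enumerate(frequencies):
    lower = (lowest + i * width) / 10
    upper = (lowest + (i + 1) * width - 1) / 10
    print(f"{lower:.1f}-{upper:.1f}  {frequency}")
```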

49 / 67
Business Statistics
Descriptive Statistics

Frequency Distribution

Class Frequency
15.2-15.4 2
15.5-15.7 5
15.8-16.0 11
16.1-16.3 6
16.4-16.6 3
16.7-16.9 3

50 / 67
Business Statistics
Descriptive Statistics

Relative Frequency Distribution


Class        Frequency   Relative frequency   Percentage
15.2-15.4        2        2/30 = 0.07              7
15.5-15.7        5        5/30 = 0.17             17
15.8-16.0       11       11/30 = 0.37             37
16.1-16.3        6        6/30 = 0.20             20
16.4-16.6        3        3/30 = 0.10             10
16.7-16.9        3        3/30 = 0.10             10
Total           30               1.00            100
(Rounded entries may not add exactly to the totals.)

51 / 67
Business Statistics
Descriptive Statistics

Definition (The Histogram)


• A graph of the data in a frequency distribution is called a
histogram
• The class boundaries (or class midpoints) are shown on the
horizontal axis
• The vertical axis shows either frequency, relative frequency, or
percentage
• Bars of the appropriate heights are used to represent the
number of observations within each class

52 / 67
Business Statistics
Descriptive Statistics

53 / 67
Business Statistics
Descriptive Statistics

Definition (Frequency Polygon)


• Used to represent frequency distributions graphically.
• Sketches the outline of the data more clearly
• The polygon becomes increasingly smooth and curve-like
as we increase the number of classes and the number of
observations.


Frequency Polygon Example

[Figures: frequency polygon examples (three slides)]


Definition (Cumulative Frequency Distribution)


• Enables us to see how many observations lie above or
below a certain value.
• Less-than type and more-than type
• A graph of a cumulative frequency distribution is called an
ogive.
• The ogive for a less-than type cumulative frequency
distribution slopes upward to the right.


Cumulative Frequency Distribution

Class            Cumulative frequency   Cumulative relative frequency
Less than 15.2            0                      0.00
Less than 15.5            2                      0.07
Less than 15.8            7                      0.23
Less than 16.1           18                      0.60
Less than 16.4           24                      0.80
Less than 16.7           27                      0.90
Less than 17.0           30                      1.00
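The less-than cumulative frequencies are running totals of the class frequencies. A minimal sketch with `itertools.accumulate`, using the class frequencies and upper boundaries from the carpet-loom frequency distribution (the variable names are illustrative):

```python
from itertools import accumulate

# Class frequencies for 15.2-15.4, 15.5-15.7, ..., 16.7-16.9.
frequencies = [2, 5, 11, 6, 3, 3]
upper_boundaries = [15.5, 15.8, 16.1, 16.4, 16.7, 17.0]

cumulative = list(accumulate(frequencies))  # running totals
total = cumulative[-1]

# No observation lies below the first class boundary, 15.2.
print("Less than 15.2: 0 0.00")
for boundary, count in zip(upper_boundaries, cumulative):
    print(f"Less than {boundary}: {count} {count / total:.2f}")
```

Plotting the cumulative counts against the boundaries gives the less-than ogive.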


Ogive Example


Definition (Pie Chart)


• Pie chart is a simple descriptive display often used to
present frequencies for categorical data.
• May be used for nominal or ordinal type data.
• The total area of the pie (circular in shape) represents 100%
of the quantity of interest.
• The arc length of each sector (and consequently its central
angle and area), is proportional to the quantity it represents.


Pie Chart

Example (A job satisfaction survey)


Categories                                         Responses (%)
Happy with career                                       33
Enjoy job, but it is not on my career path              19
Job is OK, but it is not on my career path              19
Do not like my job, but it is on my career path          6
My job just pays the bills                              23


Example (Pie Chart)

[Figure: pie chart titled "Job satisfaction survey", with sectors of
33%, 19%, 19%, 6% and 23% for the five response categories]


Bar Chart
• A bar chart is a chart with rectangular bars with lengths
proportional to the values that they represent.
• Often used to display categorical data.
• May be horizontal or vertical.
• Used to display values that were taken over time or under
different conditions, usually on small data sets.


Example (Bar Chart)


Investment type   Amount (in thousands of $)   Percentage (%)
Stocks                     46.5                    42.27
Bonds                      32.0                    29.09
CD                         15.5                    14.09
Savings                    16.0                    14.55


Example
The following are the numbers of items of a similar type produced
in a factory during the last 50 days.

21 22 17 23 27 15 16 22 15 23
24 25 36 19 14 21 24 25 14 18
20 31 22 19 18 20 21 20 36 18
21 20 31 22 19 18 20 20 24 35
25 26 19 32 22 26 25 26 27 22

Arrange these observations into a frequency distribution with
both inclusive and exclusive class intervals, choosing a suitable
number of classes.


Example
If the class mid-points in a frequency distribution of the ages of a
group of persons are 25, 32, 39, 46, 53, and 60, find:
1. the size of the class interval
2. the class boundaries
3. the class limits, assuming that the age quoted is the age
completed on the last birthday.
