Descriptive Statistics
Business Statistics
• Mathematical averages
• Positional averages: median, quartiles, deciles, percentiles, mode
• Mean
• Median
• Mode
Definition (Mean)
• The mean (or arithmetic mean or average) is the most common
measure of central tendency.
• The mean of a set of observations is the average of the
observations which provides a measure of central location
of the data.
• Mean is equal to the sum of all observations divided by the
number of observations in the set.
• For a sample of size n, the sample mean x̄ is
\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n} \]
Mean (Contd...)
• For a population of size N, the population mean µ is
\[ \mu = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{x_1 + x_2 + \cdots + x_N}{N} \]
• Applicable for interval and ratio data.
• Not applicable for nominal or ordinal data.
• Affected by each value in the data set, including extreme
values (also known as outliers).
Example
• Data: 1, 2, 3, 4, 5
  Mean = (1 + 2 + 3 + 4 + 5) / 5 = 15 / 5 = 3
• Data: 1, 2, 3, 4, 10
  Mean = (1 + 2 + 3 + 4 + 10) / 5 = 20 / 5 = 4
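As a minimal Python sketch of the two calculations in this example (the helper name `arithmetic_mean` is ours, not part of the slides): the mean is the sum divided by the count, and replacing 5 with the extreme value 10 pulls it from 3 to 4.

```python
def arithmetic_mean(values):
    """Sum of the observations divided by the number of observations."""
    if not values:
        raise ValueError("mean of an empty data set is undefined")
    return sum(values) / len(values)


base = [1, 2, 3, 4, 5]
with_outlier = [1, 2, 3, 4, 10]  # same data with one extreme value

print(arithmetic_mean(base))          # 3.0
print(arithmetic_mean(with_outlier))  # 4.0
```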
Definition (Median)
• Median is an observation in the center of the dataset when
the data are arranged in ascending order.
• 50% of the data lie above the median and 50% lie below it.
• Median position = \( \frac{n+1}{2} \), counted in the ordered data.
Remark
\( \frac{n+1}{2} \) is not the value of the median, only the position of the
median in the ordered data.
Median (Contd...)
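• Procedure to find the median:
  • Arrange the data in ascending order.
  • If n is odd, the median is the middle value, at position (n + 1)/2.
  • If n is even, the median is the average of the two middle values,
    at positions n/2 and n/2 + 1.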
Median (Contd...)
• Median is applicable for ordinal, interval, and ratio data.
• Not applicable for nominal data.
• Unaffected by extremely large and extremely small values.
Example
[Two dot plots: in each data set, Median = 3]
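The procedure for finding the median (sort, then take the middle value, or the average of the two middle values when n is even) can be sketched in Python as follows. The data sets are illustrative assumptions, since the slide's figures did not survive in the text. Replacing the largest value with an extreme one leaves the median unchanged.

```python
def median(values):
    """Middle value of the ordered data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # 1-based position (n + 1) / 2 is 0-based index n // 2
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([1, 2, 3, 4, 5]))   # 3
print(median([1, 2, 3, 4, 10]))  # 3  (the extreme value does not move the median)
print(median([1, 2, 3, 4]))      # 2.5
```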
Definition (Mode)
• The most frequently occurring value in a data set.
• Applicable to all levels of data measurement (nominal, ordinal,
interval, and ratio).
• Not affected by extreme values.
• There may be no mode, a single mode (unimodal), two
modes (bimodal), or several modes (multimodal).
Example
[Dot plot on an axis from 0 to 14]
Mode = 9
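A short Python sketch of finding the mode or modes. The data sets are hypothetical, and treating "no value repeats" as "no mode" is one common convention that we assume here. Because only frequencies are counted, the same function works for nominal data.

```python
from collections import Counter


def modes(values):
    """Return the most frequent value(s), or [] when no value repeats."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []  # assumed convention: every value occurs once, so no mode
    return sorted(v for v, c in counts.items() if c == top)


print(modes([0, 1, 2, 3, 9, 9, 9, 12, 14]))  # [9]      unimodal
print(modes([1, 1, 2, 2, 3]))                # [1, 2]   bimodal
print(modes([1, 2, 3]))                      # []       no mode
print(modes(["red", "blue", "red"]))         # ['red']  nominal data
```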
Definition (Quartiles)
• Quartiles split the ranked data into four segments with an
equal number of values per segment.
• The first quartile, Q1, is the value for which 25% of the
observations are smaller and 75% are larger.
• Q2 is the same as the median (50% are smaller, 50% are
larger).
• Only 25% of the observations are greater than the third
quartile Q3.
Quartiles (Contd...)
Procedure to find the quartiles
• Arrange the data in ascending order.
• First quartile position:
\[ Q_1 = \frac{n+1}{4} \]
• Second quartile position (the median position):
\[ Q_2 = \frac{n+1}{2} \]
• Third quartile position:
\[ Q_3 = \frac{3(n+1)}{4} \]
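The position formulas can be turned into values with a short Python sketch. The slides give only the positions; using linear interpolation when a position falls between two observations is our assumption (it matches the "exclusive" method of Python's `statistics.quantiles`). The data set is hypothetical.

```python
def quartile(values, k):
    """k-th quartile (k = 1, 2, 3) at 1-based position k * (n + 1) / 4 in the ordered data."""
    if k not in (1, 2, 3):
        raise ValueError("k must be 1, 2 or 3")
    ordered = sorted(values)
    n = len(ordered)
    if n < 3:
        raise ValueError("need at least 3 observations")
    # position = whole + rem / 4, kept in integer arithmetic to avoid rounding issues
    whole, rem = divmod(k * (n + 1), 4)
    lower = ordered[whole - 1]
    if rem == 0:
        return lower
    upper = ordered[whole]
    # assumed rule: interpolate linearly between the two neighbouring observations
    return lower + (rem / 4) * (upper - lower)


data = [11, 12, 13, 16, 16, 17, 18, 21, 22]  # n = 9, positions 2.5, 5 and 7.5
print([quartile(data, k) for k in (1, 2, 3)])  # [12.5, 16, 19.5]
```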
Definition (Percentiles)
• Measures of central tendency that divide a group of data
into 100 parts.
• At least n% of the data lie below the nth percentile, and at
most (100 − n)% of the data lie above the nth percentile.
Example
90th percentile indicates that at least 90% of the data lie below
it, and at most 10% of the data lie above it.
Percentiles (Cont...)
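• Compute the location index
\[ i = \frac{P}{100}\, n \]
where P is the percentile of interest and n is the number of observations.
• Determine the percentile’s location and its value.
• If i is a whole number, the percentile is the average of the
values at positions i and i + 1.
• If i is not a whole number, the percentile is the value at the position
given by the whole-number part of i + 1 in the ordered array.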
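The location rule can be sketched in Python as below. The data are hypothetical, the function name `percentile` is ours, and P is restricted to whole numbers strictly between 0 and 100 so that the index i = (P/100) n can be computed exactly with integer arithmetic.

```python
def percentile(values, p):
    """P-th percentile using the location index i = (P / 100) * n."""
    if not (isinstance(p, int) and 0 < p < 100):
        raise ValueError("p must be a whole number between 1 and 99")
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("percentile of an empty data set is undefined")
    whole, rem = divmod(p * n, 100)  # i = whole + rem / 100
    if rem == 0:
        # i is a whole number: average the values at positions i and i + 1 (1-based)
        return (ordered[whole - 1] + ordered[whole]) / 2
    # i is not whole: take the value at position (whole part of i) + 1
    return ordered[whole]


data = [14, 12, 19, 23, 5, 13, 28, 17]  # sorted: 5 12 13 14 17 19 23 28
print(percentile(data, 30))  # i = 2.4, so position 3: 13
print(percentile(data, 50))  # i = 4, so average of positions 4 and 5: 15.5
```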
Percentiles (Cont...)
• Applicable for ordinal, interval, and ratio data.
• Not applicable for nominal data.
Remark
• 25th percentile = Q1 (first quartile)
• 50th percentile = median = Q2 (second quartile)
• 75th percentile = Q3 (third quartile)
Definition (Range)
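• Range is the simplest measure of variation: the difference between
the largest and the smallest observation.
Example
[Dot plot on an axis from 0 to 14; smallest value 1, largest value 14]
Range = 14 − 1 = 13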
Range (Contd...)
Disadvantages
• Ignores the way in which data are distributed
• Sensitive to outliers
Ex 1: 1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5
Range = 5 − 1 = 4
Ex 2: 1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120
Range = 120 − 1 = 119
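The two examples can be reproduced with a few lines of Python (the helper name `value_range` is ours). The two data sets differ in a single observation, yet the range jumps from 4 to 119, which shows how one outlier dominates this measure.

```python
def value_range(values):
    """Largest observation minus smallest observation."""
    if not values:
        raise ValueError("range of an empty data set is undefined")
    return max(values) - min(values)


# Ex 1 and Ex 2 from the slide: eleven 1s, eight 2s, four 3s, a 4, then 5 or 120
ex1 = [1] * 11 + [2] * 8 + [3] * 4 + [4, 5]
ex2 = [1] * 11 + [2] * 8 + [3] * 4 + [4, 120]

print(value_range(ex1))  # 4
print(value_range(ex2))  # 119
```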
Definition (Variance)
• Variance is based on the difference between the value of
each observation (xi) and the mean; this difference is
called the deviation about the mean.
• The variance of a set of observations is the average
squared deviation of the data points from their mean.
• Sample variance:
\[ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} \]
• Population standard deviation:
\[ \sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}} \]
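A minimal Python sketch of the variance and standard deviation calculations, on a hypothetical data set. The population versions divide by N as in the formula for σ. Dividing by n − 1 for a sample is the usual convention and is an assumption here.

```python
import math


def population_variance(values):
    """Average squared deviation from the population mean (divide by N)."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


def sample_variance(values):
    """Squared deviations from the sample mean, divided by n - 1 (assumed convention)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least 2 observations")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # mean 5, squared deviations sum to 32

print(population_variance(data))             # 4.0
print(math.sqrt(population_variance(data)))  # 2.0
print(round(sample_variance(data), 3))       # 4.571
```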
Remark
Stock B is more variable than stock A, but stock B is less variable
relative to its price
Definition (Z Scores)
• A measure of distance from the mean (for example, a Z-score
of 2.0 means that a value is 2.0 standard deviations
from the mean).
• The difference between a value (xi) and the mean (x̄),
divided by the standard deviation (s):
\[ Z_i = \frac{x_i - \bar{x}}{s} \]
Z Scores (Cont...)
Example
• If the mean is 14.0 and the standard deviation is 3.0, what
is the Z score for the value 18.5?
\[ Z = \frac{x - \bar{x}}{s} = \frac{18.5 - 14.0}{3.0} = 1.5 \]
• The value 18.5 is 1.5 standard deviations above the mean.
Remark
A negative Z-score would mean that a value is less than the
mean.
Example
Firm A is chosen from an industry (group of firms that produce
the same, or similar, products) where the mean rate of return of
firms is 10%, the standard deviation being 5%. Firm B is chosen
from another industry where the mean rate of return of firms is
12%, the standard deviation being 6%. If Firm A’s rate of return
is 16% and Firm B’s rate of return is 18%, which of the two is
more profitable compared to its industry?
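One way to approach this comparison is to convert each firm's return to a Z-score within its own industry. The Python sketch below does that (the function name `z_score` is ours). The firm with the larger Z-score lies further above its industry mean in standard-deviation units.

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations by which value differs from mean."""
    if std_dev <= 0:
        raise ValueError("standard deviation must be positive")
    return (value - mean) / std_dev


z_a = z_score(16, 10, 5)  # Firm A: industry mean 10%, standard deviation 5%
z_b = z_score(18, 12, 6)  # Firm B: industry mean 12%, standard deviation 6%

print(z_a)  # 1.2
print(z_b)  # 1.0
print("Firm A" if z_a > z_b else "Firm B")  # Firm A
```

Both firms beat their industry mean by 6 percentage points, but Firm A does so in an industry with less spread, so its relative performance is stronger.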
Example
The top six small-cap mutual funds with positive one-year returns and
high risk are given in the following table. Find the Z-score for each
fund.
Example
• For k = 2: at least (1 − 1/2²) × 100% = 75% of the data lie within µ ± 2σ
• For k = 3: at least (1 − 1/3²) × 100% ≈ 89% of the data lie within µ ± 3σ
• For k = 4: at least (1 − 1/4²) × 100% ≈ 94% of the data lie within µ ± 4σ
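The three rows follow the pattern (1 − 1/k²) × 100%, the bound commonly known as Chebyshev's inequality. A small Python sketch (the function name is ours) reproduces the figures. The slide rounds the k = 3 and k = 4 values to whole percentages.

```python
def chebyshev_lower_bound(k):
    """Minimum fraction of any data set lying within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k ** 2


for k in (2, 3, 4):
    print(k, round(100 * chebyshev_lower_bound(k), 2))
# 2 75.0
# 3 88.89
# 4 93.75
```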
Example
• A cold drink bottling plant fills bottles of 500 ml capacity
with mean of 500 ml and standard deviation of 5 ml. At
least what percentage of bottles would contain cold drink
between 490 and 510 ml?
• Suppose the time between applying for a credit card and
getting the credit card is approximately bell-shaped and its
average has been estimated to be 8 days with a standard
deviation of about 2 days. Approximately what fraction of
people get the credit card within 4 days of applying?
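A hedged Python sketch of both questions. The first makes no assumption about the shape of the distribution and uses the bound 1 − 1/k². The second models "approximately bell-shaped" with a normal distribution, which is our assumption. The common rule of thumb that about 95% of bell-shaped data lie within two standard deviations gives roughly the same answer, about 2.5%.

```python
from statistics import NormalDist

# Bottling plant: 490 ml to 510 ml is the mean plus or minus k standard deviations
k = (510 - 500) / 5              # 2.0
bottles_at_least = 1 - 1 / k ** 2
print(bottles_at_least)          # 0.75, so at least 75% of bottles

# Credit card: 4 days is 2 standard deviations below the mean of 8 days
waiting_time = NormalDist(mu=8, sigma=2)   # assumed normal model
within_4_days = waiting_time.cdf(4)
print(round(within_4_days, 4))   # 0.0228, roughly 2.3% of applicants
```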
Measures of Shape
• Skewness
  • Absence of symmetry
  • Extreme values on one side of a distribution
• Kurtosis
  • Peakedness of a distribution
  • Leptokurtic: high and thin
  • Mesokurtic: normal shape
  • Platykurtic: flat and spread out
Skewness
Example
• The mean of some price quote data is 5.5056 and the median
is 3.92. From this information, what can you deduce
about the symmetry or skewness of the distribution?
• Suppose a frequency distribution is skewed with a median
of $75.00 and a mode of $80.00. Which of the following is
a possible value for the mean of distribution? (a) $64.00 (b)
$78.00 (c) $90.00
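Both questions can be approached with the usual rule of thumb that the mean is pulled toward the longer tail, so that mean < median < mode suggests left skew and mode < median < mean suggests right skew. This rule is an assumption here, since the slide's skewness figure is not reproduced in the text, and the function name is ours.

```python
def skew_direction(mean, median):
    """Rule-of-thumb skew direction from the relative position of mean and median."""
    if mean > median:
        return "right (positive) skew"
    if mean < median:
        return "left (negative) skew"
    return "approximately symmetric"


# First question: mean 5.5056, median 3.92
print(skew_direction(5.5056, 3.92))  # right (positive) skew

# Second question: mode 80 > median 75 suggests left skew, so the mean
# should fall below the median; keep only the candidates that do.
candidates = [64.00, 78.00, 90.00]
print([m for m in candidates if m < 75.00])  # [64.0]
```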
Definition (Kurtosis)
• Peakedness of a distribution
• Leptokurtic: high and thin
• Mesokurtic: normal in shape
• Platykurtic: flat and spread out
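• The Box
  • Median (vertical line across the box)
  • First quartile
  • Third quartile
• The Whiskers
  • Lower whisker: extends to the smallest observation within the lower inner fence, Q1 − 1.5 IQR
  • Upper whisker: extends to the largest observation within the upper inner fence, Q3 + 1.5 IQR
  • IQR = Q3 − Q1 (the interquartile range)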
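The quantities needed to draw a box-and-whisker plot can be computed with a short Python sketch. The data are hypothetical, the function name is ours, and the quartiles come from `statistics.quantiles` with the "exclusive" method, which is an assumption about which quartile rule to use. Observations outside the inner fences are reported separately as potential outliers.

```python
import statistics


def box_plot_summary(values):
    """Quartiles, inner fences, whisker ends and outliers for a box plot."""
    ordered = sorted(values)
    q1, q2, q3 = statistics.quantiles(ordered, n=4, method="exclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [x for x in ordered if lower_fence <= x <= upper_fence]
    outliers = [x for x in ordered if x < lower_fence or x > upper_fence]
    return {
        "q1": q1, "median": q2, "q3": q3, "iqr": iqr,
        "lower_fence": lower_fence, "upper_fence": upper_fence,
        # whiskers end at the most extreme observations still inside the fences
        "whiskers": (inside[0], inside[-1]),
        "outliers": outliers,
    }


summary = box_plot_summary([11, 12, 13, 16, 16, 17, 18, 21, 22, 40])
for name, value in summary.items():
    print(name, value)
```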
Organizing Data
• Data array: A sequence of data in ascending or descending
order.
• Frequency distribution: grouping data into some defined
classes.
• Cumulative distribution: how many observations lie above
or below a certain value.
Frequency Distribution
• Each class grouping has the same width
• Determine the width of each interval by
\[ \text{Width of interval} = \frac{x^{*}_{\max} - x_{\min}}{k}, \]
where
  \( x^{*}_{\max} \) = next unit value after the largest value in the data,
  \( x_{\min} \) = smallest value in the data,
  k = total number of class intervals.
• Usually at least 5 but no more than 15 groupings
• Class boundaries never overlap
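A minimal Python sketch of the width formula. The inputs are illustrative: a smallest value of 15.2, a largest value of 16.9 and 6 classes match the class table shown later in the deck, and a measurement unit of 0.1 is our assumption. The function and parameter names are ours.

```python
def class_width(x_min, x_max, unit, k):
    """(next unit value after the largest value - smallest value) / number of classes."""
    if k <= 0:
        raise ValueError("k must be positive")
    x_max_star = x_max + unit  # next unit value after the largest observation
    return (x_max_star - x_min) / k


width = class_width(x_min=15.2, x_max=16.9, unit=0.1, k=6)
print(round(width, 2))  # 0.3
```

In practice the computed width is usually rounded to a convenient value before the class limits are written down.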
Frequency Distribution
Class Frequency
15.2-15.4 2
15.5-15.7 5
15.8-16.0 11
16.1-16.3 6
16.4-16.6 3
16.7-16.9 3
Ogive Example
Pie Chart
Bar Chart
• A bar chart is a chart with rectangular bars with lengths
proportional to the values that they represent.
• Often used to display categorical data.
• May be horizontal or vertical.
• Used to display values taken over time or under different
conditions, usually for small data sets.
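The idea that bar lengths are proportional to the values can be illustrated with a plain-text horizontal bar chart in Python. The categories and counts are hypothetical, and the function name is ours.

```python
def text_bar_chart(values, max_width=40):
    """Return one text line per category; bar length is proportional to the value."""
    largest = max(values.values())
    if largest <= 0:
        raise ValueError("need at least one positive value")
    label_width = max(len(label) for label in values)
    lines = []
    for label, value in values.items():
        bar = "#" * round(max_width * value / largest)
        lines.append(f"{label:<{label_width}} | {bar} {value}")
    return lines


sales = {"North": 20, "South": 10, "East": 5}  # hypothetical categorical data
for line in text_bar_chart(sales):
    print(line)
```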
Example
The following are the numbers of items of a similar type produced in a
factory during the last 50 days.
21 22 17 23 27 15 16 22 15 23
24 25 36 19 14 21 24 25 14 18
20 31 22 19 18 20 21 20 36 18
21 20 31 22 19 18 20 20 24 35
25 26 19 32 22 26 25 26 27 22
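As a sketch of how these 50 values could be organized, the Python code below builds a frequency distribution and a cumulative distribution. The choice of k = 6 classes and rounding the width up to a whole number are our assumptions. The width uses the formula (next unit value after the largest value − smallest value) / k.

```python
import math

items = [
    21, 22, 17, 23, 27, 15, 16, 22, 15, 23,
    24, 25, 36, 19, 14, 21, 24, 25, 14, 18,
    20, 31, 22, 19, 18, 20, 21, 20, 36, 18,
    21, 20, 31, 22, 19, 18, 20, 20, 24, 35,
    25, 26, 19, 32, 22, 26, 25, 26, 27, 22,
]

k = 6  # assumed number of classes
x_min, x_max = min(items), max(items)           # 14 and 36
width = math.ceil((x_max + 1 - x_min) / k)      # (37 - 14) / 6 = 3.83..., rounded up to 4

counts = [0] * k
for x in items:
    counts[(x - x_min) // width] += 1           # index of the class containing x

cumulative = []
running = 0
for c in counts:
    running += c
    cumulative.append(running)

for i, (c, cum) in enumerate(zip(counts, cumulative)):
    low = x_min + i * width
    print(f"{low}-{low + width - 1}: frequency {c}, cumulative {cum}")
```

With these assumptions the classes are 14-17, 18-21, 22-25, 26-29, 30-33 and 34-37, with frequencies 6, 18, 15, 5, 3 and 3.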
Example
If class mid-points in a frequency distribution of the ages of a
group of persons are: 25, 32, 39, 46, 53, and 60, find:
1. the size of the class interval
2. the class boundaries
3. the class limits, assuming that the age quoted is the age
completed on the last birthday.
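Parts 1 and 2 can be sketched in Python under the assumption that all classes have equal width, so that the class size is the gap between consecutive mid-points and each boundary lies half a class size from its mid-point. Part 3 depends on how the age convention is interpreted and is not attempted here.

```python
mid_points = [25, 32, 39, 46, 53, 60]

# Part 1: class size = difference between consecutive mid-points
gaps = {b - a for a, b in zip(mid_points, mid_points[1:])}
if len(gaps) != 1:
    raise ValueError("mid-points are not equally spaced")
size = gaps.pop()
print(size)  # 7

# Part 2: class boundaries = mid-point minus and plus half the class size
boundaries = [(m - size / 2, m + size / 2) for m in mid_points]
for lower, upper in boundaries:
    print(f"{lower} - {upper}")
```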