
UNIT 3: Descriptive Statistics

• Descriptive Statistics: numerical measures that are used to describe certain characteristics of the data

Common Types of Descriptive Measures
1. Measures of Central Location: mean, median, mode
2. Measures of Variability: range, variance, standard deviation, standard error, coefficient of variation
3. Measures of Shape: Skewness, Kurtosis
4. Other Summary Statistics: sum, count
5. Measures of Location: Fractiles (Percentile, Decile, Quartile)
6. Box-and-Whisker Plot (5-number summary)

1. Measures of CENTRAL LOCATION/TENDENCY
• used to identify the “center” or the typical value in the data set; can be referred to as the “average”
• TYPES:
1.) Mean: the sum of all the observed values divided by the number of observations in the data set.
2.) Median: a value that divides an ordered set of data (array) into two equal parts (usually denoted by Md).
• Md = middle value in the array when n is odd
• Md = mean of the two middle values when n is even
3.) Mode: the value in the data set that occurs with the greatest frequency.

2. Measures of VARIABILITY/DISPERSION
• indicate the extent to which individual observations in a set of data are scattered about an average
• TYPES:
1.) Range – maximum value minus the minimum value in the data set.
• Maximum – the highest value in the data set
• Minimum – the lowest value in the data set
2.) Variance – the square of the standard deviation
• Population variance: $\sigma^2$; Sample variance: $s^2$
3.) Standard deviation – a measure of dispersion which indicates the extent of scattering of the observations from the mean
4.) Standard Error (of the mean) – used to measure how well the obtained statistic will estimate the target parameter. The smaller the standard error, the better the statistic estimates the parameter.
• $SE = \dfrac{S}{\sqrt{n}}$, where
• S = standard deviation
• n = sample size
• useful when constructing a confidence interval of the mean.
5.) Coefficient of Variation (%) – how scattered the data are relative to the mean. It is a relative measure of variation that is always expressed as a percentage.
• $CV = \left( \dfrac{s}{\bar{x}} \right) \times 100\%$
• very useful when comparing two or more data sets that have different means and/or are measured in different units of measurement

3. Measures of SHAPE
1.) Measure of Skewness – refers to the degree of asymmetry, or departure from symmetry, of a distribution.
• Simplified Guideline
a.) If |skewness| ≤ 2 × standard error, then symmetric.
b.) If |skewness| > 2 × standard error, then skewed right if Sk is positive; or skewed left if Sk is negative;
where std. error = $\sqrt{\dfrac{6}{n}}$ (estimate)
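The simplified guideline for skewness (compare |Sk| with twice the standard error, where the standard error is estimated as $\sqrt{6/n}$) can be sketched in Python. This is only an illustration and not part of the original notes: the function name `classify_skewness` and its parameters are assumptions made for this example. The standard error is estimated from n unless a value obtained elsewhere (for instance from statistical software) is passed in.

```python
import math


def classify_skewness(sk, n=None, std_error=None):
    """Classify the shape of a distribution from its skewness Sk.

    Hypothetical helper (not from the notes). Pass the sample size n to
    use the sqrt(6/n) estimate, or pass a standard error directly.
    """
    if std_error is None:
        if n is None:
            raise ValueError("provide n or std_error")
        std_error = math.sqrt(6 / n)  # estimate used by the guideline
    limit = 2 * std_error
    if abs(sk) <= limit:
        return "symmetric"
    return "skewed right" if sk > 0 else "skewed left"


# Illustrative skewness values (assumed, not from a real data set);
# with n = 24 the limit is 2 * sqrt(6/24) = 1.0.
print(classify_skewness(0.30, n=24))
print(classify_skewness(1.80, n=24))
print(classify_skewness(-1.80, n=24))

# Sk and std. error as quoted in the sample interpretation in these notes.
print(classify_skewness(-0.407, std_error=0.687))
```

The same comparison works for any statistic judged against twice its standard error; only the labels and the standard error estimate change.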
Alternative formula: $\text{Skew} = \dfrac{3(\text{mean} - \text{median})}{\text{standard deviation}}$
• SAMPLE INTERPRETATION OF DATA:
• Twice the std. error is 2 × 0.687 = 1.374
• The distribution is symmetric if Sk is within the interval [-1.374, 1.374]
• Conclusion: Since Sk = -0.407 is within the interval, the distribution may be considered symmetric.
2.) Measure of Kurtosis – measures the extent to which observations cluster around a central point
• Simplified Guideline
a.) If |kurtosis| ≤ 2 × standard error, then Mesokurtic.
b.) If |kurtosis| > 2 × standard error, then Leptokurtic if Ku is positive; or Platykurtic if Ku is negative.
where std. error = $\sqrt{\dfrac{24}{n}}$ (estimate)
• SAMPLE INTERPRETATION OF DATA:
• Twice the std. error is 2 × 1.549 = 3.098. The distribution is mesokurtic if Ku is within the interval [-3.098, 3.098]
• Conclusion: Since Ku = -1.258 is within the interval, the distribution may be considered mesokurtic.

4. Other Summary Statistics
1.) Sum – computed by adding numerical observations
2.) Count – total number of observations

5. Box-and-Whisker Plots
• A box-and-whisker plot (sometimes called a box plot) is often used to provide a visual summary of a set of data.
• A box-and-whisker plot shows the median, the first and third quartiles, and the minimum and maximum values of a data set. See the figure below.
• Detecting OUTLIERS (o)
If the distance from the box exceeds 1.5 times the interquartile range (in either direction), the observation may be labelled an outlier and is denoted by o.
• Interquartile Range (IQR) = Q3 − Q1
Lower Fence (LF): Q1 − 1.5 × IQR
Upper Fence (UF): Q3 + 1.5 × IQR
• Whiskers:
a.) Locate the smallest value contained in the interval [LF, Q1] and form a whisker.
b.) Locate the largest value contained in the interval [Q3, UF] and form a whisker.
• Note: If the distance from the box exceeds 3 times the IQR (in either direction), the observation is called a far outlier and is denoted by x.
Lower Fence (LF): Q1 − 3 × IQR
Upper Fence (UF): Q3 + 3 × IQR
• Excel Functions
=MIN(data)
=QUARTILE.EXC(data,1)
=MEDIAN(data)
=QUARTILE.EXC(data,3)
=MAX(data)

$z = \dfrac{x - \mu}{\sigma}$
• For any normal curve, the computation of the probability is based on the Standard Normal Curve.
• Standard Normal Table: The areas in the Standard Normal Table will be used in the computation of the probability.
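The standardization $z = (x - \mu)/\sigma$ and the table lookup that follows it can be sketched in Python. This is a minimal sketch under stated assumptions, not the notes' own procedure: `z_score` and `standard_normal_cdf` are hypothetical helper names, the numbers are made up, and the table lookup is replaced by the closed form $\Phi(z) = \tfrac{1}{2}\left(1 + \operatorname{erf}(z/\sqrt{2})\right)$, which gives the area to the left of z. Some printed tables list the area between 0 and z instead, so check which kind of table is in use before comparing values.

```python
import math


def z_score(x, mu, sigma):
    """Standardize x: z = (x - mu) / sigma."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma


def standard_normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


# Assumed example: X is normal with mean 70 and standard deviation 10.
z = z_score(85, mu=70, sigma=10)
p_below = standard_normal_cdf(z)  # P(X < 85)
print(f"z = {z:.2f}, P(X < 85) = {p_below:.4f}")
```

Areas to the right follow from the complement, `1 - standard_normal_cdf(z)`, and areas between two values from the difference of two left-hand areas.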

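As a recap of the measures of central location and variability, the definitions can be tried out with Python's standard `statistics` module. This is a minimal sketch and not part of the original notes: the function name `describe` and the sample `scores` are assumptions made up for illustration, and the sample (n − 1) versions of the variance and standard deviation are used throughout.

```python
import math
import statistics


def describe(data):
    """Return central-location and variability measures for a sample.

    Hypothetical helper for illustration; uses the sample (n - 1)
    variance and standard deviation.
    """
    n = len(data)
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation
    return {
        "count": n,
        "sum": sum(data),
        "mean": mean,
        "median": statistics.median(data),
        "modes": statistics.multimode(data),  # a data set may have several modes
        "range": max(data) - min(data),
        "variance": statistics.variance(data),  # square of the standard deviation
        "std_dev": s,
        "std_error": s / math.sqrt(n),  # standard error of the mean
        "cv_percent": s / mean * 100,  # coefficient of variation, in percent
    }


# Made-up sample of eight scores (assumed data).
scores = [2, 4, 4, 4, 5, 5, 7, 9]
for name, value in describe(scores).items():
    print(f"{name}: {value}")
```

The coefficient of variation divides by the mean, so this sketch is only meaningful for data whose mean is not zero.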
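The box-and-whisker rules (quartiles, IQR, the 1.5 × IQR and 3 × IQR fences, and the whisker ends) can also be sketched in Python. This is an illustrative sketch under stated assumptions: `box_plot_summary` is a hypothetical helper, the data are made up, and the quartiles come from `statistics.quantiles` with `method="exclusive"`, which is meant to follow the same (n + 1)-based convention as Excel's QUARTILE.EXC. It is worth checking a small data set against a spreadsheet before relying on that correspondence.

```python
import statistics


def box_plot_summary(data):
    """Five-number summary, fences, whisker ends and outliers.

    Hypothetical helper for illustration only.
    """
    values = sorted(data)
    q1, median, q3 = statistics.quantiles(values, n=4, method="exclusive")
    iqr = q3 - q1
    lf, uf = q1 - 1.5 * iqr, q3 + 1.5 * iqr      # outlier fences
    far_lf, far_uf = q1 - 3 * iqr, q3 + 3 * iqr  # far-outlier fences

    inside = [v for v in values if lf <= v <= uf]
    far = [v for v in values if v < far_lf or v > far_uf]
    mild = [v for v in values if (v < lf or v > uf) and v not in far]
    return {
        "min": values[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": values[-1],
        "iqr": iqr,
        "fences": (lf, uf),
        "whiskers": (inside[0], inside[-1]),  # smallest and largest values within the fences
        "outliers": mild,     # would be drawn as "o"
        "far_outliers": far,  # would be drawn as "x"
    }


# Made-up data (assumed): eleven values, two of them unusually large.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 20, 40]
for name, value in box_plot_summary(data).items():
    print(f"{name}: {value}")
```

Keeping the 1.5 × IQR and 3 × IQR fences as separate pairs makes it easy to report ordinary outliers and far outliers separately, as the plot does with its two markers.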