Descriptive Statistics
Outline:
1. Variables and types of variables.
2. Data representation.
3. Measures of location.
4. Measures of spread.
5. Coefficient of variation.
6. Grouped/Recoded data.
7. Graphic Methods.
8. Learning Outcomes Covered in Lecture 1
EPHD-310 Basic Biostat Dr.Jaffa
Data Representation
Measures of Location
• A measure of location summarizes the data by identifying
the center or middle of the sample.
Measures of Location:
Arithmetic Mean
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
• Limitation:
The arithmetic mean is very sensitive to extreme values.
Measures of Location:
Median
• The sample median is:
(1) the $\left(\frac{n+1}{2}\right)$th largest observation if $n$ is odd;
(2) the average of the $\left(\frac{n}{2}\right)$th and $\left(\frac{n}{2}+1\right)$th largest
observations if $n$ is even.
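The odd/even rule above can be sketched in a few lines of Python. This is only an illustration: the function name `sample_median` is hypothetical and not part of the lecture, and the positions in the definition are 1-based while Python lists are 0-based.

```python
def sample_median(values):
    """Sample median following the odd/even rule stated above."""
    ordered = sorted(values)  # order observations from lowest to highest
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty sample is undefined")
    if n % 2 == 1:
        # n odd: the ((n + 1) / 2)th ordered observation (1-based),
        # which is index (n - 1) // 2 in 0-based Python indexing
        return ordered[(n - 1) // 2]
    # n even: average of the (n / 2)th and (n / 2 + 1)th ordered observations
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


print(sample_median([3, 1, 2]))     # odd n
print(sample_median([4, 1, 3, 2]))  # even n
```

Sorting inside the function means the caller does not have to supply ordered data.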
Measures of Location
• Example 1:
Assume the weights in pounds of 5 people were collected
randomly and ordered from lowest to highest as follows:
x(1) = 130; x(2) = 138; x(3) = 138; x(4) = 140; x(5)= 220.
Arithmetic Mean:
$$\bar{x} = \frac{x_1 + x_2 + x_3 + x_4 + x_5}{n} = \frac{130 + 138 + 138 + 140 + 220}{5} = 153.2$$
Median: since n = 5 is odd, the median is the third ordered
observation, x(3) = 138. The median of this sample is 138.
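As a quick check of Example 1, the same two quantities can be computed with Python's standard `statistics` module. The variable names are chosen here for illustration only.

```python
from statistics import mean, median

weights = [130, 138, 138, 140, 220]  # Example 1 weights in pounds

example1_mean = mean(weights)      # (130 + 138 + 138 + 140 + 220) / 5
example1_median = median(weights)  # third ordered observation, since n = 5

print(f"mean = {example1_mean:.1f}, median = {example1_median}")
```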
Measures of Location
• Example 2:
Assume the weights in pounds of 6 people were collected
randomly and ordered from lowest to highest as follows:
x(1) = 120; x(2) = 124; x(3) = 126; x(4) = 128;
x(5)= 398; x(6)=399
Arithmetic Mean:
$$\bar{x} = \frac{x_1 + x_2 + x_3 + x_4 + x_5 + x_6}{n} = \frac{120 + 124 + 126 + 128 + 398 + 399}{6} \approx 215.8$$
The mean, 215.8, is pulled towards the extreme values
(398 and 399 in this example).
Measures of Location
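• Example 2: (continued)
Median: since n = 6 is even, the median is the average of the
third and fourth ordered observations: (126 + 128)/2 = 127.
Unlike the mean, the median is not pulled towards 398 and 399.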
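To see how strongly the two extreme weights in Example 2 affect each measure, the sketch below compares the mean and median of the full sample with those of the first four observations. The subsample `typical` is introduced here purely for illustration and is not part of the lecture.

```python
from statistics import mean, median

weights = [120, 124, 126, 128, 398, 399]  # Example 2 weights in pounds
typical = weights[:4]                      # hypothetical subsample without 398 and 399

for label, sample in [("all six", weights), ("without extremes", typical)]:
    print(f"{label}: mean = {mean(sample):.1f}, median = {median(sample)}")
```

Dropping the two extreme values moves the mean by roughly 90 pounds but the median by only 2, which is the sensitivity described under the limitation of the arithmetic mean.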
Measures of Spread
Measures of Spread:
Range
• The range is the difference between the largest and
smallest observations: Range = x(n) - x(1)
• Example:
Assume you have the following observations ordered
from lowest to highest:
x(1) = 5; x(2) = 10; x(3) = 28; x(4) = 64; x(5) = 185.
Range = x(5) - x(1) = 185 - 5 = 180
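The range computation is short enough to write directly. In the sketch below, `sample_range` is a hypothetical helper name; because it uses `max` and `min`, the data do not need to be ordered first.

```python
def sample_range(values):
    """Range = largest observation minus smallest observation."""
    if not values:
        raise ValueError("range of an empty sample is undefined")
    return max(values) - min(values)


observations = [5, 10, 28, 64, 185]
print(sample_range(observations))
```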
Measures of Spread:
Percentiles (or quantiles)
• The upper and lower quartiles are the 75th and 25th percentiles
of the sample, respectively. Both are commonly reported in the
literature.
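Quartiles can be computed with `statistics.quantiles` from the Python standard library (Python 3.8 or later). Software packages use slightly different percentile conventions, and the convention used in this lecture is not assumed here, so the sketch below shows both methods that the function offers. The data are the five observations from the range example.

```python
from statistics import quantiles

observations = [5, 10, 28, 64, 185]  # five observations from the range example

# n=4 asks for the three cut points that split the data into quarters:
# lower quartile (25th), median (50th), upper quartile (75th).
q1_excl, q2_excl, q3_excl = quantiles(observations, n=4, method="exclusive")
q1_incl, q2_incl, q3_incl = quantiles(observations, n=4, method="inclusive")

print("exclusive:", q1_excl, q2_excl, q3_excl)
print("inclusive:", q1_incl, q2_incl, q3_incl)
```

With a sample this small the two conventions give noticeably different quartiles, so it is worth stating which one was used when reporting results.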
Measures of Spread:
Variance and Standard Deviation
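• The sample variance is:
$$s^2 = \frac{(x_1-\bar{x})^2 + (x_2-\bar{x})^2 + \dots + (x_n-\bar{x})^2}{n-1} = \frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n-1}$$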
• Note that the sum of all deviations from the mean is zero;
i.e.
$$\sum_{i=1}^{n}(x_i-\bar{x}) = 0$$
• The sample standard deviation is the square root of the
sample variance:
$$s = \sqrt{\frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n-1}}$$
Example:
Compute the sample variance and standard deviation for the
following observations ordered from lowest to highest:
x(1) = 9; x(2) = 10; x(3) = 12; x(4) = 14; x(5)= 16.
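Example: (continued)
Sample mean: $\bar{x} = (9 + 10 + 12 + 14 + 16)/5 = 12.2$
Sample variance:
$$s^2 = \frac{(9-12.2)^2 + (10-12.2)^2 + (12-12.2)^2 + (14-12.2)^2 + (16-12.2)^2}{5-1} = \frac{32.8}{4} = 8.2$$
Sample standard deviation: $s = \sqrt{8.2} \approx 2.86$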
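The variance example can be worked step by step in Python. This is a minimal sketch that follows the sample variance and standard deviation formulas with the n - 1 divisor, then cross-checks the result against the standard library. All variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev, variance

observations = [9, 10, 12, 14, 16]
n = len(observations)
x_bar = mean(observations)  # sample mean

deviations = [x - x_bar for x in observations]  # x_i minus the mean
sum_sq = sum(d ** 2 for d in deviations)        # sum of squared deviations

s2 = sum_sq / (n - 1)  # sample variance, dividing by n - 1
s = sqrt(s2)           # sample standard deviation

print(f"mean = {x_bar}, sum of deviations = {sum(deviations):.10f}")
print(f"s^2 = {s2:.2f}, s = {s:.2f}")
print(f"library check: {variance(observations):.2f}, {stdev(observations):.2f}")
```

The sum of the deviations comes out as zero up to floating-point rounding, which is why the deviations are squared before they are summed.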
Graphic Methods
• Graphic methods display the data visually rather than numerically.
Graphic Methods:
Histograms
[Figure: SPSS-generated histogram for the hours-slept example]
Graphic Methods:
Box-and-whisker Plot
• A box-and-whisker plot (referred to simply as a box plot)
presents the median, upper quartile, and lower quartile
of the sample.
[Figure: SPSS-generated box plot for hours slept, with the upper quartile, median, and lower quartile labeled]
Graphic Methods:
Box Plots
• If the distribution is symmetric, then the upper and lower
whiskers have approximately equal length.
[Figure: box plot with the upper and lower whiskers labeled]
• A distribution is positively skewed, or skewed to the right, if
the upper whisker is longer than the lower whisker.
[Figure: box plot with the upper and lower whiskers labeled]
• A distribution is negatively skewed, or skewed to the left, if the
lower whisker is longer than the upper whisker.
[Figure: box plot with the upper and lower whiskers labeled]
Right/positive Skewness
Symmetrical: No Skewness
Graphic Methods:
Box Plots
• A box plot can also be used to help detect outliers and
extreme values.
• For your general information, the next two slides show the
formulas used to detect outliers and extreme values, together
with an application of these formulas.
• You are NOT responsible for detecting outliers and extreme
values by hand using these formulas.
• A box plot can also be used to help detect outliers and
extreme values (you are NOT responsible for the material below;
it is for your information only).
• Assess whether there are outliers or extreme outliers in the
following sample: 16, 10, 49, 15, 6, 15, 8, 19, 11, 22, 13, 17
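One common way to screen this sample is sketched below. It rests on two assumptions that are not taken from the slides: the fences are placed at 1.5 and 3 times the interquartile range beyond the quartiles (a widely used box-plot convention), and the quartiles come from the default method of `statistics.quantiles`, which may differ slightly from the quartiles SPSS uses. Treat it as an illustration, not as the lecture's formulas.

```python
from statistics import quantiles

sample = [16, 10, 49, 15, 6, 15, 8, 19, 11, 22, 13, 17]

q1, _, q3 = quantiles(sample, n=4)  # lower quartile, median, upper quartile
iqr = q3 - q1                       # interquartile range (length of the box)

# Assumed fences: 1.5 * IQR for outliers, 3 * IQR for extreme values.
inner_low, inner_high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outer_low, outer_high = q1 - 3.0 * iqr, q3 + 3.0 * iqr

# Extreme values lie beyond the outer fences.
extreme = [x for x in sample if x < outer_low or x > outer_high]
# Outliers lie beyond the inner fences but are not extreme.
outliers = [x for x in sample
            if (x < inner_low or x > inner_high) and x not in extreme]

print("Q1 =", q1, "Q3 =", q3, "IQR =", iqr)
print("outliers:", outliers, "extreme values:", extreme)
```

Under these assumptions only the value 49 falls beyond the outer fence, so it is classed as an extreme value and no other observation is flagged.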
The SPSS-generated box plot for these data is as follows:
[Figure: SPSS box plot. The flagged point is the 3rd observation of the data as entered into SPSS; it has the value 49 and is considered an extreme outlier.]
Graphic Methods:
[Figure: example of a box plot generated by SPSS, showing one extreme value and two outliers]
Pie Chart:
[Figure: SPSS example of a pie chart]
General Comments
• You can also reduce these groups to three: group 1: 20000 to 40000;
group 2: 40001 to 50000; group 3: more than 50000.
• This grouping can be achieved in SPSS using what we refer to as
“recoding”.
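Outside SPSS, the same recoding can be sketched as a small Python function. The function name `recode_group`, the sample values, and the handling of values below 20000 (left uncoded as `None`) are assumptions made for illustration; only the three cut points come from the slide.

```python
def recode_group(value):
    """Map a value to group 1, 2, or 3 using the cut points given above."""
    if value < 20000:
        return None  # assumption: values below 20000 are left uncoded
    if value <= 40000:
        return 1     # group 1: 20000 to 40000
    if value <= 50000:
        return 2     # group 2: 40001 to 50000
    return 3         # group 3: more than 50000


values = [20000, 35500, 40001, 50000, 72000]  # hypothetical values
groups = [recode_group(v) for v in values]
print(list(zip(values, groups)))
```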