statistics
Arinaitwe Irene, PhD
04/12/2024
Statistics
Descriptive statistics
Example
Descriptive statistics
There are two general types of statistic that are used to describe data:
• Measures of central tendency
• Measures of spread
Scales
Nominal
Interval
Ratio
• Class task
• Example ???
Ratio
• Ratio variables are interval variables, but with the added condition
that 0 (zero) of the measurement indicates that there is none of that
variable.
• Examples of ratio variables include height, mass, and distance.
Measures of Central Tendency
• The mean (or average) is the most popular and well-known measure of
central tendency.
• It can be used with both discrete and continuous data, although its use
is most often with continuous data.
• An important property of the mean is that it includes every value in
your data set as part of the calculation.
When mean is not appropriate (example 1)
Staff    1    2    3    4    5    6    7    8    9    10
Salary   15k  18k  16k  14k  15k  15k  12k  17k  90k  95k
The mean salary for these ten staff is $30.7k. However, inspecting the raw data
suggests that this mean value might not be the best way to accurately reflect the
typical salary of a worker, as most workers have salaries in the $12k to $18k
range. The mean is being skewed by the two large salaries.
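The figures in this example can be checked with a minimal Python sketch using only the standard library. The variable names are illustrative assumptions; the values are the ten salaries from the example, in thousands of dollars.

```python
import statistics

# Salaries of the ten staff, in thousands of dollars (from the example).
salaries_k = [15, 18, 16, 14, 15, 15, 12, 17, 90, 95]

mean_salary = statistics.mean(salaries_k)      # pulled upward by 90k and 95k
median_salary = statistics.median(salaries_k)  # middle of the ordered salaries

print(f"mean   = {mean_salary:.1f}k")    # mean   = 30.7k
print(f"median = {median_salary:.1f}k")  # median = 15.5k
```

The median sits inside the $12k to $18k range where most salaries fall, while the mean does not.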
When mean is not appropriate (example 2)
• The median is preferred over the mean (or mode) when the data is skewed.
• If we consider the normal distribution - as this is the most
frequently assessed in statistics - when the data is perfectly
normal, the mean, median and mode are identical.
• However, as the data becomes skewed, the mean loses its
ability to provide the best central location for the data
because the skewed data is dragging it away from the
typical value.
• The median best retains this position and is not as strongly
influenced by the skewed values, so it is more realistic
than the mean.
The Median
• The median is the number present in the middle when the numbers in a
set of data are arranged in ascending or descending order.
• Computing the median when the number of scores in the set is odd
involves the following steps:
1. Order the scores in numerical order from lowest to highest
2. Count the number of scores
3. Select the middle score as the median
Median Example
• If the set of scores is {15, 20, 35, 45, 50, 56, 67}, then the median is 45.
• If our set of scores is {15, 20, 21, 20, 36, 15, 25, 15}, we first need to
order these values in ascending order before computing the median,
getting {15, 15, 15, 20, 20, 21, 25, 36}. With an even number of scores,
the median is the mean of the two middle scores: (20 + 20) / 2 = 20.
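The median steps can be sketched in Python as follows. The function name `median` and the variable names are illustrative assumptions. For an even number of scores the sketch uses the usual convention of averaging the two middle values, which the numbered steps for the odd case do not spell out.

```python
def median(scores):
    """Return the median of a non-empty list of numbers."""
    ordered = sorted(scores)  # step 1: order from lowest to highest
    n = len(ordered)          # step 2: count the scores
    mid = n // 2
    if n % 2 == 1:
        # Odd count: step 3, select the single middle score.
        return ordered[mid]
    # Even count (assumed convention): average the two middle scores.
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_scores = [15, 20, 35, 45, 50, 56, 67]
even_scores = [15, 20, 21, 20, 36, 15, 25, 15]

print(median(odd_scores))   # 45
print(median(even_scores))  # 20.0
```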
The Mode
• The mode is the value that occurs most frequently in a set of data.
• Example: in {14, 25, 23, 67, 25, 78, 65, 45, 25, 18, 20, 89, 25, 90},
the mode is 25, which occurs four times.
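One way to find the mode of this data set is to count how often each value occurs. This is a minimal sketch using `collections.Counter` from the Python standard library; the variable names are illustrative assumptions.

```python
from collections import Counter

scores = [14, 25, 23, 67, 25, 78, 65, 45, 25, 18, 20, 89, 25, 90]

# Count the occurrences of each value, then take the most frequent one.
counts = Counter(scores)
mode_value, mode_count = counts.most_common(1)[0]

print(mode_value, mode_count)  # 25 4
```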
Skewed Distributions and the Mean and Median
Skewed data
When to use the mean, median and mode
Nominal Mode
Ordinal Median
Measures of spread
• These are ways of summarizing a group of data by describing how spread out the
scores are.
For example, the mean score of our 100 students may be 65 out of 100.
However, not all students will have scored 65 marks. Rather, their scores will
be spread out. Some will be lower and others higher.
Measures of spread include the range, quartiles, absolute
deviation, variance and standard deviation.
Range
• In our example data set {15,15,15,20,20,21,25,36}, the high value is 36 and the
low is 15.
• So the range is 36 - 15 = 21
• However, the range only provides information about the maximum and minimum values and does not say
anything about the values in between.
• A commonly used measure of dispersion is the standard deviation, which is simply the square root of the
variance.
• The variance is defined as the sum of the squared distances of each term in the distribution from the mean,
divided by the number of terms in the distribution (for a population) or by the number of terms minus 1 (for a
sample, as in the formula here).
• Squaring the difference makes each term positive so that values above the mean do not cancel values below the
mean.
• The Standard Deviation shows the relation that the set of scores has
to the mean of the sample:
$$ \mathrm{SD} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} $$
• Where $x_i$ is the individual score, $\bar{x}$ is the mean of all the scores,
and n is the number of observations.
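As a rough illustration of these measures on the example data set, the sketch below uses Python's standard `statistics` module. The variable names are illustrative assumptions. `statistics.variance` and `statistics.stdev` divide by n - 1 (sample versions), while `statistics.pvariance` divides by n (population version).

```python
import statistics

scores = [15, 15, 15, 20, 20, 21, 25, 36]

# Range: highest value minus lowest value.
score_range = max(scores) - min(scores)

# Sample variance and standard deviation (divide by n - 1).
sample_variance = statistics.variance(scores)
sample_sd = statistics.stdev(scores)

# Population variance (divide by n), shown for comparison.
population_variance = statistics.pvariance(scores)

print(score_range)          # 21
print(sample_variance)      # 50.125
print(f"{sample_sd:.2f}")   # 7.08
print(population_variance)  # 43.859375
```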
04/12/2024 BIT 1205 26
Example
• Computation of SD: first find the distance between each value and the mean.
• The mean of the eight scores is 167 / 8 = 20.875, so the differences from the mean are:
15 - 20.875 = -5.875
20 - 20.875 = -0.875
21 - 20.875 = +0.125
20 - 20.875 = -0.875
36 - 20.875 = 15.125
15 - 20.875 = -5.875
25 - 20.875 = +4.125
15 - 20.875 = -5.875
• Note: The values below the mean have negative discrepancies and values
above it have positive ones.
Example cont’d
• Next, we square each discrepancy:
-5.875 * -5.875 = 34.515625
-0.875 * -0.875 = 0.765625
+0.125 * +0.125 = 0.015625
-0.875 * -0.875 = 0.765625
15.125 * 15.125 = 228.765625
-5.875 * -5.875 = 34.515625
+4.125 * +4.125 = 17.015625
-5.875 * -5.875 = 34.515625
• Take these "squares" and sum them to get the Sum of Squares (SS) value.
The sum is 350.875. We then divide this sum by the number of scores minus
1.
• Here, the result is 350.875 / 7 = 50.125. This value is known as the variance.
Example cont’d
• Finally, the standard deviation is the square root of the variance: √50.125 ≈ 7.08.
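The worked example can be reproduced step by step with a short Python sketch. The variable names are illustrative assumptions; the data and the division by n - 1 follow the example.

```python
import math

scores = [15, 20, 21, 20, 36, 15, 25, 15]
n = len(scores)

# Step 1: mean of the scores.
mean = sum(scores) / n

# Step 2: difference of each score from the mean.
deviations = [x - mean for x in scores]

# Step 3: square each difference and add them up (Sum of Squares).
squares = [d * d for d in deviations]
sum_of_squares = sum(squares)

# Step 4: divide by n - 1 to get the variance, then take the square root.
variance = sum_of_squares / (n - 1)
sd = math.sqrt(variance)

print(mean)            # 20.875
print(sum_of_squares)  # 350.875
print(variance)        # 50.125
print(f"{sd:.2f}")     # 7.08
```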