
MEASURES OF CENTRAL TENDENCY
UNIT 6.1
In statistics, a central tendency is a central or typical
value for a probability distribution.
It may also be called a center or location of the
distribution.
Colloquially, measures of central tendency are often
called averages.
The term central tendency dates from the late 1920s.
Central Values – Many times one number is used to
describe the entire sample or population.
There are many ways to compute an average.
There are 4 values that are considered measures of
the center.
• Mean
• Median
• Mode
• Midrange
ARRAYS
• Mean – the arithmetic average with which you are
the most familiar.

• Mean: $\bar{x} = \dfrac{\text{sum of all } x}{\text{number of } x} = \dfrac{\sum x}{n}$
SAMPLE AND POPULATION SYMBOLS

• As we progress in this course there will be different symbols that represent the same thing.
• The only difference is that one comes from a sample and one comes from a population.
• Sample Mean: $\bar{x}$
• Population Mean: $\mu$
EXAMPLE

Find the mean of the array.

4, 3, 8, 9, 1, 7, 12

$\bar{x} = \dfrac{\sum x}{n} = \dfrac{4 + 3 + 8 + 9 + 1 + 7 + 12}{7} = \dfrac{44}{7} \approx 6.29 \approx 6.3$
EXAMPLE
Find the mean of the following numbers.
23, 25, 26, 29, 39, 42, 50

$\bar{x} = \dfrac{\sum x}{n} = \dfrac{23 + 25 + 26 + 29 + 39 + 42 + 50}{7} = \dfrac{234}{7} \approx 33.4$
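The two mean examples can be checked with a few lines of Python. This is a minimal sketch; the helper name `mean` and the variable names are illustrative choices, not part of the course material.

```python
def mean(values):
    """Arithmetic mean: sum of all x divided by the number of x."""
    return sum(values) / len(values)

first = [4, 3, 8, 9, 1, 7, 12]          # sum is 44
second = [23, 25, 26, 29, 39, 42, 50]   # sum is 234

print(round(mean(first), 1))    # 44 / 7, rounded to one decimal
print(round(mean(second), 1))   # 234 / 7, rounded to one decimal
```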
MEDIAN

Median – the middle number in an ordered set of numbers. Divides the data into two equal parts.
Odd # in set: falls exactly on the middle number.
Even # in set: falls in between the two middle values in the set; find the average of the two middle values.
EXAMPLE

• Find the median.

• A. 2, 3, 4, 7, 8 - the median is 4.

• B. 6, 7, 8, 9, 9, 10
median = (8+9)/2 = 8.5.
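The odd and even cases can be sketched in Python as follows. The function name `median` is an illustrative choice; the logic simply follows the two rules stated above.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                        # odd count: the middle number
    return (ordered[mid - 1] + ordered[mid]) / 2   # even count: average the two middle values

print(median([2, 3, 4, 7, 8]))        # example A
print(median([6, 7, 8, 9, 9, 10]))    # example B
```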
MODE

The number that occurs most often.

Suggestion: Sort the numbers to make it easier to see the grouping of the numbers.

You can have a single number for the mode, no mode, or more than one number.
EXAMPLE

• Find the mode.


• 1, 2, 1, 2, 2, 2, 1, 3, 3
• Sort the numbers to see the groupings more easily: 1, 1, 1, 2, 2, 2, 2, 3, 3
• The mode is 2.
EX 2

• Find the mode.

• A. 0, 1, 2, 3, 4 - no mode

• B. 4, 4, 6, 7, 8, 9, 6, 9 - the modes are 4, 6, and 9
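A short Python sketch of the mode rules used in these examples. The function name `modes` is hypothetical, and the convention that a data set has no mode when every value occurs exactly once is taken from example A; other textbooks may treat ties differently.

```python
from collections import Counter

def modes(values):
    """Return all values that occur most often; empty list if nothing repeats."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []   # every value occurs once: no mode
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([1, 2, 1, 2, 2, 2, 1, 3, 3]))   # single mode
print(modes([0, 1, 2, 3, 4]))               # no mode
print(modes([4, 4, 6, 7, 8, 9, 6, 9]))      # more than one mode
```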
MIDRANGE
• The number exactly midway
between the lowest value
and highest value of the
data set.
• It is found by averaging the low
and high numbers.
$\text{midrange} = \dfrac{\text{Low value} + \text{High value}}{2}$
EXAMPLE

• Find the midrange of the set.


• 3, 3, 5, 6, 8

$\text{midrange} = \dfrac{3 + 8}{2} = \dfrac{11}{2} = 5.5$
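The same calculation as a minimal Python sketch; `midrange` is an illustrative helper name.

```python
def midrange(values):
    """Average of the lowest and highest values in the data set."""
    return (min(values) + max(values)) / 2

print(midrange([3, 3, 5, 6, 8]))   # (3 + 8) / 2
```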
WEIGHTED AVERAGE

• Sometimes we wish to average numbers, but we want to assign more importance, or weight, to some of the numbers.
• The average you need is the weighted average.
FORMULA FOR WEIGHTED AVERAGE

$\text{Weighted Average} = \dfrac{\sum xw}{\sum w}$
where x is a data value and w is
the weight assigned to that data
value. The sum is taken over all
data values.
EXAMPLE:
Suppose your midterm Exam score is 83 and your final
exam score is 95. Using weights of 40% for the midterm
and 60% for the final exam, compute the weighted average
of your scores. If the minimum average for an A is 90, will
you earn an A?

$\text{Weighted Average} = \dfrac{83(0.40) + 95(0.60)}{0.40 + 0.60} = \dfrac{33.2 + 57}{1} = 90.2$

You will earn an A!
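The exam example can be reproduced with a small Python sketch of the weighted-average formula. The function and variable names are illustrative.

```python
def weighted_average(values, weights):
    """Sum of x*w divided by the sum of w, taken over all data values."""
    return sum(x * w for x, w in zip(values, weights)) / sum(weights)

scores = [83, 95]         # midterm, final exam
weights = [0.40, 0.60]    # 40% and 60%

average = weighted_average(scores, weights)
print(round(average, 1))
print("A" if average >= 90 else "below A")   # minimum average for an A is 90
```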
MEASURES OF DISPERSION: ARRAYS
DISPERSION
• In statistics, dispersion is the extent to which
a distribution is stretched or squeezed.
• Common examples of measures of statistical
dispersion are the variance, standard
deviation, and interquartile range.
• Dispersion measures the spread or variability of the data.

• No variability means no dispersion.
MEASURES OF VARIATION

There are 3 values used to measure the amount of dispersion or variation (the spread of the group).


1. Range
2. Variance
3. Standard Deviation
WHY IS IT IMPORTANT?
• You want to choose the best brand of paint for
your house.
• You are interested in how long the paint lasts
before it fades and you must repaint.
• The choices are narrowed down to 2 different
paints.
• The results are shown in the chart. Which
paint would you choose?
The chart indicates the number of months a paint lasts before fading.

          Paint A   Paint B
             10        35
             60        45
             50        30
             30        35
             40        40
             20        25
Total       210       210
DOES THE AVERAGE HELP?

• Paint A: Avg = 210/6 = 35 months

• Paint B: Avg = 210/6 = 35 months

• Both paints last an average of 35 months before fading, so the average is no help in deciding which to buy.
CONSIDER THE SPREAD

• Paint A: Spread = 60 – 10 = 50 months

• Paint B: Spread = 45 – 25 = 20 months

• Paint B has a smaller spread, which means that it performs more consistently.
• Choose paint B.
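The paint comparison can be sketched in Python using the chart values. The `spread` helper is an illustrative name; the sample standard deviation from the standard library's `statistics` module is added only as a second view of the same consistency question and is not part of the slide's argument.

```python
from statistics import mean, stdev

paint_a = [10, 60, 50, 30, 40, 20]   # months before fading
paint_b = [35, 45, 30, 35, 40, 25]

def spread(values):
    """Range: highest value minus lowest value."""
    return max(values) - min(values)

for name, months in (("Paint A", paint_a), ("Paint B", paint_b)):
    # Same average for both paints, but very different spreads.
    print(name, mean(months), spread(months), round(stdev(months), 1))
```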
RANGE

• The range is the difference between the lowest value in the set and the highest value in the set.

• Range = High # - Low #


EXAMPLE

• Find the range of the data set.

• 40, 30, 15, 2, 100, 37, 24, 99

• Range = 100 – 2 = 98
DEVIATION FROM THE MEAN
• A deviation from the mean, $x - \bar{x}$, is the difference between the value of x and the mean $\bar{x}$.

• We base our formulas for variance and standard deviation on the amount that the data values deviate from the mean.
VARIANCE

• Variance Formula

$s^2 = \dfrac{\sum x^2 - \dfrac{(\sum x)^2}{n}}{n - 1}$
STANDARD DEVIATION
• The standard deviation is the
square root of the variance.

$s = \sqrt{s^2}$
EXAMPLE – USING FORMULA

• Find the variance.

6, 3, 8, 5, 3

  x     x²
  6     36
  3      9
  8     64
  5     25
  3      9

$\sum x = 25$, $\sum x^2 = 143$

$s^2 = \dfrac{\sum x^2 - \dfrac{(\sum x)^2}{n}}{n - 1} = \dfrac{143 - \dfrac{25^2}{5}}{4} = \dfrac{143 - 125}{4} = \dfrac{18}{4} = 4.5$
FIND THE STANDARD DEVIATION
• The standard deviation is the
square root of the variance.

$s = \sqrt{4.5} \approx 2.12$
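The variance formula and the standard deviation can be checked with a short Python sketch. The function name `sample_variance` is illustrative; the comparison with `statistics.variance` is included only as a cross-check.

```python
from math import sqrt
from statistics import variance

def sample_variance(values):
    """Shortcut formula: (sum of x^2 - (sum of x)^2 / n) / (n - 1)."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)

data = [6, 3, 8, 5, 3]
s2 = sample_variance(data)   # (143 - 625 / 5) / 4
s = sqrt(s2)                 # standard deviation is the square root of the variance

print(s2, round(s, 2))
print(variance(data))        # library result for comparison
```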
VARIANCE – USING FORMULA

• When recovering the variance from the standard deviation, square the ENTIRE number for the standard deviation, not the rounded version you gave for your answer.

$s^2 = (2.121320344)^2 = 4.5$
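A tiny Python sketch of why the rounding matters; the variable names are illustrative.

```python
from math import sqrt

s = sqrt(4.5)            # full-precision standard deviation
rounded = round(s, 2)    # the rounded answer, 2.12

print(s ** 2)            # squaring the full value recovers the variance
print(rounded ** 2)      # squaring the rounded value drifts away from 4.5
```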
INTERPRETING TRIAL RESULTS
Trial Size
• One statistical calculation made before a trial begins is the sample size, or the number of volunteers that need to be enrolled.
• If the overall incidence in the trial population is low, more volunteers are necessary. A trial therefore needs more volunteers when recruiting from the general population than from high-risk populations.
• Some trials are also designed to continue until a pre-determined number of HIV infections or endpoints occur: if the HIV incidence is low, the trial duration is longer.
• The precision with which the efficacy of the vaccine is determined is based on the number of HIV infections that occur during the study, not the total number of volunteers involved.
Statistical Variation
• Statistical variation is the technical term for chance fluctuations.
• Even if disease rates in two populations (say 1 million people each) are identical, they may not appear so in a study.
• If we study two identical populations with identical risk factors and exposures, and pick a sample of 1,000 from each population, it likely will turn out that the disease rates measured will be similar but not identical.
• As we increase the sample (to 10,000 subjects each), our study-based estimates of the true disease rates in the entire populations will be more accurate.
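One way to see the effect of sample size is the standard error of an estimated rate, which shrinks as the sample grows. The sketch below uses the usual binomial standard-error formula; the true disease rate of 2% is a hypothetical value chosen only for illustration.

```python
from math import sqrt

def standard_error(rate, n):
    """Standard error of a sample proportion under simple random sampling."""
    return sqrt(rate * (1 - rate) / n)

TRUE_RATE = 0.02   # hypothetical disease rate, not taken from any study

se_small = standard_error(TRUE_RATE, 1_000)
se_large = standard_error(TRUE_RATE, 10_000)

# Ten times the sample size cuts the chance fluctuation by a factor of sqrt(10).
print(se_small, se_large, se_small / se_large)
```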
Statistical Power
• "Statistical power" measures the ability of a study to find an association between exposure and disease, when such an association actually exists.
• If an intervention does indeed decrease the risk of disease, then a study with high power will be very likely to find an association.
• However, if the study has low power, then it has little chance of finding an association even if there is an actual association between the intervention and the disease.
• Power depends on sample size.
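The dependence of power on sample size can be illustrated with a rough normal-approximation calculation for comparing two proportions. This is a sketch under stated assumptions: the infection risks (1% in the control arm, 0.7% in the treated arm), the arm sizes, and the function name `approx_power` are all hypothetical, and the formula ignores the negligible chance of a significant result in the wrong direction.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(p_control, p_treated, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided test comparing two proportions."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = sqrt(p_control * (1 - p_control) / n_per_arm
              + p_treated * (1 - p_treated) / n_per_arm)
    z_effect = abs(p_control - p_treated) / se
    return NormalDist().cdf(z_effect - z_crit)

# Hypothetical risks; only the trend across sample sizes matters here.
for n in (1_000, 8_000, 30_000):
    print(n, round(approx_power(0.01, 0.007, n), 2))
```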
Statistical Significance
• Did the vaccine actually work or did the results happen merely by chance?
• The smaller the p-value and the narrower the confidence interval, the stronger the evidence that fewer HIV infections in the vaccine arm reflect a real effect.
• P-values and confidence intervals are based on the same underlying concepts of probability.
PRINCIPLES OF PROBABILITY

[Figure: outcomes of repeated coin tosses, ranging from 2 heads : 98 tails through 50 heads : 50 tails to 98 heads : 2 tails, with the 95% confidence interval marked]
The effect of sample size
[Figure: the 95% CI if you toss the coin 10,000 times compared with the 95% CI if you toss the coin 10 times]
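The coin-toss comparison can be sketched numerically. The code below uses the simple normal-approximation (Wald) interval and assumes, purely for illustration, that exactly half of the tosses come up heads; the Wald interval is crude for only 10 tosses, so treat the small-sample numbers as indicative.

```python
from math import sqrt

def wald_interval(p_hat, n, z=1.96):
    """Approximate 95% confidence interval for a proportion."""
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width

low10, high10 = wald_interval(0.5, 10)          # 10 tosses
low10k, high10k = wald_interval(0.5, 10_000)    # 10,000 tosses

print(round(low10, 3), round(high10, 3))        # wide interval
print(round(low10k, 4), round(high10k, 4))      # narrow interval
```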
Efficacy
• Efficacy: compare the number of HIV infections that occurred in the vaccine and placebo groups.
• If more infections occur in volunteers who received placebo (e.g., Thai vaccine trial), researchers can then estimate the efficacy of the vaccine candidates.
• Thai (AIDSVAX/ALVAC) trial:
  - 74 infections occurred among volunteers in the placebo group
  - 51 in those who received the full prime-boost regimen
  - Therefore the efficacy of the vaccine candidates was 31.2%
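A crude version of this calculation is one minus the ratio of infections in the two arms. The sketch below assumes the vaccine and placebo arms had roughly equal size and follow-up, which is an assumption made here for illustration; it lands close to, but not exactly on, the reported 31.2%, presumably because the trial's own analysis used more detailed data than raw counts.

```python
def crude_efficacy(infections_vaccine, infections_placebo):
    """1 - risk ratio, ASSUMING arms of equal size and equal follow-up."""
    return 1 - infections_vaccine / infections_placebo

estimate = crude_efficacy(51, 74)
print(f"{estimate:.1%}")   # crude estimate, near the reported 31.2%
```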
Confidence Intervals
• To account for the possibility that the number of HIV infections was higher in the placebo arm just by chance, epidemiologists use "confidence intervals."
• The CI for the actual efficacy of the vaccine is the range of values around the best estimate of efficacy, all of which are contenders.
• A "95% confidence interval" comes from a procedure that captures the true value 95% of the time. In other words, there is still a 5% chance that the true value of the measurement lies outside of the interval.
Thai Vaccine Trial Results
[Figure: efficacy point estimate = 31.2%, with 95% CI = 1.2-52.1% efficacy]
P-Value
• The p-value (probability value) quantifies uncertainty about whether an outcome is due to chance or whether it actually reflects a true difference.
• The p-value is the probability of seeing a difference at least this large by chance alone; thus the lower the p-value, the more confidence you can have in the results of the study.
• Traditionally (and arbitrarily), a p-value of .05 or less has been accepted as evidence of an actual difference.
Interpreting the P-value
Common misunderstandings about p-values:
• The p-value is not the probability that the vaccine has no effect.
• The p-value is not the probability that a finding is "merely a fluke."
• The p-value is not the probability of falsely finding the vaccine to be effective.
• The p-value is not the probability that a replicating experiment would not yield the same conclusion.

Rather, the p-value is the chance of obtaining such results, or more extreme ones, if the vaccine actually had no effect.
Let’s Practice Interpreting the Results from the Thai HIV Vaccine Trial
31% efficacy (CI = 1.2%-52.1%, P=0.04)
• “The vaccine recipients had a 31% lower risk of HIV infection than those who received placebo.”
• “The efficacy of the prime-boost regimen could be anywhere in the range of 1.2% to 52.1%, yet the most likely efficacy is the point estimate, 31.2%.”
• “If the vaccine had no effect whatsoever, there is a 4% chance that this split in infections, or an even larger one, would have occurred anyway.”
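The last interpretation can be made concrete with a simplified calculation. If the vaccine had no effect and the two arms were the same size (an assumption made here, not a detail from the slides), each of the 125 infections would be equally likely to fall in either arm, like a fair coin toss. The sketch below computes the chance of a split at least as lopsided as 51 versus 74 under that model. It should land in the same few-percent neighborhood as the reported P=0.04 but need not match it, because the trial's own analysis was more detailed.

```python
from math import comb

def two_sided_binomial_p(k, n):
    """Exact two-sided p-value for k 'heads' in n fair-coin trials."""
    lower = min(k, n - k)
    tail = sum(comb(n, i) for i in range(lower + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# 51 infections in the vaccine arm out of 51 + 74 infections overall.
p_value = two_sided_binomial_p(51, 51 + 74)
print(round(p_value, 3))
```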
Odds Ratio
• Used to assess the risk of a particular outcome (or disease) if a certain factor (or exposure) is present.
• A relative measure of risk, telling us how much more likely it is that someone who is exposed to the factor under study will develop the outcome as compared to someone who is not exposed.
• To calculate the OR, we calculate the odds of exposure among cases and divide it by the odds of exposure among controls.
• OR > 1 means the outcome is more likely with exposure
• OR < 1 means the outcome is less likely with exposure
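The odds ratio calculation can be sketched in Python. The counts below are entirely hypothetical and chosen only to show the arithmetic; the function and parameter names are illustrative.

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """Odds of exposure among cases divided by odds of exposure among controls."""
    odds_cases = exposed_cases / unexposed_cases
    odds_controls = exposed_controls / unexposed_controls
    return odds_cases / odds_controls

# Hypothetical study: 30 of 100 cases exposed, 10 of 100 controls exposed.
result = odds_ratio(30, 70, 10, 90)
print(round(result, 2))   # greater than 1: outcome more likely with exposure
```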
