DATA EXPLORATION-Measures of Central Tendency

Computing and interpreting
measures of central tendency

Measures of Central Tendency
Researchers are often interested in defining a value that
best describes some attribute of the population. Often
this attribute is a measure of central tendency or a
proportion. The three most commonly used measures of
central tendency are mode, median and mean.
Mode
• The mode is the most frequently appearing value in the
population or sample.
• It is the value with the highest frequency.
• Example: consider five women having the following
weights: 100 kg, 100 kg, 130 kg, 140 kg, and 150 kg.
• The value with the highest frequency is 100 kg, which
appears twice.
• Thus the mode would equal 100 kg.
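The mode of the five weights can be checked with a minimal Python sketch. It uses only the standard `statistics` module; the variable names are illustrative and not part of the original slides.

```python
from statistics import mode, multimode

# Weights (kg) of the five women in the example.
weights_kg = [100, 100, 130, 140, 150]

# mode() returns the single most frequent value.
most_common = mode(weights_kg)

# multimode() returns every value tied for the highest frequency,
# which helps when a data set has more than one mode.
all_modes = multimode(weights_kg)

print(most_common)  # 100
print(all_modes)    # [100]
```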

Median
• The median is the midpoint of the ordered data: it splits
the data into two piles, each containing half the values.
• To find the median, we arrange the observations in
order from smallest to largest value.
• If there is an odd number of observations, the median is
the middle value.
• If there is an even number of observations, the median is
the average of the two middle values.
• Thus, in the sample of five women, the median value
would be 130 kg, since 130 kg is the middle weight.
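The odd/even rule can be sketched in a few lines of Python. The function name `median_of` and the sixth weight of 160 kg are hypothetical, added only to illustrate the even case.

```python
def median_of(values):
    """Return the median using the odd/even rule described above."""
    ordered = sorted(values)          # arrange from smallest to largest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]           # odd count: the middle value
    # even count: average of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

weights_kg = [100, 100, 130, 140, 150]
odd_case = median_of(weights_kg)             # 130

# Hypothetical sixth observation, not from the original example.
even_case = median_of(weights_kg + [160])    # (130 + 140) / 2 = 135.0

print(odd_case, even_case)
```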
Mean
• The sample mean is perhaps the most important of the
three measures.
• It represents the balance point (or centre of gravity) of a
distribution.
• The mean of a sample or a population is computed by
adding all of the observations and dividing by the
number of observations.
• Returning to the example of the five women, the mean
weight would equal (100 + 100 + 130 + 140 + 150)/5 =
620/5 = 124 kg.
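A short Python sketch of the same calculation, which also checks the "balance point" property: the deviations from the mean sum to zero. Variable names are illustrative.

```python
weights_kg = [100, 100, 130, 140, 150]

# Add all observations, then divide by the number of observations.
total = sum(weights_kg)                  # 620
mean_weight = total / len(weights_kg)    # 124.0

# Balance point: deviations above and below the mean cancel out.
deviations = [w - mean_weight for w in weights_kg]
balance = sum(deviations)                # 0.0

print(total, mean_weight, balance)
```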
Measures of Variability
• Measures of dispersion measure how spread out a set
of data is.
• They are important for describing the spread of the data,
or its amount of variation around a central value.
• For example, consider a population of four values
{5, 5, 5, 5}. Here, all of the values are equal, so there is
no variation. The set {3, 5, 5, 7}, on the other hand, has
some variation since some values differ.
• The three measures that are used to quantify the
amount of variation in a set of values are the
range, the variance, and the standard deviation.
The range
• The range is the simplest measure of variation.
• It is defined as the difference between the largest and
smallest sample values.
• Range = Maximum value - Minimum value
• Therefore, the range of the four values {3, 5, 5, 7}
would be 7 - 3, or 4.
• As demonstrated, the range depends only on the extreme
values and provides no information about how the
remaining data are distributed.
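A minimal Python sketch of the range. The helper name `value_range` and the third data set are hypothetical; the third set is included only to show that different interior values can give the same range.

```python
def value_range(values):
    """Range = maximum value - minimum value."""
    return max(values) - min(values)

no_variation = value_range([5, 5, 5, 5])     # 0
some_variation = value_range([3, 5, 5, 7])   # 4

# Hypothetical set with different middle values but the same extremes:
# the range cannot tell it apart from {3, 5, 5, 7}.
same_extremes = value_range([3, 3, 7, 7])    # 4

print(no_variation, some_variation, same_extremes)
```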
Standard deviation and variance
Computing standard deviation and variance 1
• First find the mean and the deviations about the mean.
• Add up these deviations and find out how far on average
the scores deviate from the mean.
• Whenever the deviations are added (in order to find the
average of the deviations) they will always sum to zero
(refer to the table).
• To avoid this, the squared deviations are added to get a
measure of overall variability in the distribution.

X        X - μ           (X - μ)²
1        1 - 3 = -2      4
2        2 - 3 = -1      1
3        3 - 3 =  0      0
4        4 - 3 =  1      1
5        5 - 3 =  2      4
μ = 3    Σ(X - μ) = 0    Σ(X - μ)² = 10
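The columns of the deviations table can be reproduced with a short Python sketch. Variable names are illustrative.

```python
scores = [1, 2, 3, 4, 5]

mu = sum(scores) / len(scores)               # mean = 3.0

# Deviation of each score about the mean, and its square.
deviations = [x - mu for x in scores]        # [-2.0, -1.0, 0.0, 1.0, 2.0]
squared_deviations = [d ** 2 for d in deviations]

sum_of_deviations = sum(deviations)          # always 0 about the mean
sum_of_squares = sum(squared_deviations)     # 10.0

print(sum_of_deviations, sum_of_squares)
```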
Computing standard deviation and variance 2
• The sum of the squared deviations is called the
sum of squares.
• Divide the sum of squares by the number of observations
to get the average squared deviation, or variance. In this
example it is 10/5 = 2.
• Take the square root of the variance to get the standard
deviation.
• The standard deviation is a measure of the typical
deviation about the mean. For our example, we take the
square root of 2 and find that the standard deviation is
approximately 1.41.
Population variance and standard
deviation computational formulae
• Population variance is given by
$\sigma^2 = \sum_{i=1}^{N} (X_i - \mu)^2 / N$
• Population standard deviation is given by
$\sigma = \sqrt{\sigma^2}$
► where $\sigma^2$ is the population variance,
► $\mu$ is the population mean,
► $X_i$ is the ith element from the population,
► and N is the number of elements in the population.
• Thus by definition, the variance of a random variable is
the average squared deviation from the population mean.
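The population formulae translate directly into code. This is a sketch under the assumption that the full population is available as a list; the function names are hypothetical.

```python
import math

def population_variance(values):
    """Average squared deviation from the population mean (divide by N)."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n

def population_std(values):
    """Square root of the population variance."""
    return math.sqrt(population_variance(values))

population = [1, 2, 3, 4, 5]
var_pop = population_variance(population)    # 10 / 5 = 2.0
std_pop = population_std(population)         # about 1.41

print(var_pop, round(std_pop, 2))
```

Python's standard library provides `statistics.pvariance` and `statistics.pstdev` for the same population quantities.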
Sample variance and standard deviation
computational formulae
• The variance of a sample is defined by a slightly
different formula: the sum of squared deviations is
divided by n - 1 instead of N.
• $s^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2 / (n - 1)$
► where $s^2$ is the sample variance,
► $\bar{x}$ is the sample mean,
► $x_i$ is the ith element from the sample,
► and n is the number of elements in the sample.
• Sample standard deviation is given by $s = \sqrt{s^2}$
Measures of position 1
• Measures of position tell where a specific data value
falls within the data set or its relative position in
comparison with other data values.
• The most common measures of position are
percentiles, deciles, and quartiles.
• Quartiles, deciles and percentiles are just a
generalization of the median.
• Median is the middle value of the data set (assumed to
be arranged in ascending order).
• In a similar way we define the quartiles as the quarter
values of the data set, deciles as the one-tenth values of
the data set and so on.
• Quartiles, deciles and percentiles (unlike the median
which acts as a measure of central tendency) give us an
idea about the skewness of the data set.
Skewness
• Standard Scores
– A standard score or z score is used when direct
comparison of raw scores is impossible.
– A standard score or z score for a value is obtained by
subtracting the mean from the value and dividing the
result by the standard deviation.
• Percentiles
– Percentiles are position measures used in educational
and health-related fields to indicate the position of an
individual in a group.
– A percentile, P, is an integer between 1 and 99 such
that the Pth percentile is a value where P% of the
data values are less than or equal to the value and
(100 – P)% of the data values are greater than or
equal to the value.
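Both position measures can be sketched in Python. The function names are hypothetical, and the z score here uses the population standard deviation (`statistics.pstdev`), which is an assumption since the slides do not say which standard deviation to use.

```python
from statistics import mean, pstdev

def z_score(value, data):
    """Subtract the mean, then divide by the standard deviation."""
    return (value - mean(data)) / pstdev(data)

def percent_at_or_below(value, data):
    """Percentage of data values less than or equal to `value`."""
    return 100 * sum(1 for x in data if x <= value) / len(data)

scores = [1, 2, 3, 4, 5]

z_of_five = z_score(5, scores)                 # (5 - 3) / 1.41..., about 1.41
pct_for_four = percent_at_or_below(4, scores)  # 4 of 5 values, 80.0

print(round(z_of_five, 2), pct_for_four)
```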
Quartiles
• Data (or the distribution) can be divided into FOUR parts
and the cut points are called QUARTILES denoted by
Q1, Q2, Q3.
► Q1 is the same as the 25th percentile;
► Q2 is the same as the 50th percentile or the median;
► Q3 corresponds to the 75th percentile.
For example, consider the following 15 numbers:
3 6 7 11 13 22 30 40 44 50 52 61 68 80 94
Q1 = 11 (4th value), Q2 = 40 (8th value), Q3 = 61 (12th value)
Quartiles
• The first quartile is Q1 = 11. The second quartile is
Q2 = 40 (this is also the median). The third quartile is
Q3 = 61.
• To within one datum, one quarter of the data are below
Q1, two quarters are below Q2, and three quarters are
below Q3.
• About one quarter of the data is between Q1 and Q2,
and so on.
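The quartiles of the 15 numbers can be checked with `statistics.quantiles` (Python 3.8+). Several quartile conventions exist; the sketch assumes the function's default "exclusive" method, which places the cut points at positions k(n + 1)/4 and happens to agree with the values quoted here.

```python
from statistics import quantiles

numbers = [3, 6, 7, 11, 13, 22, 30, 40, 44, 50, 52, 61, 68, 80, 94]

# n=4 asks for the three cut points that split the data into four parts.
q1, q2, q3 = quantiles(numbers, n=4)

print(q1, q2, q3)  # 11.0 40.0 61.0

# Count how many values fall strictly below each quartile.
below_q1 = sum(1 for x in numbers if x < q1)   # 3 of 15
below_q2 = sum(1 for x in numbers if x < q2)   # 7 of 15
below_q3 = sum(1 for x in numbers if x < q3)   # 11 of 15
```

Other conventions (for example `method="inclusive"`) can give slightly different cut points on the same data.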
Deciles
• In a data set, deciles are the 9 values that divide the
sorted data into 10 equal parts/groups.
• Accordingly they are called the 1st, 2nd, ..., 9th deciles
(also denoted D1, D2, ..., D9). If X1, X2, X3, ..., Xn are
the observed values, assumed to be arranged in ascending
(or descending) order, then the corresponding definitions
are as follows:
D1 = 10th percentile = X(10)
D5 = 50th percentile = X(50)
Deciles
• Let us consider the following example.
• Example 1: Data Set: 1, 2, 4, 6, 7, 9, 10, 12, 14

• Deciles:
D1 = 1, D2 = 2, D3 = 4,
D4 = 6, D5 = 7, D6 = 9,
D7 = 10, D8 = 12, D9 = 14
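The deciles in Example 1 can be reproduced with `statistics.quantiles` (Python 3.8+). The sketch assumes the default "exclusive" method, which places the kth decile at position k(n + 1)/10; with n = 9 values that position is simply k, so each decile coincides with a data value. The slides do not state which convention they use, so this is one plausible reading.

```python
from statistics import quantiles

data_set = [1, 2, 4, 6, 7, 9, 10, 12, 14]

# n=10 asks for the nine cut points D1..D9.
deciles = quantiles(data_set, n=10)

for k, d in enumerate(deciles, start=1):
    print(f"D{k} = {d}")
```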
Outliers
– An outlier is an extremely high or an extremely
low data value when compared with the rest of
the data values.
– Outliers can be the result of measurement or
observational error.
– When a distribution is normal or bell-shaped,
data values that are beyond three standard
deviations of the mean can be considered
suspected outliers.
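The three-standard-deviation rule can be sketched as follows. The function name, the use of the population standard deviation, and the readings are all hypothetical; the rule is only a screening device and assumes roughly bell-shaped data.

```python
from statistics import mean, pstdev

def suspected_outliers(data, threshold=3.0):
    """Return values more than `threshold` standard deviations from the mean."""
    mu = mean(data)
    sigma = pstdev(data)
    return [x for x in data if abs(x - mu) > threshold * sigma]

# Hypothetical readings: twenty values near 50 and one extreme value.
readings = [48, 49, 50, 51, 52] * 4 + [120]

flagged = suspected_outliers(readings)
print(flagged)  # [120]
```

In very small samples no value can lie three standard deviations from the mean, so the rule is only informative for larger data sets.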
