
BIOSTAT WEEK 2

MEASURES OF CENTRAL TENDENCY

• A single value that attempts to describe a set of data by identifying the central position within that set of data.
• Measures of central tendency are sometimes called measures of central location.
• Ex. Mean, Median and Mode – valid measures of MCT

• Mean (Average) - the most popular and well-known measure of central tendency.
- Represents the center of a continuous variable by summing all the values and dividing by the number of subjects that contributed the values.
- Can be used w/ both discrete & continuous data.
- Important property – it includes every value in your data set as part of the calculation.
Formula:
Population mean – represented by μ (mu)
- Using a sample, the population mean μ is estimated by the sample mean.
$\mu = \frac{x_1 + x_2 + \cdots + x_N}{N} = \frac{\sum x}{N}$
Sample mean – represented by x̄ (x-bar)
- It is a statistic: the sum of the values observed in a sample divided by the number of subjects.
$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{\sum x}{n}$
∑ (sigma) = sum of; N = number of values in the population; n = number of values in the sample

Example: Find the sample mean.
Scores in BIOE211 Quizzes from Quiz 1 – 5:
QUIZ #   1    2    3    4    5
SCORE    18   12   15   17   16
x̄ = (18 + 12 + 15 + 17 + 16) / 5 = 78 / 5 = 15.6

• Median - the middle score for a set of data that has been arranged in order of magnitude.
- Less affected by outliers and skewed data.
- Represents the center of the continuous variable by sorting all the possible values and dividing those values in half.
- The value of the continuous variable at the halfway point: 50% of the values are smaller than the median, and 50% are larger.
Population median – represented by M
- Estimated by the sample median
Sample median – represented by m
- The middle value of the sorted observed values
Example:
Odd number of values (n = 11):
65 55 89 56 35 14 56 55 87 45 92
Sorted: 14 35 45 55 55 56 56 65 87 89 92
Median = 56 (the 6th value)
Even number of values (n = 12):
65 55 89 56 35 14 56 55 87 45 92 60
Sorted: 14 35 45 55 55 56 56 60 65 87 89 92
Median = (56 + 56) / 2 = 56 (the average of the 6th and 7th values)

Note: A numerical summary is considered robust if it is not easily pulled toward one tail. Extreme values are less likely to affect it.
Note: The median provides a robust numerical summary of the center. The median is a more robust measure of the center because it is at the middle of the values. Even if the tail contains extreme values, the middle value is less likely to be pulled towards the extreme values.
Note: When the data are not heavily skewed, the mean and median will be fairly close to each other.
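The mean and median rules above can be checked with a short Python sketch. The helper names `sample_mean` and `sample_median` are illustrative, not from the notes. The data are the BIOE211 quiz scores and the two median examples above, plus one hypothetical outlier (92 replaced by 920) that is added only to show the robustness point.

```python
# Minimal sketch: sample mean and sample median, hand-written to mirror the formulas.

def sample_mean(values):
    """x-bar = (sum of x) / n"""
    return sum(values) / len(values)

def sample_median(values):
    """Middle value of the sorted data; average of the two middle values when n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

quiz_scores = [18, 12, 15, 17, 16]                         # BIOE211 quizzes 1-5
odd_data = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45, 92]    # n = 11
even_data = odd_data + [60]                                # n = 12

print(sample_mean(quiz_scores))   # 15.6
print(sample_median(odd_data))    # 56
print(sample_median(even_data))   # 56.0

# Hypothetical outlier (not from the notes): replace the last value, 92, with 920.
with_outlier = odd_data[:-1] + [920]
print(round(sample_mean(odd_data), 2), round(sample_mean(with_outlier), 2))  # 59.0 134.27
print(sample_median(with_outlier))  # 56 (the median does not move)
```

Python's standard `statistics.mean` and `statistics.median` should give the same results; the hand-written versions only make the formulas visible.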

• When not to use the mean?
One main disadvantage: the mean is particularly susceptible to the influence of outliers, values that are unusual compared to the rest of the data by being especially small or large in number.
For example, consider the wages of staff at a factory below:

• Mode - the most frequent score in our data set.
- Used for categorical data where we wish to know which is the most common category
- Represents the most common response
• Unimodal Distribution – it has one mode, or one most common value
• Bimodal Distribution – it has two peaks
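A minimal sketch for finding the mode, assuming Python. `modes` is a hypothetical helper that returns every value tied for the highest frequency, so a result with more than one value flags a bimodal (or multimodal) data set. The blood-type list is made-up categorical data for illustration only.

```python
# Minimal sketch: return all values that share the highest frequency.
from collections import Counter

def modes(values):
    """Every value tied for the highest count, sorted."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

print(modes([65, 55, 89, 56, 35, 14, 56, 55, 87, 45, 92]))  # [55, 56] -> two modes (bimodal)
print(modes([4, 8, 1, 6, 6, 2, 9, 3, 6, 9]))                # [6] -> one mode (unimodal)
print(modes(["A", "B", "A", "O", "A"]))                      # ['A'] (hypothetical blood types)
```

Python 3.8+ also provides `statistics.multimode`, which returns the tied values in order of first appearance rather than sorted. If every value occurs exactly once, every value is returned, which is one reason the mode can be uninformative.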
Note: The mode, and whether a distribution is unimodal or bimodal, can be seen on a histogram (as its highest peak or peaks).
• Problems when using the mode
1. When two or more values share the highest frequency
2. When the most common mark is far away from the rest of the data in the data set

WHEN TO USE THE MEAN, MEDIAN AND MODE
Type of Variable                Measure of central tendency
Nominal                         Mode
Ordinal                         Median
Interval/Ratio (not skewed)     Mean
Interval/Ratio (skewed)         Median

MEASURES OF DISPERSION
- Also called measures of spread
- Used to describe the variability in a sample or population.
- Used in conjunction with a measure of central tendency.
- For continuous data, simply describing the center of the distribution is not sufficient; a measure of spread is also needed.
Why is it important to measure the spread of data?
1. Because of its relationship with measures of central tendency
2. It gives us an idea of how well the mean represents the data

• Standard Deviation - a measure of the spread of scores within a set of data.
- The standard deviation measures how concentrated the data are around the mean.
- The square root of the variance
- It is never negative (it equals 0 only when all values are identical)
- Sample standard deviation: a statistic represented by s, used to estimate the population SD
- Often reported in lieu of the variance because the units of the SD are the same as the units of the mean
Note: If the mean is chosen to describe the center, the SD or variance is the appropriate measure to describe the spread.
• What type of data should you use when you calculate a standard deviation?
- Used in conjunction with the mean to summarize continuous data.
- Appropriate only when the continuous data are not significantly skewed and have no outliers.
FORMULA
Sample standard deviation (square root of the sample variance):
$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
Population standard deviation:
$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$
The numerator, $\sum (x - \bar{x})^2$ or $\sum (x - \mu)^2$, is the total sum of squares.

• Variance - another method for calculating the deviation of a group of scores from the mean
- Average squared distance of each observation from the mean.
- Achieves non-negative values by squaring each of the deviations; adding up these squared deviations gives us the sum of squares, which we can then divide by the total number of scores in our group of data (N for a population; n − 1 is used for the sample variance).
Sample Variance – a statistic represented by s², used to estimate the population variance
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
Population Variance – represented by σ²
$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$

• Range - the difference between the largest score in the set of data and the smallest score in the set of data.
- It represents the full spread of values in the distribution, from smallest to largest.
Formula: Range = $X_L - X_S$
Example: What is the range of the following data: 4 8 1 6 6 2 9 3 6 9
The largest score ($X_L$) is 9; the smallest score ($X_S$) is 1; then the range is $X_L - X_S = 9 - 1 = 8$.
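The dispersion formulas above (range, sample and population variance, standard deviation) can be sketched in Python as follows. The function names are illustrative, not from the notes. The sample versions divide the sum of squares by n − 1 and the population versions by N, exactly as in the formulas. The quiz scores are reused from the mean example.

```python
# Minimal sketch of the dispersion formulas, hand-written to mirror them.
import math

def value_range(values):
    """Range = X_L - X_S (largest minus smallest)."""
    return max(values) - min(values)

def sum_of_squares(values):
    """Total sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values)

def sample_variance(values):
    """s^2: sum of squares divided by n - 1."""
    return sum_of_squares(values) / (len(values) - 1)

def population_variance(values):
    """sigma^2: sum of squares divided by N."""
    return sum_of_squares(values) / len(values)

def sample_sd(values):
    """s: square root of the sample variance."""
    return math.sqrt(sample_variance(values))

def population_sd(values):
    """sigma: square root of the population variance."""
    return math.sqrt(population_variance(values))

print(value_range([4, 8, 1, 6, 6, 2, 9, 3, 6, 9]))  # 8

quiz_scores = [18, 12, 15, 17, 16]
print(round(sum_of_squares(quiz_scores), 2))       # 21.2
print(round(sample_variance(quiz_scores), 2))      # 5.3
print(round(population_variance(quiz_scores), 2))  # 4.24
print(round(sample_sd(quiz_scores), 3))            # 2.302
```

For checking, the standard library's `statistics.variance` and `statistics.stdev` use n − 1, while `statistics.pvariance` and `statistics.pstdev` use N, i.e. the same two conventions.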
• Interquartile Range – a condensed version of the range; it is the difference between the first and third quartiles, i.e., the difference between the values of the 25th and 75th percentiles
- Represents the range of values that belong to the middle 50% of the subjects
• When to use the range?
The range is used when:
- you have ordinal data
- you are presenting your results to people with little or no knowledge of statistics
The range is rarely used in scientific work as it is fairly insensitive:
- It depends on only two scores in the set of data, $X_L$ and $X_S$
- Two very different sets of data can have the same range: 1 1 1 1 9 vs 1 3 5 7 9

COEFFICIENT OF VARIATION
- Measure of relative variation
- The end result should always be a %
- Shows variation relative to the mean
- Used to compare 2 or more groups
Formula (for sample):
$CV = \frac{s}{\bar{x}} \times 100\%$

NORMALITY TESTING
SYMMETRY – A distribution is said to be symmetric about the mean if the distribution to the left of the mean is the "mirror image" of the distribution to the right of the mean. Likewise, a symmetric distribution has SK = 0, since its mean is equal to its median and its mode.

MEASURES OF SHAPE
• Skewness
- Absence of symmetry
- Extreme values on one side of a distribution
• Highly skewed – skewness is < −1 or > 1
• Moderately skewed – skewness is between −1 and −0.5, or between 0.5 and 1
• Approximately symmetric – skewness is between −0.5 and 0.5
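A sketch of the coefficient of variation and a skewness check, assuming Python and reusing the quiz scores. The notes do not say which skewness formula they use; this assumes the moment coefficient g1 = m3 / m2^1.5, which is one common definition (statistical packages often report a small-sample adjusted version, so their values may differ slightly). Values exactly at ±0.5 or ±1 are assigned to "moderately skewed" here, a boundary choice the notes leave open.

```python
# Minimal sketch: sample CV (%) and moment-based skewness with the cutoffs from the notes.
import math

def coefficient_of_variation(values):
    """Sample CV as a percentage: (s / x-bar) * 100, with s using n - 1."""
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    return s / mean * 100

def moment_skewness(values):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (one of several definitions)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5

def classify_skewness(sk):
    """Cutoffs from the notes; exactly +/-0.5 and +/-1 are treated as 'moderately skewed'."""
    if sk < -1 or sk > 1:
        return "highly skewed"
    if abs(sk) >= 0.5:
        return "moderately skewed"
    return "approximately symmetric"

quiz_scores = [18, 12, 15, 17, 16]
print(round(coefficient_of_variation(quiz_scores), 2))      # 14.76
sk = moment_skewness(quiz_scores)
print(round(sk, 2), classify_skewness(sk))                  # -0.69 moderately skewed
print(classify_skewness(moment_skewness([1, 2, 3, 4, 5])))  # approximately symmetric
```

The CV only makes sense for data measured on a ratio scale with a positive mean. With just five quiz scores, any skewness estimate is very unstable. If SciPy is available, `scipy.stats.skew` with its default `bias=True` computes the same g1.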
• Kurtosis
- Peakedness or flatness of a distribution (K below denotes excess kurtosis, i.e., kurtosis − 3)
• Leptokurtic: high and thin, K > 0
- Positive kurtosis implies a distribution with more extreme possible data values, or outliers, than a normal distribution, resulting in fatter tails.
- A distribution with kurtosis > 3 (excess kurtosis > 0, positive)
- Compared to a normal distribution, its tails are longer and fatter, and often its central peak is higher and sharper.
• Mesokurtic: normal in shape, K = 0
- A distribution with zero excess kurtosis, i.e., roughly the same outlier character as a normal distribution
- A normal distribution has kurtosis exactly 3 (excess kurtosis exactly 0). Any distribution with kurtosis ≈ 3 (excess ≈ 0) is called mesokurtic.
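Kurtosis can be sketched the same way in Python. This assumes the moment-based definition m4 / m2² and subtracts 3 to get excess kurtosis (the K used above), so a normal distribution sits at 0 and negative values correspond to the platykurtic category in these notes. The `tol` band for calling a value "about 0" is a hypothetical, arbitrary choice, and both data lists are made-up illustrations.

```python
# Minimal sketch: moment-based excess kurtosis and a sign-based label.

def excess_kurtosis(values):
    """Excess kurtosis = m4 / m2**2 - 3 (one of several definitions)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return m4 / m2 ** 2 - 3

def classify_kurtosis(k, tol=0.1):
    """K > 0: leptokurtic, K < 0: platykurtic; tol is a hypothetical band for 'about 0'."""
    if k > tol:
        return "leptokurtic"
    if k < -tol:
        return "platykurtic"
    return "mesokurtic"

flat = [1, 2, 3, 4, 5]                       # evenly spread, no tails
heavy = [-10, 0, 0, 0, 0, 0, 0, 0, 0, 10]    # mostly central values plus two extremes

print(round(excess_kurtosis(flat), 2), classify_kurtosis(excess_kurtosis(flat)))  # -1.3 platykurtic
print(excess_kurtosis(heavy), classify_kurtosis(excess_kurtosis(heavy)))          # 2.0 leptokurtic
```

If SciPy is available, `scipy.stats.kurtosis` with its defaults (`fisher=True`, `bias=True`) reports this same excess value.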
• Platykurtic: flat and spread out, K < 0
- Negative kurtosis implies a distribution with fewer extreme possible data values, or outliers, than a normal distribution, resulting in thinner tails.
- A distribution with kurtosis < 3 (excess kurtosis < 0, negative).
- Compared to a normal distribution, its tails are shorter and thinner, and often its central peak is lower and broader.

FINDING QUARTILES, OR DECILES, OR PERCENTILES
• To find the quartiles, deciles, or percentiles, we follow the same procedure used to find the median.
• Arrange the data in ascending order.

MEASURES OF LOCATION

A. Percentiles - numerical measures that give the position of a data value relative to the entire data set.
- Divide an array (raw data arranged in increasing or decreasing order of magnitude) into 100 equal parts.
- The kth percentile, denoted $P_k$, is the data value in the data set that separates the bottom k% of the data from the top (100 − k)%.
- Always based on 100 equal parts
Example:
Suppose LJ was told that relative to the other scores on
a certain test, his score was the 95th percentile.
This means that 95% of those who took the test had
scores less than or equal to LJ’s score, while 5% had
scores higher than LJ’s.
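A sketch of percentiles in Python, reusing the median example data. There are several conventions for computing $P_k$. This one assumes the nearest-rank method, which takes the ⌈k·n/100⌉-th smallest value. That may not be the method used in class, and software such as spreadsheets often interpolates instead. `percentile_rank` is a hypothetical helper that computes the percentage of scores less than or equal to a given score, the sense used in the LJ example.

```python
# Minimal sketch: nearest-rank percentile and percentile rank.
import math

def percentile_nearest_rank(values, k):
    """P_k by the nearest-rank convention: the ceil(k * n / 100)-th smallest value (1-based)."""
    ordered = sorted(values)
    n = len(ordered)
    rank = max(1, math.ceil(k * n / 100))
    return ordered[rank - 1]

def percentile_rank(values, score):
    """Percent of values less than or equal to score (as in the LJ example)."""
    at_or_below = sum(1 for v in values if v <= score)
    return 100 * at_or_below / len(values)

data = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45, 92]
print(percentile_nearest_rank(data, 25))    # 45
print(percentile_nearest_rank(data, 50))    # 56
print(percentile_nearest_rank(data, 90))    # 89
print(round(percentile_rank(data, 87), 1))  # 81.8
```

Computing `k * n / 100` before rounding up avoids the float error that `k / 100 * n` can introduce when the exact product is a whole number.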
B. Deciles - divide an array into ten equal parts, each part having ten percent of the distribution of the data values, denoted by $D_k$
- The 1st decile is the 10th percentile, the 2nd decile is the 20th percentile, and so on.
C. Quartiles - divide an array into four equal parts, each part having 25% of the distribution of the data values, denoted by $Q_k$
- The 1st quartile is the 25th percentile, the 2nd quartile is the 50th percentile (also the median), and the 3rd quartile is the 75th percentile.
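Following the idea in these notes that quartiles are found with the same procedure as the median, here is a Python sketch. It sorts the data, takes the overall median as Q2, and then takes the medians of the lower and upper halves as Q1 and Q3. This "median of halves" rule is an assumed convention, one of several, and here the middle value is excluded from both halves when n is odd. The interquartile range is then Q3 − Q1. The data are the range example from the dispersion section.

```python
# Minimal sketch: Q1, Q2, Q3 by the "median of halves" convention, plus the IQR.

def median(sorted_values):
    """Median of an already-sorted list."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2

def quartiles(values):
    """Return (Q1, Q2, Q3); the middle value is left out of both halves when n is odd."""
    ordered = sorted(values)          # step 1: arrange the data in ascending order
    n = len(ordered)
    lower = ordered[: n // 2]         # values below the median position
    upper = ordered[(n + 1) // 2 :]   # values above the median position
    return median(lower), median(ordered), median(upper)

q1, q2, q3 = quartiles([4, 8, 1, 6, 6, 2, 9, 3, 6, 9])
print(q1, q2, q3, q3 - q1)  # 3 6.0 8 5  (IQR = 5)
```

Python's `statistics.quantiles(data, n=4)` interpolates between neighbouring values, so for the same data its cut points can differ from this rule.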
