
Introduction to Mathematics and Statistics B

Notes A3
Greek letters → population
Latin letters → sample

Σ x_i (summed from i = 1 to n) is a way of expressing the sum of x_1, x_2, …, x_n

Arithmetic mean – the average; primary measure of central location.
- Sensitive to outliers
- Sample mean: X̄; population mean: μ

Median – measure of central location.
- Not sensitive to outliers
- The middle value when there is an odd number of values
- The average of the two middle values a and b when there is an even
number: median = (a + b) / 2
Mode Most frequently occurring value
- Minimally sensitive to outliers
- Used to summarize qualitative data
- No mode, one mode (unimodal) or many
modes (multimodal)
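A quick sketch of the three measures of central location on a made-up sample, using Python's standard-library `statistics` module:

```python
# Hypothetical sample: even number of values, one repeated value.
import statistics

data = [2, 3, 3, 5, 7, 10]

print(statistics.mean(data))    # (2+3+3+5+7+10) / 6 = 5
print(statistics.median(data))  # even n: average of 3 and 5 = 4.0
print(statistics.mode(data))    # 3 appears twice -> unimodal
```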
Weighted mean – a mean in which each term is weighted (e.g. by its
probability).
- Sensitive to outliers
- X̄ = Σ(w_i x_i) / Σ w_i, the sum of the weighted terms divided by
the sum of the weights
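A minimal sketch of the weighted mean, assuming the formula X̄ = Σ(w_i x_i) / Σ w_i; the values and weights are made up:

```python
# Weighted mean: sum of (weight * value) divided by the sum of the weights.
def weighted_mean(values, weights):
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Hypothetical marks 80 and 90, weighted 2 : 3.
print(weighted_mean([80, 90], [2, 3]))  # (160 + 270) / 5 = 86.0
```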
Percentile – divides the data into two parts.
- 25th percentile – first quartile (Q1)
- 50th percentile – second quartile (the median)
- 75th percentile – third quartile (Q3)
To find the pth percentile:
1. Arrange the data in ascending order
2. Compute the location L_p = (p / 100)(n + 1)
3. The value at that location is the percentile value
4. If the location is a decimal, take the average of the values at
the two neighbouring locations
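The steps above can be sketched as follows; `percentile` is a hypothetical helper built on the location formula L_p = (p/100)(n + 1). It interpolates when L_p is a decimal, which for a .5 fraction is exactly the average of the two neighbouring values:

```python
def percentile(data, p):
    xs = sorted(data)                    # step 1: ascending order
    loc = (p / 100) * (len(xs) + 1)      # step 2: location L_p
    lower = int(loc)                     # whole part of the location
    frac = loc - lower                   # fractional part
    if lower < 1:                        # clamp at the list boundaries
        return xs[0]
    if lower >= len(xs):
        return xs[-1]
    # steps 3-4: exact location, or interpolate between neighbours
    return xs[lower - 1] + frac * (xs[lower] - xs[lower - 1])

data = [3, 5, 7, 8, 12, 13, 14, 18, 21]   # n = 9
print(percentile(data, 25))  # L_25 = 2.5 -> (5 + 7) / 2 = 6.0
print(percentile(data, 50))  # L_50 = 5.0 -> 12.0 (the median)
```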

Outliers in percentiles:
- IQR = Q3 − Q1
- Compute the fences Q1 − 1.5 × IQR and Q3 + 1.5 × IQR
- A value below the lower fence or above the upper fence is an
outlier
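A sketch of the fence rule, assuming the conventional cut-offs Q1 − 1.5×IQR and Q3 + 1.5×IQR and using `statistics.quantiles` (Python ≥ 3.8) for the quartiles:

```python
import statistics

def iqr_outliers(data):
    q1, _, q3 = statistics.quantiles(data, n=4)  # quartiles (exclusive method)
    iqr = q3 - q1
    lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr      # the 1.5 x IQR fences
    return [x for x in data if x < lo or x > hi]

data = [1, 2, 2, 3, 3, 4, 4, 5, 90]
print(iqr_outliers(data))  # [90] -- only the extreme value is flagged
```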
Box and whisker plot: [figure]

Measures of Dispersion
Range - Difference between maximum and
minimum values
- Simplest measure of spread
- Relies only on the two most extreme values, so it is distorted by
outliers
Mean absolute deviation (MAD) – measures the average absolute
difference from the mean:
MAD = (|x_1 − X̄| + |x_2 − X̄| + … + |x_n − X̄|) / n
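A sketch of the MAD formula on a made-up sample:

```python
def mad(data):
    m = sum(data) / len(data)                         # sample mean
    return sum(abs(x - m) for x in data) / len(data)  # average |x_i - mean|

print(mad([2, 4, 6, 8]))  # mean 5; |devs| = 3, 1, 1, 3 -> 8 / 4 = 2.0
```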
Variance – measures the average squared difference from the mean.
- Sample variance: s² = [(x_1 − X̄)² + (x_2 − X̄)² + … + (x_n − X̄)²] / (n − 1)
- Population variance: σ² = Σ(x_i − μ)² / N
Standard deviation – the square root of the variance:
s = √s² (sample), σ = √σ² (population)
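A sketch of the sample variance (n − 1 divisor) and standard deviation, cross-checked against the standard-library `statistics` functions:

```python
import statistics

data = [4, 8, 6, 2]
m = sum(data) / len(data)                                # mean = 5.0
var = sum((x - m) ** 2 for x in data) / (len(data) - 1)  # 20 / 3
sd = var ** 0.5                                          # root of the variance

assert abs(var - statistics.variance(data)) < 1e-12
assert abs(sd - statistics.stdev(data)) < 1e-12
print(var, sd)
```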
Coefficient of variation (CV)
- Adjusts for differences between the magnitudes of the means
- A unit-less measure of mean-adjusted dispersion
- Allows easy comparison across datasets
- CV = s / X̄ (sample) or σ / μ (population)
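A sketch showing why the CV is useful: two hypothetical samples with different magnitudes but the same relative spread give the same CV:

```python
import statistics

def cv(data):
    return statistics.stdev(data) / statistics.mean(data)  # unit-less ratio

small = [9, 10, 11]     # SD 1,  mean 10
large = [90, 100, 110]  # SD 10, mean 100
print(cv(small), cv(large))  # both 0.1 -- same relative dispersion
```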
The z-score:
- The number of standard deviations a value lies from the mean
1. Equal to the mean → z-score of 0
2. Less than the mean → negative z-score
3. More than the mean → positive z-score
Converting values into z-scores is called standardization of the data.
Standardizing a value:
z_i = (x_i − X̄) / s

i.e. z-value = (value − mean) / standard deviation
Converting a z-score back to the original value:
x_i = X̄ + z_i s

i.e. original value = mean + [(z-score) × (standard deviation)]
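A sketch of the two conversions, with hypothetical distribution parameters (mean 70, SD 8):

```python
MEAN, SD = 70.0, 8.0  # hypothetical mean and standard deviation

def to_z(x):
    return (x - MEAN) / SD      # standardize: z = (value - mean) / SD

def from_z(z):
    return MEAN + z * SD        # invert: value = mean + z * SD

z = to_z(86.0)
print(z)           # (86 - 70) / 8 = 2.0 -> two SDs above the mean
print(from_z(z))   # round trip back to 86.0
```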

Chebyshev's theorem:
1. If we know the standard deviation and the mean, we know that at
least 75% of the data lies within 2 SDs of the mean (k = 2).
2. Empirical rule – for many datasets we can say what fraction of
observations falls within 1, 2 and 3 SDs on either side of the mean.
For any data, the proportion of observations that lie within k
standard deviations of the mean is at least 1 − 1/k², where k > 1.
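The bound can be computed directly; k = 2 gives the 75% figure mentioned above:

```python
def chebyshev_bound(k):
    # Proportion within k SDs of the mean is at least 1 - 1/k^2 (k > 1).
    return 1 - 1 / k ** 2

print(chebyshev_bound(2))  # 0.75   -> at least 75% within 2 SDs
print(chebyshev_bound(3))  # ~0.889 -> at least ~89% within 3 SDs
```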
Analysis of relative location:
Chebyshev's theorem:
1. Applies to all datasets, regardless of their distributions
2. Defines a lower bound on the percentages of observations lying in a given interval
3. Actual percentages can be much greater

Empirical rule:
1. More precise
2. Only applies for symmetrical, bell-shaped distributions

Covariance and correlation


- Covariance (s_xy or σ_xy) – describes the direction of the linear relationship between two
variables, x and y.
  Sample covariance: s_xy = Σ[(x_i − X̄)(y_i − Ȳ)] / (n − 1)
- Correlation (r_xy or ρ_xy) – describes both the strength and direction of the relationship
between two variables, x and y.
  r_xy = s_xy / (s_x s_y), and likewise ρ_xy = σ_xy / (σ_x σ_y) for a
population

Note that −1 ≤ ρ ≤ 1
