
Lecture 2

The Normal Distribution & Descriptive Statistics
BPK 304W
Fall 2022
Reminders and announcements
• Labs
– Data
– Instructions
– How to video
– Result sheet (optional; you will enter these answers in the Canvas assignment quiz)
– Canvas Assignment quiz
• Quiz (not for credit)
Outline
• Normal Distribution
• Testing Normality with Skewness & Kurtosis
• Measures of Central Tendency
• Measures of Variability
• Z-Scores
• Percentiles
Statistics
The science of collecting,
organizing, analyzing, and
interpreting data
Why have descriptive statistics?
• Need for population census for taxes, health,
birth, marriage, etc.
• Statistics comes from the Latin word
‘statisticus’ meaning state’s affairs
• Need to determine weather
• Predictions
• Makes science more objective
Definitions
• A statistic is any quantity that we can
calculate from data.
• Population parameters are quantities
that summarize population data.

Statistics are estimates of population parameters. The quality of estimation depends on how fairly the sample represents the population.
We need data and we need to summarize it
[Figure: Data → Binary and Categorical Data; Numerical Data]
We can summarize with graphs
Classification: Nominal, Ordinal, Interval, Ratio
[Figure: Data Element]
• Nominal: A scale of measurement where levels are distinct but do not vary in magnitude.
• Ordinal: A scale of measurement where levels vary in order of magnitude but equal intervals between levels cannot be assumed.
• Interval: The interval level of measurement has the characteristics of distinct levels, ordering in magnitude, and equal intervals. Equal intervals are obtained if equivalent differences between measurements represent the same amount of difference in the property being measured.
• Ratio: The ratio level of measurement has characteristics of distinct levels, ordering in magnitude, equal intervals, and an absolute zero. A measurement has an absolute zero when a measurement of zero represents the absence of the property being measured.
Examples
• Nominal: County where you live; race/ethnicity; favorite flavor of ice cream; Myers-Briggs personality type
• Ordinal: Favorite size of coffee you order; birth order
• Interval: IQ; score on the Beck Depression Inventory
• Ratio: Number of computers in a household; temperature in kelvin
Frequency Distribution
Frequency distribution (aka “histogram”) of hypothetical grades from a second-year chemistry class (n = 144)
[Figure: Normal frequency distribution; y-axis: Frequency, x-axis: Standard Deviations]
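A frequency distribution is simply a count of how many scores fall into each interval. A minimal Python sketch, using made-up grades (not the slide's n = 144 class) and an assumed bin width of 10 points:

```python
from collections import Counter

# Hypothetical exam grades (%); illustrative only, not the class data on the slide
grades = [52, 58, 61, 64, 66, 67, 70, 71, 72, 73,
          74, 75, 76, 78, 79, 81, 83, 85, 88, 93]
BIN_WIDTH = 10  # assumed bin width

def frequency_distribution(values, bin_width, start):
    """Count values in each bin [lower, lower + bin_width); empty bins are omitted."""
    counts = Counter((v - start) // bin_width for v in values)
    return {start + k * bin_width: counts[k] for k in sorted(counts)}

freq = frequency_distribution(grades, BIN_WIDTH, start=50)

# Text histogram: one '*' per student in each bin
for lower, count in freq.items():
    print(f"{lower}-{lower + BIN_WIDTH - 1}: {'*' * count} ({count})")
```

With these numbers the 70-79 bin holds the most scores (9), and the counts taper off on either side.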
Skewness & Kurtosis
• Deviations in shape from the Normal distribution.
• Skewness is a measure of symmetry, or more
accurately, lack of symmetry.
– A distribution, or data set, is symmetric if it looks the same to the left and right of the center point; it is skewed if it looks non-symmetric to the left and right of the center point.
• Kurtosis is a measure of peakedness.
– A distribution with high kurtosis has a distinct peak near the
mean, declines rather rapidly, and has heavy tails.
– A distribution with low kurtosis has a flat top near the mean
rather than a sharp peak. A uniform distribution would be the
extreme case.
Skewness - Measure of Symmetry
[Figure: Negatively skewed (left), Normal, Positively skewed (right)]
Many variables in BPK are positively skewed.
Can you think of examples?
Examples of skewness
• Basketball scores (-)
• Long jump record (-)
• Exam results (difficult, +; easy, -)
• Average income distribution (+)
• Human life cycle (-)
• Taxation (+ or -)
• Real Estate prices (+ or -)
• Retirement age (-)
Skewness
Kurtosis - Measure of Peakedness
[Figure: distributions of differing peakedness, with the Normal curve labelled]
Coefficient of Skewness

Where: X̄ = mean, Xi = X value from individual i, N = sample size, s = standard deviation

A perfectly Normal distribution has Skewness = 0

If -1 ≤ Skewness ≤ +1, then data are considered to be Normally distributed

For skewness, if the value is greater than + 1.0, the distribution is right
skewed. If the value is less than -1.0, the distribution is left skewed.

* The values for interpreting as normally distributed may vary. For the scope of this
course, please refer to these values.
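The coefficient's formula did not survive extraction. The sketch below assumes one common moment-based form that uses the symbols listed above, Skewness = Σ(Xi − X̄)³ / (N·s³), with s taken as the SD computed with divisor N (an assumption; SPSS applies a small-sample adjustment, so its values can differ slightly). It then applies the course's ±1.0 cut-offs. The income data are hypothetical.

```python
import statistics

def skewness(values):
    """Assumed moment form: sum((Xi - mean)**3) / (N * s**3), with s using divisor N."""
    n = len(values)
    mean = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum((x - mean) ** 3 for x in values) / (n * s ** 3)

def classify_skewness(g):
    """Course rule of thumb: -1.0 <= skewness <= +1.0 is treated as Normal."""
    if g > 1.0:
        return "right (positively) skewed"
    if g < -1.0:
        return "left (negatively) skewed"
    return "approximately Normal"

symmetric = [2, 4, 4, 5, 6, 6, 8]
# Hypothetical incomes (thousands): one very high earner stretches the right tail
incomes = [20, 22, 25, 25, 28, 30, 35, 40, 60, 120]

for name, data in (("symmetric", symmetric), ("incomes", incomes)):
    g = skewness(data)
    print(f"{name}: skewness = {g:.2f} -> {classify_skewness(g)}")
```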
Coefficient of Kurtosis

Where: X̄ = mean, Xi = X value from individual i, N = sample size, s = standard deviation

A perfectly Normal distribution has Kurtosis = 3 based on the above equation.

However, SPSS and other statistical software packages subtract 3 from raw kurtosis values.
Therefore, a kurtosis value of 0 from SPSS indicates a perfectly Normal distribution.

For kurtosis, if the value is greater than + 1.0, the distribution is leptokurtic. If the value is less
than -1.0, the distribution is platykurtic.
If -1 ≤ kurtosis ≤ +1, then data are considered to be normally distributed

* The values for interpreting as normally distributed may vary. For the scope of
this course, please refer to these values.
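As with skewness, the formula itself was lost in extraction. A sketch assuming the moment form Kurtosis = Σ(Xi − X̄)⁴ / (N·s⁴) (s with divisor N), which equals 3 for a Normal distribution, then subtracting 3 to match the SPSS-style scale before applying the ±1.0 rule. SPSS's small-sample adjustment is not reproduced, so values will not match SPSS output exactly; the two data sets are constructed for illustration.

```python
import statistics

def kurtosis(values):
    """Assumed moment form: sum((Xi - mean)**4) / (N * s**4), s with divisor N.
    On this scale a perfectly Normal distribution scores 3."""
    n = len(values)
    mean = statistics.fmean(values)
    var = statistics.pvariance(values)  # s**2 with divisor N, so s**4 == var**2
    return sum((x - mean) ** 4 for x in values) / (n * var ** 2)

def excess_kurtosis(values):
    """Kurtosis minus 3: the scale reported by SPSS-style output (0 = Normal)."""
    return kurtosis(values) - 3

def classify_kurtosis(k):
    """Course rule of thumb applied to excess kurtosis."""
    if k > 1.0:
        return "leptokurtic"
    if k < -1.0:
        return "platykurtic"
    return "approximately Normal"

flat = list(range(1, 11))     # evenly spread scores, like a uniform distribution
peaked = [0] * 8 + [-5, 5]    # tall centre with two far-out values (heavy tails)

for name, data in (("flat", flat), ("peaked", peaked)):
    k = excess_kurtosis(data)
    print(f"{name}: excess kurtosis = {k:.2f} -> {classify_kurtosis(k)}")
```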
Normal Frequency Distribution
• 68.26% of the data lie within ±1 standard deviation (SD) of the mean

Cumulative Frequency Distribution (CFD)
• The CFD shows the proportion (or percentage) of measurements falling at or below a certain point.
• It keeps a cumulative tally of the area under the curve.
• For the standard normal distribution, 50% of values/scores fall at or below the mean value of 0.
[Figure: CFD curve; y-axis: Frequency (%), from 0 to 100]
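The CFD is a running total of a frequency distribution expressed as a percentage; for the Normal curve the same quantity is the cumulative distribution function (CDF). A short sketch using the standard-library `statistics.NormalDist` and some hypothetical bin counts:

```python
from itertools import accumulate
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)

# Area at or below the mean of the standard normal (z = 0)
below_mean = std_normal.cdf(0.0)
# Area within +/- 1 SD of the mean
within_1sd = std_normal.cdf(1.0) - std_normal.cdf(-1.0)
print(f"{below_mean:.2%} at or below the mean; {within_1sd:.2%} within +/-1 SD")

# Empirical CFD from hypothetical counts in five ordered bins
counts = [2, 4, 9, 4, 1]
running = list(accumulate(counts))                   # running totals
cfd_pct = [100 * c / running[-1] for c in running]   # as % of all scores
print(cfd_pct)
```

The ±1 SD area prints as 68.27%; the 68.26% quoted earlier is the same quantity, most likely from rounded table values. The last CFD entry is always 100%.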
Normal Probability Plots
Correlation between observed and expected cumulative
probability is a measure of the deviation from normality.
[Figure: Normal P-P plots of height (left) and weight (right); x-axis: Observed cumulative prob, y-axis: Expected cumulative prob]


Descriptive Statistics
Measures of Central Tendency
– Mean, Median, Mode
Measures of Variability (Precision)
– Variance, Standard Deviation, Interquartile Range
Standardized scores
– comparisons to a reference distribution
Percentiles
Measures of Central Tendency
Mean: “centre of gravity” of a distribution; the
“weight” of the values above the mean exactly
balance the “weight” of the values below it.
Arithmetic average.
Median (50th %tile): the value that divides the
distribution into the lower and upper 50% of the
values
Mode: the value that occurs most frequently in
the distribution
These measures are useful because most values are near the middle of a normal distribution
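Python's standard `statistics` module computes all three measures. A sketch with hypothetical, positively skewed house prices, showing why the median is preferred for such data: one expensive sale drags the mean well above the typical value.

```python
import statistics

# Hypothetical sale prices in thousands of dollars; one luxury sale skews them right
prices = [650, 700, 720, 750, 750, 800, 850, 900, 2500]

mean_price = statistics.mean(prices)      # pulled up by the 2500 sale
median_price = statistics.median(prices)  # middle (5th of 9) value
mode_price = statistics.mode(prices)      # most frequent value

print(f"mean={mean_price:.1f} median={median_price} mode={mode_price}")
# prints: mean=957.8 median=750 mode=750
```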
Measures of Central Tendency
• When do you use mean, median, or mode?
– Height: mean, if normally distributed
– House prices in Vancouver: positively skewed, so use the median
• Study design: how many repeat measurements do you take on individuals to determine their true (criterion) score?
• Discipline specific
• Research design specific
• Objective vs. subjective tests
Measures of Variability

• Variance
• Standard Deviation (SD) = Variance^(1/2) = √Variance
• Range ≈ ±3 SD around the mean (for Normal data)
For Normal distributions, report the mean and SD
For non-Normal distributions, report the median (50th %tile) and interquartile range (IQR, 25th and 75th %tiles)
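The reporting rule can be sketched as a small helper. The `normal` flag is an assumption standing in for whatever normality check you apply (for example, the ±1 skewness and kurtosis rule); quartiles use `statistics.quantiles` with its default "exclusive" method, and other software may interpolate slightly differently.

```python
import math
import statistics

def describe(values, normal):
    """Mean and SD for Normal data; otherwise median and IQR (25th-75th percentile)."""
    if normal:
        return {"mean": statistics.mean(values), "sd": statistics.stdev(values)}
    p25, p50, p75 = statistics.quantiles(values, n=4)  # default method="exclusive"
    return {"median": p50, "p25": p25, "p75": p75, "iqr": p75 - p25}

data = [1, 2, 3, 4, 5, 6, 7]
variance = statistics.variance(data)  # sample variance (divisor N - 1)
sd = statistics.stdev(data)           # the SD is the square root of the variance
assert math.isclose(sd, math.sqrt(variance))

print(describe(data, normal=True))
print(describe(data, normal=False))
```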
Standard Deviation
- a quantity calculated to indicate the extent of deviation for a group as a whole
- quantifies the amount of variation or dispersion of a set of data values
- a measure of how spread out numbers are
- deviation just means how far a value is from the mean
- tells us how far apart a set of numbers lie

Normal Frequency Distribution
Central Limit Theorem
• If a sufficiently large number of random samples of the same size were drawn from an infinitely large population, and the mean was computed for each sample, the distribution formed by these means would be approximately normal (provided each sample is large enough).

[Figure: distribution of a single sample vs. distribution of multiple sample means]
Standard deviation of sample means is called the standard error of the mean (SEM).
Central Limit Theorem

The means of all the samples will approximate a normal distribution
Standard Error of the Mean (SEM)
• The SEM describes how precisely the sample mean estimates the population mean
• We can calculate (estimate) the SEM for a given set of data from the SD and sample size (SEM = SD/√N). How does the SEM change as the size of your sample increases?
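Since the SEM is estimated as SD/√N, its behaviour as N grows is easy to tabulate. A quick sketch, using an SD of 6.2 cm (the height example in this lecture) purely for illustration:

```python
import math

def sem(sd, n):
    """Standard error of the mean, estimated from the sample SD and sample size."""
    return sd / math.sqrt(n)

SD_CM = 6.2  # illustrative SD in cm
for n in (10, 40, 160, 640):
    print(f"N = {n:>3}: SEM = {sem(SD_CM, n):.3f} cm")
```

Each fourfold increase in N halves the SEM, because N enters under a square root.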
Sample distribution: skewed
Distribution of means: note the x-axis change
Increasing N and plotting the means of a larger N results in a normal distribution
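The Central Limit Theorem can be checked by simulation. The sketch below assumes an exponential population (mean 1, SD 1, strongly right-skewed) as a stand-in for a skewed BPK variable, draws 2000 samples of size 30, and compares the skewness of the raw scores with the skewness of the sample means; the SD of the means should sit near the SEM, 1/√30 ≈ 0.18.

```python
import random
import statistics

random.seed(304)  # fixed seed so the run is reproducible

def draw_sample(n):
    """n draws from an exponential population with mean 1 and SD 1 (right-skewed)."""
    return [random.expovariate(1.0) for _ in range(n)]

def skew(values):
    """Moment-based skewness with divisor N."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return sum((v - m) ** 3 for v in values) / (len(values) * s ** 3)

N, REPS = 30, 2000
one_sample = draw_sample(REPS)  # many raw scores: shows the population's skew
means = [statistics.fmean(draw_sample(N)) for _ in range(REPS)]

print(f"skewness of raw scores:   {skew(one_sample):.2f}")
print(f"skewness of sample means: {skew(means):.2f}")
print(f"SD of means: {statistics.pstdev(means):.3f} (SEM = {1 / N ** 0.5:.3f})")
```

Raising N further pushes the skewness of the means closer to 0, which is the pattern in the figure panels.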
Standardizing Scores
Transform data into standard scores (e.g., Z-scores)
Eliminates units of measurement
[Figure: Height (cm), Mean = 161.0, SD = 6.2, N = 5782; Z-Score of Height, Mean = 0.0, SD = 1.0, N = 5782]
Standardizing Data
Standardizing does not change the shape of the distribution of the data
[Figure: Weight (kg) and Z-Score of Weight]
Z-Scores
Z = (x − µ) / s
Score = 24
Mean of Norm = 30
SD of Norm = 4
Z-score = ??
Units = ??
Z-scores
Another useful transformation in statistics is standardisation. Sometimes called "converting to Z-scores" or "taking Z-scores", it transforms the original distribution into one in which the mean becomes zero and the standard deviation becomes 1. A Z-score expresses the original score as the number of standard deviations that score lies from the mean of the distribution. The formula for converting from an original or "raw" score to a Z-score is:
Z = (x − µ) / s
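Applied to the exercise with a score of 24, a norm mean of 30 and a norm SD of 4, the formula gives a Z-score of −1.5: the score lies 1.5 SD below the norm, and because the units cancel, a Z-score has no units. A minimal sketch:

```python
def z_score(x, mean, sd):
    """Number of SDs that x lies above (+) or below (-) the mean; has no units."""
    return (x - mean) / sd

# Values from the Z-score exercise: score 24, norm mean 30, norm SD 4
z = z_score(24, 30, 4)
print(z)  # -1.5
```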
Internal or External Norm for Calculating Z-Scores
Internal Norm
A sample of subjects is measured. Z-scores are calculated based
upon the mean and SD of the sample. Thus, Z-scores using an
internal norm tell you how good each individual is compared to the
group they come from. Mean = 0, SD = 1

External Norm
A sample of subjects is measured. Z-scores are calculated based
upon the mean and SD of an external normative sample (national, sport-specific,
etc.). Thus, Z-scores using an external norm tell you how
good each individual is compared to the external group. Mean = ?,
SD = ? (depends upon the external norm)
E.g., you compare the aerobic capacity of middle-aged men to an external
norm and get many negative Z-scores. What does that mean?
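A sketch contrasting the two norms. The VO2max values and the external mean and SD below are hypothetical, chosen only to illustrate the pattern in the question: a reference group with a higher mean yields mostly negative Z-scores, whereas an internal norm always centres the group at 0 with SD 1.

```python
import statistics

def z_scores(values, mean, sd):
    return [(v - mean) / sd for v in values]

# Hypothetical VO2max scores (mL/kg/min) for five middle-aged men
vo2max = [32.0, 35.0, 38.0, 41.0, 44.0]

# Internal norm: the group's own mean and SD, so the Z-scores have mean 0 and SD 1
internal = z_scores(vo2max, statistics.mean(vo2max), statistics.stdev(vo2max))

# External norm: hypothetical reference values, NOT published normative data
EXTERNAL_MEAN, EXTERNAL_SD = 45.0, 6.0
external = z_scores(vo2max, EXTERNAL_MEAN, EXTERNAL_SD)

print("internal:", [round(z, 2) for z in internal])
print("external:", [round(z, 2) for z in external])
```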
Z-scores allow measurements from tests
with different units to be combined. But
beware: higher Z-scores are not
necessarily better performances.
Variable                    z-score, profile A    z-score, profile B
Sum of 5 Skinfolds (mm)      1.5                  -1.5*
Grip Strength (kg)           0.9                   0.9
Vertical Jump (cm)          -0.8                  -0.8
Shuttle Run (sec)            1.2                  -1.2*
Overall Rating               0.7                  -0.65
*Z-scores are reversed because lower skinfold and shuttle run scores are regarded as better performances
[Figure: Test Profile A; Test Profile B]
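The Overall Rating row is consistent with a simple average of the four Z-scores ((1.5 + 0.9 − 0.8 + 1.2)/4 = 0.7 and (−1.5 + 0.9 − 0.8 − 1.2)/4 = −0.65). A sketch under that assumption, where profile B is profile A with the sign flipped for tests in which a lower raw score is better:

```python
import statistics

# Profile A z-scores, before any reversal
raw_z = {
    "Sum of 5 Skinfolds (mm)": 1.5,
    "Grip Strength (kg)": 0.9,
    "Vertical Jump (cm)": -0.8,
    "Shuttle Run (sec)": 1.2,
}
# Tests where a lower raw score is the better performance
LOWER_IS_BETTER = {"Sum of 5 Skinfolds (mm)", "Shuttle Run (sec)"}

def overall_rating(z, reverse=frozenset()):
    """Average z-score (assumed), flipping the sign for tests listed in `reverse`."""
    adjusted = [-v if name in reverse else v for name, v in z.items()]
    return statistics.fmean(adjusted)

profile_a = overall_rating(raw_z)                   # no reversal
profile_b = overall_rating(raw_z, LOWER_IS_BETTER)  # skinfolds and shuttle run reversed
print(f"Profile A: {profile_a:.2f}, Profile B: {profile_b:.2f}")
```

The same athlete moves from above average to below average once the direction of "better" is handled correctly.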
Clinical Example: Osteoporosis BMD T-scores
To diagnose osteoporosis, clinicians measure a patient’s bone mineral density (BMD).
They express the patient’s BMD in terms of standard deviations above or below the mean BMD for a “young normal” person of the same sex and ethnicity.
BMD T-scores and Osteoporosis
Although physicians call this standardized score a T-score, it is really just a Z-score where the reference mean and standard deviation come from an external population (i.e., young normal adults of a given sex and ethnicity).
Classification using BMD T-scores
• Osteoporosis T-scores are used to classify a
patient’s BMD into one of three categories:
– T-scores of ≥ -1.0 indicate normal bone density
– T-scores between -1.0 and -2.5 indicate low bone
mass (“osteopenia”)
– T-scores ≤ -2.5 indicate osteoporosis
• Decisions to treat patients with osteoporosis
medication are based, in part, on T-scores.
• https://www.nof.org/patients/diagnosis-information/bone-density-examtesting/
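The classification is a pair of thresholds applied to a Z-score computed against an external norm. A sketch; the young-normal reference mean and SD are hypothetical placeholders, not clinical reference values.

```python
def t_score(bmd, ref_mean, ref_sd):
    """BMD in SDs from a young-normal reference group (a Z-score vs. an external norm)."""
    return (bmd - ref_mean) / ref_sd

def classify_bmd(t):
    """Thresholds from the classification slide."""
    if t >= -1.0:
        return "normal bone density"
    if t > -2.5:
        return "low bone mass (osteopenia)"
    return "osteoporosis"

# Hypothetical young-normal reference values (g/cm^2), for illustration only
REF_MEAN, REF_SD = 1.00, 0.12

for bmd in (0.95, 0.85, 0.60):
    t = t_score(bmd, REF_MEAN, REF_SD)
    print(f"BMD {bmd:.2f} -> T = {t:.2f}: {classify_bmd(t)}")
```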
Percentiles
Percentile: The percentage of the
population that lies at or below that score
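Following this definition (percentage at or below a score), a percentile rank is a simple count. For Normal data, a Z-score converts directly to a percentile through the Normal CDF. A sketch with hypothetical grades; interpolated percentile definitions used by statistical packages can give slightly different numbers.

```python
from statistics import NormalDist

def percentile_rank(score, values):
    """Percentage of values at or below `score`."""
    return 100 * sum(v <= score for v in values) / len(values)

grades = [55, 62, 68, 70, 71, 74, 78, 81, 85, 92]  # hypothetical
print(percentile_rank(74, grades))  # 60.0

# A Z-score of -1.5 on Normal data sits near the 7th percentile
print(round(100 * NormalDist().cdf(-1.5), 1))  # 6.7
```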
How to Read a Paper (Keshav)
• Three-pass approach
– 1st pass: gives you a general idea about the paper
– 2nd pass: lets you grasp the paper’s content, but not its
details
– 3rd pass: helps you understand the paper in depth
Peer-Reviewed Journal Article
Abstract
– A brief, structured summary of the study
Introduction
– Relevant background and study objectives/hypotheses
Methods
– Study population, experimental protocol, statistical analysis. Should
be detailed enough for another scientist to repeat the study.
Results (includes tables and figures)
– Clear description of results without any interpretation
Discussion
– Interpret results in the context of the hypothesis and existing
literature
– Comment on future research directions & limitations
Submission Format
– “Information for Authors” provided by individual journals
– Attention to detail is mandatory
http://abacus.bates.edu/~ganderso/biology/resources/writing/HTWtoc.html
1st pass (5-10 min)
• Quick scan, bird’s eye view of paper
• Purpose: should I read it again/further?
• 1. Read title, abstract, introduction
• 2. Read section and sub-section headings, 1st
sentences of paragraphs, ignore everything else
• 3. Look at figures, captions, and tables for main
findings
• 4. Read the conclusion
• 5. Glance over references (recognize any?)
1st pass cont’d
• 5 C’s
– Category: what type of paper is this? (new methods, new
analysis, prototype?)
– Context: what have previous papers found? What is the
motivation?
– Correctness: does the study seem correct?
– Contributions: What is the main contribution of the work?
– Clarity: Is the paper well-written? Hard to read?
2nd pass (an hour or so)
• Purpose:
– Be able to grasp the content of the paper
– Be able to summarize main thrust of the paper to
someone else
– This level of detail is appropriate for a paper you are interested in that is not in your research specialty
• You may not fully understand the paper after the 2nd pass
• Can choose to:
– Set aside (may not read again)
– Set aside (reread later)
– Read for a third pass
How to Read a Scientific Paper (Raff, 2013)
1. Read introduction, “skip abstract” (depends)
2. Identify the big picture question
3. Summarize background (5 statements)
4. Identify the specific question(s)
5. Identify approach
6. Methods – draw diagram
7. Read results section
8. Read conclusion/discussion
9. Read the abstract
