
Quantitative Data Analysis

Lecture 3: Graphs & Variance


Benjamin R Cowan
Histograms

How many times certain values occur (frequency)

Looking for a normal distribution
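
As a rough illustration of what a histogram counts, the sketch below tallies how often each value occurs and prints a sideways text histogram. The scores are made up for illustration and are not from the lecture.

```python
from collections import Counter

# Hypothetical scores, made up for illustration
scores = [3, 4, 4, 5, 5, 5, 5, 6, 6, 7]

# Frequency = how many times each value occurs
frequencies = Counter(scores)

# Print a sideways text histogram, one row per value
for value in sorted(frequencies):
    print(f"{value}: {'#' * frequencies[value]}")
```

With these scores the bars rise to a single peak at 5 and fall away on either side, which is the rough shape we look for when checking for a normal distribution.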
Dispersion of distribution
Range (max and min score)
Affected by extreme scores

Interquartile range
Median = second quartile, the 50th percentile (divides the data in half)
Lower quartile (25th percentile) = median of 1st half
Upper quartile (75th percentile) = median of 2nd half
Removes influence of extreme scores
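
A minimal sketch of the range and interquartile range, using the median-of-halves approach described above. The scores are hypothetical, and leaving the middle score out of both halves when the number of scores is odd is an assumption (other conventions exist).

```python
from statistics import median

def quartiles(scores):
    """Return (lower quartile, median, upper quartile) by the median-of-halves method."""
    ordered = sorted(scores)
    n = len(ordered)
    half = n // 2
    lower_half = ordered[:half]
    # Assumption: with an odd number of scores, the middle score is left out of both halves
    upper_half = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower_half), median(ordered), median(upper_half)

# Hypothetical scores; 40 is an extreme score
scores = [2, 3, 4, 5, 6, 7, 8, 40]

score_range = max(scores) - min(scores)  # affected by the extreme score
q1, q2, q3 = quartiles(scores)
iqr = q3 - q1                            # not affected by the extreme score

print(f"range={score_range}, Q1={q1}, median={q2}, Q3={q3}, IQR={iqr}")
```

Here the single extreme score pushes the range up to 38, while the interquartile range stays at 4.0.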
Dispersion & Boxplots

Tells us about the distribution

Symmetrical or skewed

Lets us know of any outliers


Boxplots
Interquartile range (25th to 75th percentile)

Median
Scatterplot
Graphs the relationship between two continuous variables
Variance (s²)
Definition: The average squared deviation of our data from the mean (average)

How to get it:

1) We sum the squared differences between the mean and each data point. The differences are squared because some will be negative and would otherwise cancel out the positive ones. This sum is also called the sum of squared errors.

2) Then we divide by the number of participants – 1 (to estimate this for the population)
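
The calculation can be sketched step by step as follows. The scores are hypothetical, and the final line simply cross-checks the hand calculation against Python's built-in `statistics.variance`, which also divides by N – 1.

```python
import math
import statistics

# Hypothetical scores, made up for illustration
scores = [2, 4, 4, 6, 9]

mean = sum(scores) / len(scores)

# Step 1: square each deviation from the mean (so negatives do not cancel) and sum them
squared_deviations = [(x - mean) ** 2 for x in scores]
sum_of_squares = sum(squared_deviations)

# Step 2: divide by the number of participants minus 1
variance = sum_of_squares / (len(scores) - 1)

print(f"mean={mean}, sum of squares={sum_of_squares}, variance={variance}")

# Cross-check against the standard library's sample variance
assert math.isclose(variance, statistics.variance(scores))
```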


Standard Deviation (SD or s)
Problem: Variance is in squared units

It makes little sense to talk about things like Reaction Time squared

Solution: square root the variance to get standard deviation (SD or s)

Small standard deviations suggest data points are close to the mean
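
A short sketch of why the square root helps, assuming some made-up reaction times in milliseconds: the variance comes out in milliseconds squared, while the standard deviation is back in milliseconds.

```python
import math

# Hypothetical reaction times in milliseconds
reaction_times = [512, 498, 530, 475, 560]

n = len(reaction_times)
mean = sum(reaction_times) / n

# Variance is in squared units (milliseconds squared)
variance = sum((x - mean) ** 2 for x in reaction_times) / (n - 1)

# Square-rooting returns us to the original units (milliseconds)
sd = math.sqrt(variance)

print(f"mean={mean} ms, variance={variance} ms^2, SD={sd:.1f} ms")
```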
Population and Sample
Stats are run to identify effects in a population
• Can be general (all people) or specific (all drivers)

But we rarely have access to a whole population

Collect data from a subset of population (sample)

We use this data to infer effects in population


Sampling from a population
Sampling Distribution
• If we take many samples from a population and calculate the mean of each, we can plot these sample means as a histogram telling us the frequency of certain sample means

• This is called the sampling distribution.
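
The idea can be simulated roughly as below. Everything here is an assumption made for illustration: the population is artificial (10,000 normally distributed scores with mean about 100 and SD about 15), and the choices of 1,000 samples and 30 scores per sample are arbitrary.

```python
import random
import statistics
from collections import Counter

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical population of 10,000 scores
population = [random.gauss(100, 15) for _ in range(10_000)]

# Draw 1,000 samples of 30 scores each and keep each sample's mean
sample_means = [
    statistics.mean(random.sample(population, 30))
    for _ in range(1_000)
]

# Frequency of sample means, rounded to the nearest whole number
frequencies = Counter(round(m) for m in sample_means)
for value in sorted(frequencies):
    print(f"{value:>4}: {'#' * (frequencies[value] // 5)}")

population_mean = statistics.mean(population)
mean_of_sample_means = statistics.mean(sample_means)
print(f"population mean={population_mean:.2f}, "
      f"mean of sample means={mean_of_sample_means:.2f}")
```

The sample means cluster around the population mean and spread out far less than the individual scores do.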
Standard Error
Think back to Standard Deviation (s)
We use it to tell us the average deviation from the mean

The Standard Error (or SE) is effectively the standard deviation of sample means around the population mean

Clever statisticians have identified ways of estimating this using the sample s
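
The slide does not give the estimate itself; the usual one is the sample standard deviation divided by the square root of the sample size, SE = s / √N. A minimal sketch with hypothetical scores:

```python
import math
import statistics

# Hypothetical scores, made up for illustration
scores = [2, 4, 4, 6, 9]

s = statistics.stdev(scores)      # sample standard deviation (divides by N - 1)
se = s / math.sqrt(len(scores))   # standard error of the mean

print(f"s={s:.3f}, SE={se:.3f}")
```

Because N is in the denominator, larger samples give smaller standard errors: their means sit closer to the population mean.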
Confidence Intervals
Standard error: a sense of how much sample means differ

Different approach: estimate the bounds within which the true population mean falls

Usually 99% and 95% intervals, meaning: for 99% or 95% of samples, the true mean (or value of any stat) will fall within the interval calculated from that sample
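
One common way to compute a 95% interval is the sample mean plus or minus about 1.96 standard errors. The sketch below assumes that normal approximation and uses hypothetical reaction times; with a sample this small a t value would normally replace 1.96, so treat the numbers as illustrative only.

```python
import math
import statistics

# Hypothetical reaction times in milliseconds
reaction_times = [512, 498, 530, 475, 560, 505, 490, 520, 515, 495]

n = len(reaction_times)
mean = statistics.mean(reaction_times)
se = statistics.stdev(reaction_times) / math.sqrt(n)

# z value leaving 2.5% in each tail of a standard normal distribution (about 1.96)
z = statistics.NormalDist().inv_cdf(0.975)

lower = mean - z * se
upper = mean + z * se
print(f"mean={mean} ms, 95% CI: {lower:.1f} to {upper:.1f} ms")
```

For a 99% interval under the same assumption, `inv_cdf(0.995)` would be used in place of `inv_cdf(0.975)`, giving a wider interval.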
What we are trying to do with stats

We aim to explore/answer a hypothesis

We collect data (or use existing datasets) that represent what we want to observe

We 1) build a statistical model and 2) see how it fits with the actual data we collected
Is our statistical model any good?

We then assess model fit by comparing the values the model estimates and the actual data

The best model is that which minimises deviance
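
As a small illustration, the sketch below treats a single predicted value as the "model" and measures deviance as the sum of squared errors. Both choices are assumptions made for this example, and the scores are hypothetical. It compares the mean against one other candidate value.

```python
def sum_of_squared_errors(scores, predicted):
    """Total squared difference between each score and the model's predicted value."""
    return sum((x - predicted) ** 2 for x in scores)

# Hypothetical scores, made up for illustration
scores = [2, 4, 4, 6, 9]

mean_model = sum(scores) / len(scores)  # model 1: predict the mean for everyone
other_model = 4                         # model 2: an arbitrary alternative guess

sse_mean = sum_of_squared_errors(scores, mean_model)
sse_other = sum_of_squared_errors(scores, other_model)

print(f"SSE using mean ({mean_model}): {sse_mean}")
print(f"SSE using {other_model}: {sse_other}")
```

The mean gives the smaller sum of squared errors here, so by this measure it fits these data better than the alternative guess.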


Variance- Equation (for reference only)

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{N - 1}$$
Standard Deviation (s)- for reference only

$$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{N - 1}}$$
