
Working with only normal curves

To start, we are just going to plot some basic curves from math. We can do this using the curve
function and putting the expression for the curve we want inside, written in terms of x. For example, we could do:
curve(x^2) or curve(3*x+4) or curve(sqrt(x))
You'll notice that by default they only get plotted on the interval [0, 1]. We can change that by
adding an argument for xlim, just like we did when making histograms:
curve(x^2, xlim = c(0, 20)) makes the x-axis go from 0 to 20
We can also do things like change the color of the curve or the width of the line like so:
curve(x^2, col = "red", lwd = 3, xlim = c(0, 20))
In addition, you can change the title and axis labels in exactly the same way we did for a
histogram. To start plotting normal curves we can use the dnorm() function, which by default gives us the
density curve for a standard normal distribution:
curve(dnorm(x), col = "red", lwd = 3, xlim = c(-5,5))
To adjust the mean and standard deviation of the normal curve we can add additional arguments
in the dnorm function. For example
curve(dnorm(x, 5, 20), col = "red", lwd = 3, xlim = c(-100, 100))
will plot a normal distribution with μ = 5 and σ = 20 (notice how we also changed the
limits). Finally, we can plot several different normal curves on the same graph by telling R
to add each additional curve we draw to the existing plot, like so:
curve(dnorm(x, 5, 20), col = "red", lwd = 3, xlim = c(-100, 100)) draws the first
curve
curve(dnorm(x, 5, 30), col = "blue", lwd = 3, add = TRUE) adds a second blue curve
with a different standard deviation
curve(dnorm(x, 0, 30), col = "green", lwd = 3, add = TRUE) adds a third curve with a
different mean
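
The three calls above can be combined with a title and axis labels, as in the sketch below. The legend() call and the label text are additions for illustration and are not part of the original steps.

```r
# Draw the first curve; main, xlab and ylab set the title and axis labels.
curve(dnorm(x, 5, 20), col = "red", lwd = 3, xlim = c(-100, 100),
      main = "Three normal curves", xlab = "x", ylab = "Density")

# add = TRUE draws on the existing plot instead of starting a new one.
curve(dnorm(x, 5, 30), col = "blue", lwd = 3, add = TRUE)
curve(dnorm(x, 0, 30), col = "green", lwd = 3, add = TRUE)

# Illustrative addition: a legend so the three curves can be told apart.
curve_labels <- c("mean 5, sd 20", "mean 5, sd 30", "mean 0, sd 30")
legend("topright", legend = curve_labels,
       col = c("red", "blue", "green"), lwd = 3)
```

Only the first call sets the axis limits, so later curves are clipped to whatever range the first one used.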

Loading our data


Now we're going to revisit histograms and look at overlaying probability distributions
onto them. The data set we will be using is once again the Olympic athlete data, but you could
use another data set if you want to.
oly <- read.csv("FILE PATH") remember to change FILE PATH or check out the histogram.pdf
if you need a refresher on loading data

Creating a histogram from our data


Remember that to create a histogram we just need to do a simple command:
hist(oly$height, breaks = 30)
Remember you can adjust the number of bins, title, colors etc.
One final thing we are going to do is use a probability histogram. This changes each bar from
showing an absolute count to showing a density, scaled so that the areas of all the bars add up to 1,
which is the same scale a probability density function uses. All we have to do is tell R that we're
no longer making a frequency histogram:
hist(oly$height, breaks = 30, freq = FALSE)
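
To see what freq = FALSE does to the bar heights, here is a small sketch that uses simulated heights in place of the athlete data. The heights vector and its mean and standard deviation are made up for illustration. It checks that the bar areas (height times width) add up to 1.

```r
set.seed(1)                                  # make the simulated data reproducible
heights <- rnorm(500, mean = 175, sd = 10)   # hypothetical heights, not the real data

# hist() also returns the bin information it used for the plot.
h <- hist(heights, breaks = 30, freq = FALSE)

bin_widths <- diff(h$breaks)                 # width of each bin
total_area <- sum(h$density * bin_widths)    # combined area of all the bars
print(total_area)                            # should be 1, up to rounding
```

Because the bars have a total area of 1, a density curve such as dnorm() can be drawn on the same vertical scale.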

Overlaying a normal distribution
In order to put a normal curve on our graph, we first need to figure out the mean and standard
deviation of the curve. Since our histogram is of oly$height, we can use mean(oly$height) to
get the mean. However, you'll notice that the result is NA. This is because some of the athletes
don't have their height listed, so we need to remove the NA values before taking the mean. To do
that we can do:
mean(oly$height, na.rm = TRUE)
This will tell R to remove the NA values before taking the mean. We can also do the same thing
to find the standard deviation:
sd(oly$height, na.rm = TRUE)
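
The effect of na.rm is easy to see on a small made-up vector. The values below are hypothetical and stand in for a column with one missing height.

```r
heights <- c(170, NA, 180, 190)    # hypothetical values, one of them missing

mean(heights)                      # NA, because one value is missing
mean(heights, na.rm = TRUE)        # 180, the mean of the three known values
sd(heights, na.rm = TRUE)          # 10
```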
Then we just need to plot the curve, which we will do using the curve function like so:
curve(dnorm(x, mean(oly$height, na.rm = TRUE), sd(oly$height, na.rm = TRUE)), add =
TRUE)
You might notice that the curve gets cut off a bit at the top. We can fix that by increasing the
ylim when we plot the original histogram, and then putting the curve on top again.
hist(oly$height, breaks = 30, freq = FALSE, ylim = c(0, 4))
curve(dnorm(x, mean(oly$height, na.rm = TRUE), sd(oly$height, na.rm = TRUE)), add =
TRUE)
Now we can plot normal distributions on top of our histograms. Hopefully this will give us an idea
of how well our data matches up with a normal distribution.
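
As a self-contained sketch of the whole overlay, the code below uses simulated heights with a couple of missing values instead of the athlete file. The heights vector and the names m, s, h and top are made up for illustration. Instead of typing in an upper limit for ylim, it works one out from the tallest bar and the peak of the curve, which is one possible alternative to picking the number by eye.

```r
set.seed(42)
# Hypothetical data: 300 simulated heights plus two missing values.
heights <- c(rnorm(300, mean = 175, sd = 10), NA, NA)

m <- mean(heights, na.rm = TRUE)   # mean, ignoring the NA values
s <- sd(heights, na.rm = TRUE)     # standard deviation, ignoring the NA values

# Compute the bins without drawing, to find how tall the plot needs to be.
h <- hist(heights, breaks = 30, plot = FALSE)
top <- max(h$density, dnorm(m, m, s))   # tallest bar or the peak of the curve

# Draw the probability histogram with a little headroom, then add the curve.
hist(heights, breaks = 30, freq = FALSE, ylim = c(0, top * 1.05))
curve(dnorm(x, m, s), col = "red", lwd = 3, add = TRUE)
```

Storing the mean and standard deviation in variables first also keeps the curve() call short.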
