
Manipulating data in R

We will be continuing with our data set on Olympic athletes. If you remember, this data set
contains lots of information about athletes from the 2016 Olympics. (If you don't remember
how to load the data, you can take a look at the histogram.pdf file on Canvas.)
To start with, we will do some simple manipulations. For example, we can change the units of
height from meters to inches like so:
inch <- oly$height * 100 / 2.54
(Multiplying by 100 converts meters to cm, and 2.54 is the conversion factor between cm and in.)
And then make a histogram of that data using hist(inch). You can of course add all the other
modifiers from last class as well to modify the histogram (e.g. xlab = "Height in inches").
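If you want to try the conversion without loading the class data, here is a small sketch. The tiny oly table below is made up for illustration (it is not the real data set), and it assumes, as above, that height is stored in meters.

```r
# Toy stand-in for the class data set (values are made up for illustration)
oly <- data.frame(
  height = c(1.60, 1.75, 1.82, 1.90)  # heights in meters
)

# meters -> centimeters (times 100) -> inches (divide by 2.54)
inch <- oly$height * 100 / 2.54

# Histogram of the converted heights, with a labeled x-axis and a title
hist(inch, xlab = "Height in inches", main = "Athlete heights")
```

With the real data you would skip the data.frame(...) step and use the oly table you already loaded.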
We can also create a new column of data and add it to our table. For example we could count
total medals won like so:
oly$total_medals = oly$gold + oly$silver + oly$bronze
Then we could get a five-number summary of that (plus the mean) with summary(oly$total_medals),
or do whatever else we want to.
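Here is a small sketch of adding a column, again using a made-up toy table rather than the real data. It assumes the medal columns are named gold, silver and bronze, as in the line above.

```r
# Toy stand-in for the class data set (medal counts are made up)
oly <- data.frame(
  gold   = c(1, 0, 2, 0),
  silver = c(0, 1, 0, 0),
  bronze = c(1, 0, 0, 0)
)

# Add a new column: the row-by-row sum of the three medal columns
oly$total_medals <- oly$gold + oly$silver + oly$bronze

# Min, quartiles, median, mean and max of the new column
summary(oly$total_medals)
```

The addition is done one row at a time, so each athlete gets their own total.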
Getting subsets of data
Another really powerful thing we can do is to look at only a part of the data (e.g. maybe we
only want to look at heights of judo athletes). To do that, we will be learning about the subset
command. To start with, let's get the data on only the judo athletes in our data set:
judo <- subset(oly, sport == "judo")
In the subset function, the first argument is the table we want to take data from and the second
is the condition we want the data to meet. In this example we are requiring that the sport
column is equal to "judo" (in computer science we usually use == to mean equals since = is used to
assign a value). Now that we've subsetted our data, we can make a histogram by doing
hist(judo$height), once again adding whatever modifiers you want.
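To see what subset actually returns, here is a small sketch with a made-up toy table (the sports and heights below are invented for illustration). Note that the text inside the quotes has to match the spelling and capitalization used in the data exactly.

```r
# Toy stand-in for the class data set (values are made up)
oly <- data.frame(
  sport  = c("judo", "boxing", "judo", "basketball"),
  height = c(1.70, 1.80, 1.65, 2.01)  # heights in meters
)

# Keep only the rows where the sport column equals "judo"
judo <- subset(oly, sport == "judo")

# How many rows are left, and what do they look like?
nrow(judo)
judo

# Histogram of heights for the judo athletes only
hist(judo$height, xlab = "Height in meters", main = "Judo athletes")
```

The result is a smaller table with the same columns as oly, so everything we did before (histograms, summaries, new columns) works on it too.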
There are many other ways we can use the subset function. A few examples are given below:
newdata <- subset(oly, weight > 76) gives all athletes whose weight is more than 76kg
newdata <- subset(oly, height <= 1.70) gives people whose height is less than or equal to
1.70m
newdata <- subset(oly, height > 1.60 & sex == "female") gives females with height more
than 1.60m (height is stored in meters). Note that & is the symbol used for 'and' (meaning both
conditions must be true).
newdata <- subset(oly, gold >= 1) gives people who won at least one gold medal
newdata <- subset(oly, height > 1.70 | weight > 70) gives people whose height is more than
1.70m or whose weight is more than 70kg. Note that | is the symbol used for 'or' (at least one of
the conditions must be true).
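A quick way to check that a condition does what you expect is to try it on a table small enough to count by hand. The sketch below uses a made-up toy table (not the real data) with heights in meters and weights in kg.

```r
# Toy stand-in for the class data set (values are made up)
oly <- data.frame(
  sex    = c("female", "male", "female", "male"),
  height = c(1.65, 1.80, 1.55, 1.68),  # meters
  weight = c(60, 82, 50, 72),          # kg
  gold   = c(1, 0, 0, 2)
)

# 'and': both conditions must be true (only the first row qualifies)
tall_women <- subset(oly, height > 1.60 & sex == "female")

# 'or': at least one condition must be true (the second and fourth rows)
tall_or_heavy <- subset(oly, height > 1.70 | weight > 70)

# A single condition on a count column (the first and fourth rows)
gold_winners <- subset(oly, gold >= 1)

# Count the rows in each result
nrow(tall_women)
nrow(tall_or_heavy)
nrow(gold_winners)
```

Notice that the fourth athlete is in tall_or_heavy even though he is not taller than 1.70m, because one true condition is enough for 'or'.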
Try to write conditions for the following:
Gold medal winners who weigh more than 70kg
Athletes who won any medal
American women
Women who won an Olympic medal in basketball
Boxers under 1.5m who did not win a medal
