You are on page 1of 3

R - Programming

Assignment

Instructions:

• You should submit two files:


(i) a single .Rmd script file containing all of the commented code you used to obtain
your answers
(ii) the HTML (or pdf) file which you produced from the .Rmd script

• The marks available for each question are shown in brackets

• You may need to find some new functions in order to do some of these tasks.
Remember to use R’s search engine, as well as checking online.

• Make sure that your file is readable and has a neat presentation and clear flow. The
HTML (or pdf) output file should be a stand-alone document containing the answers
to all questions and showing all necessary code.

• I advise you to first create an R script with all of your answers. When you are happy
with this, convert it piece-by-piece into an .Rmd file.

1
Question 1: [25 marks]
The dataset quine is available in the R package called MASS.

(i) Load the package MASS, and load the dataset quine. Type the command needed to
see the top 6 rows of the dataset. [5 marks]

(ii) What commands are used to access the description and the structure of the dataset?
Briefly describe the data contained in quine. [5 marks]

(iii) Print the 7th row of quine. What is the value of the Days variable in this row? [5
marks]

(iv) Two of the factors should be ordered factors. Use the help file to help you decide which
two. Convert them to ordered factors, and check the structure again to confirm that
this has worked. [5 marks]

(v) Create a table of Sex against Age. What is the combination of Sex/Age with the
largest number of students? [5 marks]

Question 2: [50 marks]


The dataset diamonds is available in the R package called ggplot2.

(i) Install and load the package ggplot2, and load the dataset diamonds. Look at the
structure and the top of the dataset. Briefly describe the contents of the diamonds
dataset. [3 marks]

(ii) How many observations are there in diamonds? [2 marks]

(iii) The variables x, y, and z measure the dimensions of the diamonds in different directions.
Make a new variable size which is the product of these three variables, and attach it
to the dataframe. [3 marks]

(iv) Which row contains the diamond with the largest size? What colour is this diamond?
[3 marks]

(v) Use the cut() function on the size variable to create a new variable size.cat, by dividing
it up into 3 categories: small (0 - 200), average (200 - 1000) and large (1000 - 4000).
Should it be an ordered factor? Attach this new variable to the dataframe. [5 marks]

(vi) Are there any missing values in size.cat? Run the following line to see:
sum(is.na(diamonds$size.cat))
Why is this? Can you suggest a sensible fix? If so, fix this variable by amending the
column diamonds$size.cat appropriately. [6 marks]

(vii) Create a table of color against clarity. Write some code to find the colour/clarity
combination with the smallest number of diamonds. (I.e., you must find this using
code, and not just by looking at the table.) [6 marks]

2
(viii) Create a subset of diamonds called sub.diamonds which contains only the carat, depth
and price columns. [3 marks]

(ix) Use the aggregate() function to aggregate sub.diamonds to find the mean of each
column for every cut and color combination. (Tip: the result should be a dataframe
of 35 rows.) Save this output as a dataframe called df1. [5 marks]

(x) Order df1 by descending price. Comment on the results. [4 marks]

(xi) Be creative - produce an interesting table or create a new variable which helps to tell
us something new about the diamonds dataset. Describe your findings. [10 marks]

Question 3: [25 marks]


Create a function called fun1 which takes a vector as input and returns a list. The list
returned should have four components:

(i) the first must contain the original input vector, sorted in ascending order

(ii) the second must contain the minimum and maximum values in the vector

(iii) the third must contain the standard deviation and the variance of the vector

(iv) the fourth must contain the lower quartile, the median value and the upper quartile of
the vector

Your function should be clearly commented. Include a warning at the beginning of fun1
that stops the function if the input vector is not numeric or has a length of less than 1. You
may find the stop() or break() functions useful here.
Test your function using the code below:

You might also like