
CS 5301 – Programming Foundations for Data Analytics

Assignment 2
Due: 10/03/2021, 23:59
Submission: Write all your solutions in a single .r file (R script) and submit it through
Blackboard. Precede each solution with a comment giving its question number. Your .r file
should be in the following format:
#Q1.a
...

#Q1.b
...

#Q1.c
...

#Q2.a
...

Q1 (50 pts).
a. (5 pts) Read the monthly temperature data from the file “dallasTemp.txt” and store it in a
data frame named d. Also, set the column names to Year, Month, and Temperature.
b. (5 pts) Print the years in which the January temperature is greater than 50. An example
output is
    Year
25  2012
85  2017
121 2020

c. (5 pts) Add a fourth column named “TempCelsius” to the data frame, containing the Celsius
equivalent of each temperature. Print the data frame.
d. (5 pts) Find the indices of the months with the maximum and minimum temperatures using the
which() function, and print those months using the index values returned by which().
The outputs should be
    Month Year TempCelsius
20 August 2011    34.11111

     Month Year TempCelsius
2 February 2010    5.388889
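
As an illustration of the which() pattern that part d asks for, here is a minimal sketch on a small made-up data frame. The name toy and all of its values are hypothetical and are not taken from dallasTemp.txt; the point is only that which() returns row indices that can then be used to subset the data frame.

```r
# Hypothetical toy data; the values are invented for illustration only.
toy <- data.frame(
  Month = c("January", "February", "March", "April"),
  Temperature = c(40, 35, 55, 70)
)

# which() turns a logical vector into the positions of its TRUE elements.
hottest <- which(toy$Temperature == max(toy$Temperature))
coldest <- which(toy$Temperature == min(toy$Temperature))

# Use the returned indices to pull out the matching rows.
print(toy[hottest, ])
print(toy[coldest, ])
```

If several rows tie for the maximum or minimum, which() returns all of their indices, so the subset may contain more than one row.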

e. (5 pts) Print the average temperature (Fahrenheit) in 2016. The output should be 69.16667.
f. (5 pts) Save the data frame to an Excel file named “dallas.xlsx”. Do not include row names.
(Do not submit the xlsx file. The xlsx file should be created when I run your R script.)
g. (5 pts) Update the data frame by setting the temperature in March 2010 to 57 and the
temperature in June 2010 to 83. You should also update TempCelsius for March 2010 and June
2010. Print the head of the data frame after your modifications. The output should be
  Year    Month Temperature TempCelsius
1 2010  January        44.3    6.833333
2 2010 February        41.7    5.388889
3 2010    March        57.0   13.888889
4 2010    April        66.7   19.277778
5 2010      May        76.9   24.944444
6 2010     June        83.0   28.333333
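
The update in part g can be approached with a logical index that picks out one row, followed by recomputing the derived column for that row. The sketch below shows the idea on a hypothetical data frame named toy with invented values; the column names mirror the assignment, but nothing here comes from the real data file. The conversion used is the standard Celsius = (Fahrenheit - 32) * 5 / 9.

```r
# Hypothetical toy data; the values are invented for illustration only.
toy <- data.frame(
  Year = c(2000, 2000, 2001),
  Month = c("January", "February", "January"),
  Temperature = c(40, 45, 42)  # degrees Fahrenheit
)

# Derived column: Celsius = (Fahrenheit - 32) * 5 / 9, computed column-wise.
toy$TempCelsius <- (toy$Temperature - 32) * 5 / 9

# Logical index that is TRUE only for the row to be changed.
target <- toy$Year == 2000 & toy$Month == "February"

# Update the Fahrenheit value, then recompute Celsius for the same row
# so the two columns stay consistent.
toy$Temperature[target] <- 50
toy$TempCelsius[target] <- (toy$Temperature[target] - 32) * 5 / 9

print(head(toy))
```

Recomputing TempCelsius from the updated Temperature, rather than typing the Celsius value by hand, avoids the two columns drifting apart.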

h. (5 pts) Extract the 2013 data and set the row names to the numbers from 1 to 12. Add a new
row with the month name “Average” and set its temperature values to the mean of the
temperatures. Print the data frame. The output should be
   Year     Month Temperature TempCelsius
1  2013   January    49.10000    9.500000
2  2013  February    52.00000   11.111111
3  2013     March    56.40000   13.555556
4  2013     April    63.00000   17.222222
5  2013       May    72.30000   22.388889
6  2013      June    82.60000   28.111111
7  2013      July    84.50000   29.166667
8  2013    August    87.10000   30.611111
9  2013 September    82.40000   28.000000
10 2013   October    68.20000   20.111111
11 2013  November    53.50000   11.944444
12 2013  December    43.10000    6.166667
13 2013   Average    67.61288   19.784933
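
One possible way to build a subset, renumber its rows, and append a summary row is sketched below with base R only. The data frame toy, the subset name sub, and every value are hypothetical; rbind() is just one of several ways to add a row, and it requires the new row to have the same column names as the existing data frame.

```r
# Hypothetical toy data; the values are invented for illustration only.
toy <- data.frame(
  Year = c(2000, 2001, 2001, 2001),
  Month = c("December", "January", "February", "March"),
  Temperature = c(45, 40, 50, 60)
)

# Keep only one year, then renumber the rows starting from 1.
sub <- toy[toy$Year == 2001, ]
rownames(sub) <- seq_len(nrow(sub))

# Build a one-row data frame with matching column names and append it.
avg_row <- data.frame(
  Year = 2001,
  Month = "Average",
  Temperature = mean(sub$Temperature)
)
sub <- rbind(sub, avg_row)

print(sub)
```

The mean is computed before the new row is appended, so the summary row does not influence its own value.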

i. (5 pts) Delete the “TempCelsius” column of 2013 data (the data frame in the previous
question). Print the data frame.

j. (5 pts) Delete the “Average” (last) row. Sort the data frame by Temperature in descending
order and print it. Hint: you can use the order() function. The output should be
   Year     Month Temperature
8  2013    August        87.1
7  2013      July        84.5
6  2013      June        82.6
9  2013 September        82.4
5  2013       May        72.3
10 2013   October        68.2
4  2013     April        63.0
3  2013     March        56.4
11 2013  November        53.5
2  2013  February        52.0
1  2013   January        49.1
12 2013  December        43.1
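
The steps in parts i and j (dropping a column, dropping the last row, and sorting with order()) can be sketched on a small hypothetical data frame. The name toy and its values are invented for illustration. Note that order() returns a permutation of row indices rather than sorted values, which is why the original row names are preserved in the printed result, as in the expected output above.

```r
# Hypothetical toy data; the values are invented for illustration only.
toy <- data.frame(
  Month = c("January", "February", "March", "Average"),
  Temperature = c(40, 70, 55, 55),
  TempCelsius = c(4.4, 21.1, 12.8, 12.8)
)

# Drop a column by assigning NULL to it.
toy$TempCelsius <- NULL

# Drop the last row with a negative index.
toy <- toy[-nrow(toy), ]

# order() gives the row indices that sort Temperature;
# decreasing = TRUE puts the largest value first.
sorted <- toy[order(toy$Temperature, decreasing = TRUE), ]

print(sorted)
```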

Q2 (50 pts).
a. (10 pts) Create the following plot using the qplot function.

x should be a sequence from 0 to 4π in increments of 0.5, and y should be the sine of x. The
sin function can be used.
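
For the data preparation only (not the plot itself), the following base R sketch shows how such a sequence behaves. The variable names x and y follow the question text. Because 4π is not a multiple of 0.5, seq() stops at the last value that does not exceed the upper limit.

```r
# Sequence from 0 up to (at most) 4*pi in steps of 0.5.
x <- seq(0, 4 * pi, by = 0.5)

# sin() is vectorised, so it is applied to every element of x at once.
y <- sin(x)

# Inspect the size and end point of the sequence.
print(length(x))
print(max(x))
```

The vectors x and y can then be passed to a plotting function.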
b. (20 pts) Create the following plot using the ggplot function.
Use the first nine rows of the swiss dataset (in the datasets package) as your data. Use the row
names of your data as the labels in your plot. Set the limits of the x axis to (74, 96) and of
the y axis to (2, 17). Use geom_text() and geom_point().

c. (20 pts) Read the “texasGas.xlsx” file and store the monthly average gas prices of Texas in
2019 and 2020. Convert month, year, and rating to factors. Then, create the following two plots:

In the first plot, you should set the point colors manually. In the second plot, you should only
display data for 2020. In order to display the month names vertically, you can add the following
layer:
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1, size = 7))
