
DSA212 Data Science with R School of Economics

Assignment 1 Singapore Management University

Data Wrangling & Visualization


T2/AY2023-24

Instructions (please read carefully):

• Hand in a soft copy of your code through eLearn using an R Markdown (.Rmd) file.
• Write down your full name and student ID number.
• Show all work clearly and in order to receive full credit.
• This assignment has 10 exercises and is worth 50 points.

Strictly confidential – Do not distribute without prior permission from the author(s).

Exercises
The file OMI.csv contains daily realized measures for 31 financial assets.¹ The series have
different start dates, but all end in Dec 2021.

1. (3 points)
Place the Assignment_1.Rmd and OMI.csv files together in a folder. Open Assignment_1.Rmd in
RStudio and replace the code chunk beneath # Exercise 1 with your own code to:
(1) Load the readr and dplyr packages.
(2) Read the data in the CSV file to a data frame DF.
(3) Get a glimpse of DF. How many rows does this data frame have? How many columns?
Give your answer as text beneath the output of your code.
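
As an illustration of the workflow in items (1) to (3), the sketch below reads a small made-up CSV file rather than OMI.csv. The temporary file, its three rows, and the name toy_path are assumptions for demonstration only; the column names are borrowed from Exercise 2.

```r
library(readr)
library(dplyr)

# Hypothetical stand-in for OMI.csv: three made-up rows written to a temp file
toy_path <- tempfile(fileext = ".csv")
write_lines(
  c("when,Symbol,rv5,rk_parzen",
    "2016-01-04 00:00:00,.STI,0.00011,0.00012",
    "2016-01-05 00:00:00,.STI,0.00007,0.00008",
    "2016-01-04 00:00:00,.SPX,0.00015,0.00016"),
  toy_path
)

# Read the CSV into a data frame and inspect its structure
DF <- read_csv(toy_path)
glimpse(DF)

# The row and column counts reported by glimpse() can be cross-checked directly
nrow(DF)
ncol(DF)
```

In the assignment itself, the path passed to read_csv() would presumably be "OMI.csv", given that both files sit in the same folder.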

2. (5 points)
In Assignment_1.Rmd, replace the code chunk beneath # Exercise 2 with your own code to:
(1) Choose columns when (date-time), Symbol (ticker symbol), rv5 (realized variance, 5-min),
and rk_parzen (realized kernel variance, non-flat Parzen) of DF.
(2) Change the names of the when, Symbol, rv5, and rk_parzen variables to t, ticker_symbol, RV,
and RK, respectively.
(3) Overwrite DF with the resulting data frame.
(4) Get a glimpse of DF. How many rows does this data frame have? How many columns?
Give your answer as text beneath the output of your code.
The code for items (1) – (3) should use key functions of the dplyr package and be written as a
single data manipulation pipeline.
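
A single pipeline of this kind might look like the sketch below, shown on a made-up tibble named toy rather than on DF. The values and the extra column are hypothetical; select() and rename() are one possible choice of dplyr functions, not necessarily the intended one.

```r
library(dplyr)

# Hypothetical toy data with one column that is not needed
toy <- tibble(
  when      = as.POSIXct(c("2016-01-04", "2016-01-05"), tz = "UTC"),
  Symbol    = c(".STI", ".STI"),
  rv5       = c(1.1e-4, 0.7e-4),
  rk_parzen = c(1.2e-4, 0.8e-4),
  extra     = c(1, 2)  # made-up column that select() drops
)

# Choose columns, rename them, and overwrite the data frame in one pipeline
toy <- toy %>%
  select(when, Symbol, rv5, rk_parzen) %>%
  rename(t = when, ticker_symbol = Symbol, RV = rv5, RK = rk_parzen)

glimpse(toy)
```

Note that rename() takes new_name = old_name pairs, so the new name goes on the left-hand side.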

3. (7 points)
In Assignment_1.Rmd, replace the code chunk beneath # Exercise 3 with your own code to:
(1) Load the lubridate package.
(2) Convert the entries of column t of DF from date-time format <dttm> to date format <date>
(use the lubridate cheat sheet to find a suitable function to extract the date component).
(3) Choose the rows that correspond to dates after the year 2015 and the Straits Times
Index (ticker symbol .STI).
(4) Assign the result to a new data frame STI.
(5) Get a glimpse of STI. How many rows does this data frame have? How many columns?
Give your answer as text beneath the output of your code.
(6) Apply the View() function to STI and scroll through the data. What is the start date? What
is the end date? Give your answer as text beneath the output of your code.
The code for items (2) – (4) should use key functions of the dplyr package and be written as a
single data manipulation pipeline.
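
The pipeline in items (2) to (4) can be sketched on made-up data as follows. The tibble toy and its values are hypothetical, as_date() is one lubridate function that extracts the date component, and "after the year 2015" is read here as year(t) > 2015, which is an assumption about the intended cutoff.

```r
library(dplyr)
library(lubridate)

# Hypothetical toy data in the renamed layout from Exercise 2
toy <- tibble(
  t = ymd_hms(c("2015-12-31 00:00:00",
                "2016-01-04 00:00:00",
                "2016-01-04 00:00:00")),
  ticker_symbol = c(".STI", ".STI", ".SPX"),
  RV = c(0.9e-4, 1.1e-4, 1.5e-4),
  RK = c(1.0e-4, 1.2e-4, 1.6e-4)
)

# Convert date-time to date, keep rows after 2015 for .STI, assign to a new data frame
toy_sti <- toy %>%
  mutate(t = as_date(t)) %>%
  filter(year(t) > 2015, ticker_symbol == ".STI")

glimpse(toy_sti)

# Outside RStudio, range() gives a quick view of the first and last dates
range(toy_sti$t)
```

View() opens an interactive viewer, so it is best run from the RStudio console; range() is shown here only as a non-interactive cross-check.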

¹ This dataset was obtained from the Oxford-Man Institute’s Realized Library v0.3.
