ECON20003 – QUANTITATIVE METHODS 2

TUTORIAL 2

Download the t2e3 Excel data file from the subject website and save it to your computer or
USB flash drive. Read this handout and try to complete the tutorial exercises before your
tutorial class, so that you can ask your tutor for help during the Zoom session if necessary.

After you have completed the tutorial exercises attempt the “Exercises for assessment”. You
must submit your answers to these exercises in the Tutorial 2 Homework Canvas
Assignment Quiz by the next tutorial in order to get the tutorial mark. For each assessment
exercise type your answer in the relevant box available in the Quiz or type your answer
separately in Microsoft Word and upload it in PDF format as an attachment. In either case,
if the exercise requires you to use R, save the relevant R/RStudio script and printout in a
Word document and upload it together with your written answer in PDF format.

Mathematical operators and functions

R has a number of operators, like the basic arithmetic and logical operators. The arithmetic
operators are

+ add
- subtract
* multiply
/ divide
^ exponentiation (x^y raises x to the power of y)

R follows the usual rules in evaluating expressions, i.e. ‘from left to right’ and ‘from highest
operator precedence to lowest’ (exponentiation, followed by multiplication and division,
followed by addition and subtraction). To enforce a different order of evaluation, use
parentheses.
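A few quick checks in the Console illustrate these precedence rules. The numbers are arbitrary and chosen only for illustration. The last line shows one detail worth knowing: R groups repeated exponentiation from right to left.

```r
2 + 3 * 4      # 14: multiplication is evaluated before addition
(2 + 3) * 4    # 20: parentheses force the addition first
-2^2           # -4: exponentiation is evaluated before the minus sign
(-2)^2         # 4
2^3^2          # 512: evaluated as 2^(3^2) = 2^9
```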

The following operators, with the exception of the assignment operator =, return TRUE or
FALSE depending on whether the expression is true or false:

= assign (to assign data to an object; it does not test equality)1
== equal to (to test whether an object is exactly equal to something)
!= not equal to
> greater than
< smaller than
>= greater than or equal to
<= smaller than or equal to
& and (A & B is TRUE if both A and B are true, it is FALSE otherwise)
| or (A | B is TRUE if either A or B is true, it is FALSE otherwise).
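As a small sketch of how these operators behave, the commands below use a hypothetical object x that is not part of the tutorial data.

```r
x = 5              # = assigns the value 5 to x
x == 5             # TRUE
x != 5             # FALSE
x >= 3 & x < 10    # TRUE: both conditions hold
x < 3 | x > 4      # TRUE: the second condition holds
x < 3 | x > 10     # FALSE: neither condition holds
```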

1
You can use = almost interchangeably with the <- leftward assignment operator.
L. Kónya, 2020, Semester 2 ECON20003 - Tutorial 2
In addition to these basic arithmetic and logical operators, R also has the usual array of
mathematical functions, like:

sqrt(x) square root of x
abs(x) absolute value of x
log(x) natural logarithm of x
exp(x) exponential (e^x)
round(x,n) rounds x to n decimal places (i.e. to the nearest integer if n = 0)
signif(x,n) rounds x to n significant figures
factorial(x) factorial of x (for a non-negative integer x, x! = x(x-1)…1 and 0! = 1)
ceiling(x) smallest integer which is greater than or equal to x
floor(x) largest integer which is less than or equal to x
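To get a feel for these functions, you could try them on a few simple numbers. The inputs below are arbitrary illustrations.

```r
sqrt(256)            # 16
abs(-3.5)            # 3.5
log(exp(2))          # 2: log() undoes exp()
round(3.14159, 2)    # 3.14
signif(123456, 2)    # 120000
factorial(5)         # 120
ceiling(2.1)         # 3
floor(2.9)           # 2
```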

The next two exercises serve to illustrate the application of some of these operators and
functions.

Exercise 1

With numbers, RStudio can be used like any ordinary calculator. To demonstrate this, launch
RStudio, type

4*16

in the Console and hit Enter.2 In return, RStudio performs the required calculation, displays
the result and prompts you to type in the next command.3 Type now

(4*16/sqrt(256))^3

and hit Enter. Finally, type

log((4*16/sqrt(256))^3)

and hit Enter.
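If you want to check your results, the three calculations can be worked through step by step. The comments show the intermediate arithmetic.

```r
4*16                       # 64
(4*16/sqrt(256))^3         # (64/16)^3 = 4^3 = 64
log((4*16/sqrt(256))^3)    # log(64), approximately 4.158883
```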

As you can see on your screen or on the screenshot on the next page, the code always
follows the > prompt and the output always follows a number in square brackets.

Quit RStudio without saving anything.

2
This time, for the sake of illustration, we enter the code in the Console tab rather than in the Source panel.
Recall, however, that in general it is better to rely on the Source panel.
3
Each command must start in a separate line of code.
Exercise 2

Retrieve the t1e1 data set that you created and saved in Exercise 1 of Tutorial 1 and
generate the following three new variables4:

X = Height + Weight
Y = Age²
Z = 3Age - 5

Launch RStudio. Create a new RStudio project and script, and name both t2e2. RStudio
adds and opens a third folder in your working directory, t2e2, which contains two files now,
t2e2.Rproj and t2e2.R.

On your Files tab click on the “..” symbol next to the green arrow or on the main menu follow
the File / Open Project… steps and navigate to your working directory. By now you should
have four folders in your working directory, t1e1, t1e2, t2e1 and t2e2 (see the screenshot on
the next page).

4
Do not worry that the new variables do not make any sense; this exercise is just for illustration.
Open the t1e1 folder and click t1e1.RData. RStudio asks you whether to load the data file
into your global environment. Click Yes.

In return, RStudio displays the data set on the Environment tab:

Save it in the t2e2 folder as t2e2.RData. You should now have three files in this folder (see
the first screenshot on the next page).

To create the first new variable, X, type the following command in the Source panel:

X=Height+Weight

Click on the Run button. In return, RStudio echoes the command in the Console, creates
the new variable and displays it on the Environment tab:

Spacing does not matter in R, so just to make your code easier to read and follow, the
previous command could have been entered in the following format as well:

X = Height + Weight

Both are correct and the choice between them is subjective and down to personal taste.5

This command sets variable X on the left side equal to the sum of Height and Weight on the
right side. This is a so-called leftward assignment and, instead of the equals sign, =, one
could also use the leftward assignment operator <-, i.e.6

X <- Height + Weight

5
I prefer the second style and I’ll use it in these tutorial handouts.
6
I prefer =. There is also a rightward assign operator, ->, but it is used less frequently.
To generate Y and Z, type the following two commands in the Source panel:

Y = Age^2
Z = 3*Age - 5

Highlight all three commands in your script and click on the Run button. Compare your
screen to the screenshot below.

What if you make a mistake in typing a command? Try, for example, running the following
command:

X = height + weight

It is incorrect since, as you learnt last week, R is case sensitive and we have variables Height
and Weight in the active Global Environment, but not height and weight. Hence, R returns
an error message in the Console:

Error: object 'height' not found

Similarly, if you try to execute

Z = 3Age - 5

R returns an error message in the Console:

Error: unexpected symbol in "Z = 3Age"

In addition, it inserts a red error marker (a white cross in a red circle) in front of the 5th
script line in the Source panel to show the error in the code.
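One way to check whether an object with a given name is available, without triggering an error, is the exists() function. The sketch below uses a made-up Height variable and assumes that no object called height has been created in your session.

```r
Height = c(170, 182, 165)   # hypothetical values

exists("Height")   # TRUE: an object with exactly this name is available
exists("height")   # FALSE: names are case sensitive
```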

Quit RStudio by following the File / Quit Session… menu steps. RStudio warns you that your
RData and R files have unsaved changes. Click on the Save Selected button.

Now open Windows File Explorer and check your t2e2 folder. It should contain five items,
including two RData files.7 One of them is named t2e2; this is the file that you saved after
having restored the data set from t1e1. The other one is an updated but unnamed version
of it that was saved by RStudio at the end of your session.

To see the difference between them, right-click the first one, choose Open with > RStudio
and have a look at the Environment tab:

Then, quit RStudio without saving anything and right-click the second, unnamed, RData (or
R Workspace) file and choose Open with > RStudio. Your Environment tab should now look
like this:

7
You might have two R Workspace files instead. On my computer, by default, these files are opened by
RStudio, which is why their extension is RData.
You clearly do not need the first RData file anymore, so after having quit RStudio delete it
and name the second RData file t2e2.

Importing Data

You learnt that a data set can be entered in RStudio either by typing it straight from the
keyboard in an RStudio spreadsheet or by importing the data previously saved in the native
R format or in a foreign file format. Last week in Exercise 1 you typed in a small data set;
this time you are going to import the data from an Excel spreadsheet.

Exercise 3

The t2e3.xlsx Excel file contains observations on the height (cm) and weight (kg) of 20
randomly selected people. Before importing a dataset to R, it is usually a good idea to check
its structure, so open the t2e3.xlsx file in Excel first. As you can see (the screenshot is on
the next page), the data are arranged in two columns in the spreadsheet. The variable
names, Height and Weight, are in the first row and the first observations are in the second
row. Close Excel.

Launch RStudio, create a new project and script, and name both t2e3. To import the data
from Excel, click on the Import Dataset button on the Environment tab and from the opening
drop-down menu select the From Excel… option (see the second screenshot on the next
page).8

8
Alternatively, follow File / Import Dataset on the main RStudio toolbar.
The easiest way to import data from Excel to RStudio is provided by the readxl package. If
it has not yet been installed on your computer, or if your installed version is out of date,
RStudio asks you whether you want to install it now.

Answer Yes and wait for the package to be installed.

Once the readxl package is available, RStudio opens the Import Excel Data dialogue window
and prompts you to enter the data source. Click Browse…, locate the t2e3.xlsx file on your
hard drive or USB key and click Open. RStudio displays a preview of the data.

If you are satisfied with what you see click on the Import button. Your screen should now
look like this:

The last command in the Console, View(t2e3), was generated by the import dialogue and it
opened the data viewer in the Source panel.

Save this data as t2e3.RData.
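The same import can also be written as a short script. The sketch below assumes that the readxl package is installed and that t2e3.xlsx sits in your working directory; if either assumption fails, the commands inside the braces are simply skipped.

```r
# Assumes readxl is installed and t2e3.xlsx is in the working directory
if (requireNamespace("readxl", quietly = TRUE) && file.exists("t2e3.xlsx")) {
  t2e3 = readxl::read_excel("t2e3.xlsx")   # the first row supplies the variable names
  str(t2e3)                                # variable names, types and first few values
  save(t2e3, file = "t2e3.RData")          # store the data in R's native format
}
```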

Click on the right arrow in front of t2e3 on the Environment tab to verify the imported data.
You should now see the contents of t2e3, namely the variable names (Height, Weight), the
types of the variables (num for numeric) and the first 10 observations.

The RStudio data viewer can be used to look inside data frames and to sort, filter and search
the data.

To sort a data set in the ascending order of a variable, click on its name in the column
heading. For example, if you click on Height, the observations in both columns get sorted in
the ascending order of Height.

If you click Height the second time, the data get resorted in the reverse direction (see the
screenshot on the next page).

The leftmost column in the viewer displays the observation numbers, i.e. the original
positions of the pairs of observations in the t2e3 Excel spreadsheet. To remove sorting and
show the data in its original form, click the empty cell in the heading of this column.

Occasionally, you might also wish to filter the data so that only the relevant information
appears. For example, you might wish to consider only those people in the sample who
weigh at least 70 kg. To do so, click the Filter icon in the toolbar and then on All in the white
box right beneath the variable name Weight. A brushing histogram appears with bins
generated by the hist function of R.

Click on the user-editable text box beneath the histogram and increase the lower limit to 70.
In return, RStudio filters out two pairs of observations.

Before moving on, make sure that you undo filtering by setting the lower limit back to 60 and
then close the data viewer.

After having read some data into RStudio, you may want to use a variable in some statistical
analysis. Note that it is not possible to call a variable stored in a data set simply by its name.
If you try to do so, RStudio displays an error message.

For example, type Height in the Source panel and click on the Run button. Although Height
is on your Data tab in the Environment panel, you receive the following message:

Error: object 'Height' not found

You can call a variable from an already loaded data set in two different ways. The first option
is to call a variable by the name of the dataset that includes it and the name of the variable
joined by a $ sign, while the second option is to attach the whole data set to the active project
with the attach() function and then refer to the variables simply by their names.

In this case, for example, you need to type

t2e3$Height

and click Run.9 In return, RStudio displays the values of Height in the Console.

Alternatively, type the following two commands in the Source panel

attach(t2e3)
Weight

Highlight both commands and click the Run button. In return, RStudio echoes the commands
and displays the values of Weight in the Console.10
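The two options can be compared side by side on a small made-up data set. The data frame below is only a hypothetical stand-in for the imported t2e3 data, with three observations.

```r
# Hypothetical stand-in for the imported t2e3 data set
t2e3 = data.frame(Height = c(170, 182, 165), Weight = c(65, 80, 58))

t2e3$Height     # option 1: data set name, $ sign, variable name

attach(t2e3)    # option 2: make the variables available by name
Weight          # 65 80 58
detach(t2e3)    # undo attach() once you have finished with the data set
```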

Graphical Descriptive Techniques

As you learnt in QM1, statistics has two basic areas, descriptive statistics and inferential
statistics. Descriptive statistics deals with the organisation, summarisation and presentation
of some data in a convenient, relatively simple but still informative form, while inferential
statistics is used to draw conclusions about one or several populations based on samples
drawn from them.

The various tools of descriptive statistics fall into two categories: (i) tabular and
graphical techniques and (ii) numerical descriptive measures. Although they are fairly
straightforward, their importance cannot be overrated. In fact, every statistical project should
start with some form of exploratory data analysis. In addition, descriptive statistics can also
help communicate the results of projects effectively.

In this tutorial, we focus on how to plot data.11 Like many tasks, this can be done in several
different ways in R/RStudio. We shall rely on the basic R graphics functions that can be
used to create standard statistical plots, including scatterplots, boxplots, histograms, bar
plots, and line charts.

9
Notice that as soon as you type t2e3$, RStudio displays the names of the variables in the data set, just in
case you would not remember them. Choose Height from the list.
10
You are going to keep working with the t2e3 data set. Otherwise, it is possible to undo attach() with the
detach() function.
11
It is assumed that you are familiar with the various types of plots (like bar chart, pie chart, histogram,
frequency polygon, ogive, line chart, scatter plot etc.). If you are uncertain, review Chapters 3 and 4 in the
Selvanathan book.
In R, graphs are typically created interactively by calling various graphics functions. These
functions can either create a complete plot or add an extra layer to some existing plot. Basic
plots can be produced by calling one of the following functions:

plot(x,y) Scatterplot of two quantitative variables, x (horizontal axis) and y
(vertical axis);
hist(x) Histogram for a quantitative variable, x;
barplot(x) Bar plot of the values in x, e.g. the frequencies of the categories of a
qualitative variable;
lines(x,y) Line through the points given by x (horizontal axis) and y (vertical axis),
added to an existing plot;
boxplot(x) Boxplot for a quantitative variable, x.

These functions can have additional arguments, e.g.

data: data frame;
type: type of the plot, e.g. "p" for points, "l" for lines, "b" for both points and lines, "c" for
empty points joined by lines;
xlim, ylim: two-element numeric vectors defined by the c() function12 to set the minimum
and maximum values for the horizontal and vertical axes, respectively;
xlab, ylab: labels for the horizontal and vertical axes, respectively;
main, sub: plot title (on top) and subtitle (at bottom), respectively;
legend: placement of the legend, e.g. "left", "topleft" (legends are added with the separate
legend() function);
pch: plotting symbol;
col: colour code or name for lines and symbols.13
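To see how several of these arguments work together, consider the sketch below. It uses made-up x and y values rather than the tutorial data, draws a complete plot with plot(), and then adds a line and a legend to it with lines() and legend().

```r
# Hypothetical data for illustration only
x = 1:10
y = x^2

plot(x, y, type = "b", pch = 19, col = "blue",
     main = "Main title", sub = "Subtitle",
     xlab = "x", ylab = "y",
     xlim = c(0, 10), ylim = c(0, 100))
lines(x, 10*x, col = "red")              # adds a red line to the existing plot
legend("topleft", legend = c("x^2", "10x"),
       col = c("blue", "red"), lty = 1)  # adds a legend in the top-left corner
```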

To find the precise descriptions and details of these functions, for example, plot, enter and
run the following command in the Console:

?plot

It opens the relevant help page in the Help tab of the bottom-right panel of RStudio.

Let’s now return to our exercise to illustrate these graphics functions.

Exercise 3 (cont.)

In the Source panel type

plot(Height, Weight)

and press Run. On the Plots tab, you should have the following scatterplot on your screen:

12
The c() function stands for “concatenate (combine/collect) numbers into a vector”.
13
There are 25 commonly used plotting symbols and 657 built in colour names in R. They are shown in the
“Shapes and Colours in R” pdf file on the subject website.
It visualises the relationship between Height and Weight. Swap the two variables and
execute

plot(Weight, Height)

You should now have the scatterplot below. As you can see, in the first scatterplot Height is
on the horizontal axis and Weight is on the vertical axis, while in the second scatterplot
Weight is on the horizontal axis and Height is on the vertical axis.

These examples illustrate that:

If you provide two variable names only, then by default the plot function returns a
scatterplot. It takes the first variable as the x variable and assigns it to the horizontal axis
and takes the second variable as the y variable and assigns it to the vertical axis.14

What if you specify one variable only for plot, instead of two? Execute, for example, the
following commands, one by one:

plot(Height) and plot(Weight)

You should get the plots shown on page 19. They illustrate that:

If you specify one variable name only, the plot function treats this variable as the y variable
and displays the observation numbers on the horizontal axis.
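In other words, a one-variable plot is a scatterplot of the variable against the positions 1, 2, 3, … of its observations. The sketch below illustrates this with made-up Height values; the second command supplies the observation numbers explicitly and places the points in the same positions.

```r
Height = c(170, 182, 165, 178)     # hypothetical values

plot(Height)                       # Height against the observation numbers
plot(seq_along(Height), Height)    # the same points, with the numbers 1 to 4 given explicitly
```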

14
It is also possible, though not necessary, to specify explicitly which variables to assign to the horizontal (x)
and vertical (y) axes of the scatterplot with the plot(x = name1, y = name2) command, where name1 and
name2 are the variable names, typed without quotation marks.
To demonstrate another graphics function, run the

hist(Height) and hist(Weight)

commands one by one. They produce the histograms displayed on page 20.

By now you have created six simple plots and on the Plots tab RStudio displays the latest
one.

You can switch between plots either by clicking the left/right blue arrows on the Plots tab
(right beneath the Files tab), or by following the Plots / Next Plot and Plots / Previous Plot
menu options.

Scroll through your plots. As you can see, they are nice but simple, based on the preset values
of the arguments of the plot and hist functions. They can be made to look a bit fancier with
the additional arguments mentioned earlier on page 16.

For example, to add a title to the first scatterplot and to represent the pairs of observations
on this scatterplot with red dots (shape #19 filled by "red"), execute the following command15:

plot(Height, Weight,
     main = "Scatterplot of Weight versus Height",
     col = "red", pch = 19)

The new scatter plot is shown on the top of page 22.

We can provide more informative labels of the variables using the xlab and ylab arguments
of plot. Execute, for example, the following command to obtain the second plot on page 22:

plot(Height, Weight,
     main = "Scatterplot of Weight versus Height",
     col = "red", pch = 19,
     xlab = "Height in cm", ylab = "Weight in kg")

15
This command is quite long so it could not be seen in the Console without scrolling sideways. For this reason,
I have chopped it into three shorter segments that fit into the panel and are easy to follow.
[Figure: Scatterplot of Weight versus Height, with Height in cm on the horizontal axis and Weight in kg on the vertical axis]

If you consider the scatterplots you have developed so far, you can see that the scales on
their horizontal and vertical axes do not start at the origin. If you want the scales to start at
zero, you need to specify the minimum and maximum values for the horizontal and vertical
axes with the xlim and ylim arguments of plot. For instance, to set the scale on the horizontal
axis from 0 to 200 and the scale on the vertical axis from 0 to 100, execute the following command:

plot(Height, Weight,
     main = "Scatterplot of Weight versus Height",
     col = "red", pch = 19,
     xlab = "Height in cm", ylab = "Weight in kg",
     xlim = c(0,200), ylim = c(0,100))

The new scatterplot is below.

[Figure: Scatterplot of Weight versus Height, with Height in cm on the horizontal axis (0 to 200) and Weight in kg on the vertical axis (0 to 100)]

Move now back to the histograms on page 21. By default, they are black and white, and the
number of class intervals, called bins, is determined by R using an algorithm. These
defaults can be overridden with the col and breaks arguments.

Execute, for example,

hist(Height, col = "green")
hist(Weight, breaks = 20, col = "blue")

to generate the histograms shown on the next page.
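Note that, according to the R documentation, a single number given to breaks is treated as a suggestion only, because R sets the class boundaries to "pretty" values. To see which classes were actually used, you can store the histogram in an object and inspect it, as in the sketch below. The Weight values and the object name h are made up for illustration.

```r
Weight = c(62, 65, 68, 70, 71, 73, 75, 78, 80, 84, 88, 90)   # hypothetical values

h = hist(Weight, breaks = 5, plot = FALSE)   # compute the histogram without drawing it
h$breaks                                     # class boundaries actually used
h$counts                                     # number of observations in each class
```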

The plot displayed on the Plots tab can be saved as an image or as a pdf document, or it
can be copied to the clipboard by clicking on the Export button of the Plots tab, or on the
Plots button of the main menu:
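A plot can also be saved from a script instead of the Export button. The sketch below writes a scatterplot to a pdf document with the pdf() and dev.off() functions of R; the file name scatterplot.pdf and the data values are made up for illustration.

```r
Height = c(170, 182, 165, 178)   # hypothetical values
Weight = c(65, 80, 58, 74)

pdf("scatterplot.pdf", width = 7, height = 5)   # open a pdf file as the plotting device
plot(Height, Weight, pch = 19, col = "red")     # the plot is drawn into the file
dev.off()                                       # close the device to finish writing the file
```

The file is saved in the current working directory.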

Copy your plots one by one to the clipboard using the default options and insert them in a
Word document.

Quit RStudio and save your RData and R files.

Exercises for Assessment

Exercise 4

In this exercise you are going to work on the data you saved in Exercise 2 last week.16

a) Launch RStudio and close the Script tab, if it is open. Create a new RStudio project and
script, and name both t2e4. Retrieve the t1e2 data set and save it as t2e4.RData.17

The variable of interest, Days, is a discrete quantitative variable. The data set is cross-
sectional, and it can be displayed graphically with, for example, a histogram or a boxplot.

b) Use RStudio to illustrate the data on Days with a histogram. Customize your plot as you
did in Exercise 3. Briefly describe what the graph tells you.

c) Use RStudio to illustrate the data on Days with a boxplot and customize your plot. Briefly
describe what the graph tells you.

You can develop a basic boxplot by executing the boxplot(Days) command and then add a
main title to it, add the Days label to the vertical axis, and colour the rectangle on the boxplot
red.

Exercise 5

The table below details the number of international visitors (aged 15 years and over) to
Australia from its top 10 markets in 2018/19 (year ending in September) by country of
residence (COR).18

Overseas arrivals (‘000) by country of residence (COR)

COR            Visitors
China              1331
Hong Kong           284
India               364
Japan               455
Korea               250
Malaysia            344
New Zealand        1276
Singapore           417
UK                  670
US                  771

16
If you have not completed Exercise 2 of Tutorial 1, then do it before attempting Exercise 4 of Tutorial 2.
17
You should do what you did at the beginning of Exercise 2 of Tutorial 2.
18
Source: Estimates for the year ending June 2019 from the International Visitor Survey, Data, Table 1a,
https://www.tra.gov.au/International/International-tourism-results/overview.
a) There are two variables: COR and Visitors. Are they qualitative or quantitative,
discrete or continuous? Explain your answers.

b) Launch RStudio, create a new RStudio project and script (t2e5), enter the observations
from your keyboard to an RStudio spreadsheet and save it as an RData file.

c) Depict the number of visitors by country of residence with a bar graph.19

Use the barplot(Visitors) command to develop a basic bar graph.

d) Annotate your bar graph with the axis labels Country of Residence (x-axis) and Visitors to
Australia (y-axis), and with the title Bar graph for Visitors to Australia.

Review the application of the main, ylab and xlab arguments.

e) Increase the scale on the vertical axis to (0,1400) and colour the bars orange.

Review the application of the ylim and col arguments in Exercise 3.

f) To make the bar graph more informative, expand the barplot command with the
names.arg = COR and cex.names = 0.5 arguments.

g) Briefly describe what the bar graph in part (f) tells you.

19
Notice that this time a histogram would be inappropriate because the observations are classified by
categories (countries of residence) rather than adjacent class intervals.