Theory 1. R Basics

Econometrics
Econometrics - R Basics
Catalin Starica
26th September 2017
First R session
Installation
- R is an open-source programming language and software environment for statistical computing
- Available at CRAN (Comprehensive R Archive Network): http://cran.r-project.org/
- An additional, user-friendly interface: RStudio
Documentation
References (1/2)
- Manuals: http://cran.r-project.org/manuals.html
- An Introduction to R: http://cran.r-project.org/doc/manuals/R-intro.html
- The FAQ(s): http://cran.r-project.org/doc/FAQ/R-FAQ.html
- R: Getting started: http://wiki.math.yorku.ca/index.php/R:_Getting_started
- Introduction to R (Vincent Zoonekynd): http://zoonek2.free.fr/UNIX/48_R/all.html
- How to make simple graphs: http://www.harding.edu/fmccown/r/
References (2/2)
- In French:
  - R pour les débutants (Emmanuel Paradis): http://cran.r-project.org/doc/contrib/Paradis-rdebuts_fr.pdf
  - Introduction au système R (Yves Brostaux): http://cran.r-project.org/doc/contrib/Brostaux-Introduction-au-R.zip
- In English:
  - simpleR - Using R for Introductory Statistics (John Verzani): http://cran.r-project.org/doc/contrib/Verzani-SimpleR.pdf
- Book: The R Software. Fundamentals of Programming and Statistical Analysis (Lafaye de Micheaux et al., Springer, 2011)
- How to open R: click on the R icon and everything is ready to work
- The R console: here you can execute a command immediately, i.e. you type the expression, press Enter and receive the output
- Example (everything after the symbol # is not executed; it is just a way to add comments):
> 2*(-4.5) # Note that the decimal separator is a point (.)
[1] -9
> 3^2 # 3 to the power of 2
[1] 9
> sqrt(4) # square root of 4
[1] 2
> log(1) # natural logarithm (base e) of 1
[1] 0
- How to close R: type the command q() or use File -> Quit Session... Choose not to save the workspace image
- How to set a working directory (WD): use File -> Set working directory and choose a directory.
  Check: type getwd() to show the current working directory
> getwd()
[1] "M:/TravailR"
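The working directory can also be set from the console or from a script instead of the menu. The lines below are a minimal sketch, assuming the base R function setwd(); tempdir() is only a stand-in for a real folder such as the one shown above.

```r
# Remember the current working directory so it can be restored afterwards
old_wd <- getwd()

# Assumption: tempdir() stands in for your own folder (e.g. "M:/TravailR")
target <- tempdir()
setwd(target)       # set the working directory
new_wd <- getwd()   # check: should now point to the target folder
print(new_wd)

setwd(old_wd)       # restore the previous working directory
```

Putting setwd() at the top of a script means the script finds its data files without any clicking, at the cost of hard-coding a path.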
- R script: you can create an R script to store a sequence of commands
  - Create a new script: File -> New script
  - Save a script: File -> Save as..., select a directory and name the script (e.g. script1.R)
  - Save an existing script: File -> Save
  - Open an existing script: File -> Open file and choose the script
  - Execute a line of commands: right-click and choose Execute, or press Ctrl + Enter (Cmd + Enter on a Mac)
Define, list and show objects
Exercise 1
- Launch R and define a working directory
- Create a new script named script1.R
- Type all the commands shown on the next slide
- Run the script line by line
- Save the script, then close R
- Reopen the script: all the code can be reused
n = 15 # Define the object named n

n # Show it
n = 5 # Replace n with a new object
n
x = 1
X = 10 # There is a distinction between
# capital and lower-case letters
x
X
r = log(X) + n
r
Functions and help
- The base version of R contains many functions
- You can extend the set of functions by installing additional packages, or you can write your own functions
- General syntax: function_name(parameter1=value1, par2=val2,...)
How to get help
- If you need help to understand a function: help(function_name)
> help(mean)
- To obtain a list of functions associated with a keyword: help.search(key_word)
> help.search("median")
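To illustrate the general syntax, here is a small sketch using two base R functions, round() and seq(), which are not part of the slides and are used only as examples. It shows that arguments can be passed by name and that named arguments can be given in any order.

```r
# Arguments passed by name: function_name(parameter1 = value1, par2 = val2)
rounded <- round(x = 3.14159, digits = 2)
print(rounded)

# Named arguments may be given in any order
s1 <- seq(from = 1, to = 10, by = 3)
s2 <- seq(by = 3, to = 10, from = 1)
print(s1)
print(identical(s1, s2))
```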
Logical operators
a==b      TRUE if a is equal to b, FALSE otherwise
!(a==b)   TRUE if a is different from b, FALSE otherwise
a!=b      TRUE if a is different from b, FALSE otherwise
a<b       TRUE if a is less than b, FALSE otherwise
a>b       TRUE if a is greater than b, FALSE otherwise
a<=b      TRUE if a is less than or equal to b, FALSE otherwise
a>=b      TRUE if a is greater than or equal to b, FALSE otherwise
a|b       TRUE if a, b, or both are TRUE, FALSE otherwise
a&b       TRUE if both a and b are TRUE, FALSE otherwise
Example
> x=3
> y=4
> z = c(-1,2)
> x>3
[1] FALSE
> x>3 & y==4
[1] FALSE
> z>0
[1] FALSE TRUE
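The operators |, ! and != from the table are not used in the example above. A short sketch with the same values of x, y and z (the result names r_or, r_not, r_neq and r_vec are only illustrative):

```r
x <- 3
y <- 4
z <- c(-1, 2)

r_or  <- x > 3 | y == 4   # TRUE: the second condition holds
r_not <- !(x == y)        # TRUE: x and y are different
r_neq <- x != y           # TRUE: same test written with !=
r_vec <- z > 0 & z < 3    # element by element: FALSE TRUE

print(r_or)
print(r_not)
print(r_neq)
print(r_vec)
```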
Installing a package
- Packages: additional libraries of functions
- How to install a package (only once): for example, to install the package tseries:
  Packages -> Install package(s) -> Switzerland (the CRAN mirror) -> tseries
  With RStudio it is even easier.
- How to load a package (every time): library(tseries)
- How to list the functions inside a package: help(package=tseries)
Most important objects
- Vectors (vector): formed by a set of elements (numbers, characters)
  3 6 5
- Datasets (data frame): formed by one or more vectors of the same length
  The rule: columns ↔ variables / rows ↔ observations
  sex  weight  height
  m    60      170
  f    57      169
  f    51      172
  f    55      174
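The small dataset above can be built directly from three vectors of the same length. This is a sketch using the base R function data.frame(); the object name people is only an example.

```r
# Three vectors of the same length: one per variable (column)
sex    <- c("m", "f", "f", "f")
weight <- c(60, 57, 51, 55)
height <- c(170, 169, 172, 174)

# Combine them into a dataset: columns = variables, rows = observations
people <- data.frame(sex, weight, height)
print(people)
```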
- Time series (time series): series of observations indexed by time
Price
2007-01-03 89.79
2007-02-01 84.42
2007-03-01 85.62
2007-04-02 92.84
2007-05-01 97.21
2007-06-01 95.98
2007-07-02 100.90
Data
How to generate data
- c(...) : defines a vector by combining elements
- FROM:TO : defines a sequence of consecutive integers from FROM to TO
- rep(x, n) : repeats x n times
> c(1,7,0,2)
[1] 1 7 0 2
> 2:7
[1] 2 3 4 5 6 7
> rep(8, times=3)
[1] 8 8 8
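These three building blocks can be combined freely. A brief sketch (the names v1, v2 and v3 are only illustrative):

```r
v1 <- c(2:4, rep(0, times = 2))   # a sequence followed by repeated zeros: 2 3 4 0 0
v2 <- rep(c(1, 7), times = 3)     # repeat a whole vector: 1 7 1 7 1 7
v3 <- 7:2                         # a decreasing sequence: 7 6 5 4 3 2

print(v1)
print(v2)
print(v3)
```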
How to read data
How to import a dataset
- read.table(file, header=TRUE) : imports a .txt file; the result is a dataset
  - file : the name of the file (the file must be in the working directory)
  - header=TRUE : the first line of the file contains the names of the variables (otherwise use header=FALSE)
- read.csv(file, header=TRUE) : imports a .csv file; the result is a dataset
- names(x) : gives the names of the variables (columns) of the dataset x
Download example1.txt and example2.csv into the working directory
> data1 = read.table("example1.txt", header=TRUE)

> data1 # Show data1
sex weight height
m 60 170
f 57 169
> names(data1)
[1] "sex" "weight" "height"
> data2 = read.csv("example2.csv", header=TRUE)
> data2 # Show data2
sex civil_status number_childs
f c 0
h c 0
> names(data2)
[1] "sex" "civil_status" "number_childs"
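If the example files are not at hand, the import step can be tried on a small file created on the fly. This is only a sketch: the temporary file and its three lines are made up for illustration and are not the course data.

```r
# Assumption: a tiny CSV file written to a temporary location, not the real example2.csv
csv_path <- tempfile(fileext = ".csv")
writeLines(c("sex,civil_status,number_childs",
             "f,c,0",
             "h,c,0"),
           csv_path)

# header = TRUE: the first line holds the variable names
data_demo <- read.csv(csv_path, header = TRUE)
print(data_demo)
print(names(data_demo))
```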
- Financial time series: get.hist.quote(...) loads a time series of financial data (you need to load the package tseries)
  - instrument : quote symbol of the stock (e.g. "IBM")
  - start : start date
  - end : end date
  - quote : data type (e.g. "AdjClose")
  - provider : data provider (e.g. Yahoo)
  - compression : reporting frequency: "d" for daily, "w" for weekly, "m" for monthly
Install the package tseries

> library(tseries)
> ibm=get.hist.quote(instrument = "IBM",start="2007-01-01",
+ end="2008-09-12", quote = "AdjClose",provider = "yahoo",
+ compression = "m")
> ibm
AdjClose
2007-01-03 89.79
2007-02-01 84.42
2007-03-01 85.62
2007-04-02 92.84
2007-05-01 97.21
2007-06-01 95.98
How to extract information from vectors
- x[n] : the n-th element of the vector x
> x=c(3,6,5) # define the vector x
> x
[1] 3 6 5
> x[2] # the second element of x
[1] 6
> x[c(2,3)] # the second and third element of x
[1] 6 5
> x>4
[1] FALSE TRUE TRUE
> x[x>4] # elements greater than 4
[1] 6 5
How to extract information from data frames
- t$variable_name : returns the variable (column) named variable_name
- t[i,j] : returns the element in the i-th row and j-th column of the dataset t; the result is a single value
- t[i,] : returns the whole i-th row of the dataset t; the result is a one-row dataset
- t[,j] : returns the whole j-th column of the dataset t; the result is a vector
Example
> data1
sex weight height
m 60 170
f 57 169
f 51 172
f 55 174
> data1$weight # variable weight of data1
[1] 60 57 51 55 50 50 48 72 52 64 53 72 61 78...
[35] ...52 57 53 55 66 65 75 50 53 55 55 72 75 73
> data1[2,3] # element in the second row, third column
[1] 169
> data1[2,] # second row of data1
sex weight height
f 57 169
> data1[,2] # second column of data1, same as data1$weight
[1] 60 57 51 55 50 50 48 72 52 64 53 72 61 78...
[35] ...52 57 53 55 66 65 75 50 53 55 55 72 75 73
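Row and column indexing can be combined with the logical conditions seen for vectors. The sketch below uses a small made-up dataset df (the first four rows shown above), not the full data1.

```r
# Assumption: df reproduces only the first four observations of data1
df <- data.frame(sex    = c("m", "f", "f", "f"),
                 weight = c(60, 57, 51, 55),
                 height = c(170, 169, 172, 174))

heavy  <- df[df$weight > 55, ]         # all rows with weight above 55
tall_w <- df$weight[df$height > 170]   # weights of people taller than 170

print(heavy)
print(tall_w)
```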
How to extract information from time series
- st[i] : returns the i-th element of the time series st
- as.vector(st) : converts st to a vector, so that you can use all the functions made for vectors
> ibm
AdjClose
2007-01-03 89.79
2007-02-01 84.42
2007-03-01 85.62
> ibm[3]
2007-03-01
85.62
> ibm_vect = as.vector(ibm)
> ibm_vect
[1] 89.79 84.42 85.62 92.84 97.21 95.98 100.90
[15] 106.14 111.27 119.80 109.72 118.46 113.12 110.55
Information about vectors
- length(x) : gives the number of elements in the vector
- rev(x) : reverses the order of the elements
- sort(x) : sorts the elements (ascending order)
- unique(x) : removes duplicate elements
> data1$weight
[1] 60 57 51 55 50 50 48 72 52 64 53 72 61 78...
[35] ...52 57 53 55 66 65 75 50 53 55 55 72 75 73
> length(data1$weight)
[1] 66
> rev(data1$weight)
[1] 74 47 65 85 86 82 73 85 67 60 72 73 47 62...
[35] ...70 77 62 70 72 70 73 53 80 74 50 49 62 74
> sort(data1$weight)
[1] 47 47 48 49 50 50 50 50 51 51 51 52 52 53...
[35] ...66 67 68 70 70 70 71 71 72 72 72 72 72 73
> unique(sort(data1$weight))
[1] 47 48 49 50 51 52 53 55 57 60 61 62 64 65...
Information about datasets
- dim(x) : gives the dimensions (number of rows, number of columns)
- nrow(x) : gives the number of rows
- ncol(x) : gives the number of columns
- str(x) : describes the structure of x, including the type of each variable
- summary(x) : gives basic summary statistics (very few!)
- table(x) : creates a contingency table
- prop.table(table(x)) : converts the counts of the contingency table into proportions
> data1
sex weight height
1 m 60 170
2 f 57 169
...
> names(data1)
[1] "sex" "weight" "height"
> dim(data1) # 66 rows, 3 columns
[1] 66 3
> nrow(data1) # 66 rows (observations)
[1] 66
> ncol(data1) # 3 columns (variables)
[1] 3
> str(data1)
'data.frame': 66 obs. of 3 variables:
$ sex : Factor w/ 2 levels "f","m": 2 1 1 1 1 1 1 2 1 2 ...
$ weight : int 60 57 51 55 50 50 48 72 52 64 ...
$ height: int 170 169 172 174 168 161 162 189 160 175 ...
> summary(data1)
sex weight height
f:25 Min. :47.00 Min. :150.0
m:41 1st Qu.:53.00 1st Qu.:168.0
Median :65.50 Median :174.5
Mean :64.52 Mean :174.1
3rd Qu.:73.00 3rd Qu.:180.0
Max. :86.00 Max. :200.0
> table(data1$sex)
f m
25 41
> table(data1$sex, data1$weight)
47 48 49 50 51 52 53 55 57 60 61 62 64 65 66
f 2 1 1 4 3 2 5 2 2 0 0 1 0 0 1
m 0 0 0 0 0 0 0 2 0 2 1 2 1 2 1
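prop.table() is listed above but not shown. A minimal sketch on a made-up vector of five observations (not data1):

```r
# Assumption: a small illustrative vector, not the course data
sex <- c("f", "m", "m", "f", "m")

counts <- table(sex)          # contingency table: f = 2, m = 3
props  <- prop.table(counts)  # proportions: f = 0.4, m = 0.6

print(counts)
print(props)
```

The proportions always sum to 1, which is a quick check that the table was built from the intended variable.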
Math functions, object combination
Simple math functions
- sum(x) : gives the sum of the elements
- mean(x) : gives the mean of the elements
- median(x) : gives the median of the elements
- sqrt(x) : gives the square root of each element
- log(x) : gives the natural logarithm of each element
> data1$height
[1] 170 169 172 174 168 161 162 189 160
> sum(data1$height)
[1] 11488
> mean(data1$height)
[1] 174.0606
> median(data1$height)
[1] 174.5
> sqrt(data1$height)
[1] 13.03840 13.00000 13.11488 13.19091
> log(data1$height)
[1] 5.135798 5.129899 5.147494 5.159055
- var(x) : gives the sample variance (denominator n - 1)
- sd(x) : gives the sample standard deviation
- min(x) : gives the minimum
- max(x) : gives the maximum
- abs(x) : gives the absolute value of each element
> x = c(-1, 2, 5, -7)

> x
[1] -1 2 5 -7
> var(x)
[1] 26.25
> sd(x)
[1] 5.123475
> min(x)
[1] -7
> max(x)
[1] 5
> abs(x)
[1] 1 2 5 7
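The value 26.25 can be checked by hand: var() divides the sum of squared deviations from the mean by n - 1. A short sketch with the same vector x (var_manual and sd_manual are illustrative names):

```r
x <- c(-1, 2, 5, -7)
n <- length(x)

# Sample variance: sum of squared deviations divided by (n - 1)
var_manual <- sum((x - mean(x))^2) / (n - 1)
# Standard deviation: square root of the variance
sd_manual <- sqrt(var_manual)

print(var_manual)   # 26.25, as returned by var(x)
print(sd_manual)
```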
- cumsum(x) : gives the cumulative sum
- diff(x) : gives the differences between each element and the previous one
- union(x,y) : gives the union of the sets x and y
- intersect(x,y) : gives the intersection of the sets x and y
> x = c(-1, 2, 5, -7)

> x
[1] -1 2 5 -7
> y = c(5, 1, 3, 6)
> y
[1] 5 1 3 6
> cumsum(y)
[1] 5 6 9 15
> union(x,y)
[1] -1 2 5 -7 1 3 6
> diff(y)
[1] -4 2 3
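intersect() is listed above but not run in the example. With the same x and y (the name common is only illustrative):

```r
x <- c(-1, 2, 5, -7)
y <- c(5, 1, 3, 6)

common <- intersect(x, y)   # elements present in both x and y
print(common)               # 5

# The union counts each common element only once
print(length(union(x, y)) == length(x) + length(y) - length(common))
```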
- sample(x, n) : draws a random sample of size n from the elements of x; add replace=TRUE to sample with replacement
- scale(x) : standardises the data (subtracts the mean and divides by the standard deviation)
> x = c(-1, 2, 5, -7)

> y = c(5, 1, 3, 6)
> sample(y, 3)
[1] 3 1 6
> sample(y, 6, replace=TRUE)
[1] 5 5 6 5 3 6
> x2 = scale(x)
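The output of scale(x) is not shown above. The sketch below compares it with the standardisation computed by hand; note that scale() returns a one-column matrix, so as.vector() is used for the comparison.

```r
x <- c(-1, 2, 5, -7)

x2 <- scale(x)                        # one-column matrix of standardised values
x2_manual <- (x - mean(x)) / sd(x)    # same computation written out by hand

print(as.vector(x2))
print(x2_manual)
```

After standardisation the values have mean 0 and standard deviation 1.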
Objects combination
- rbind(x,y) : combines vectors or datasets by rows
- cbind(x,y) : combines vectors or datasets by columns
> data2
f c 0
h c 0
...
f c 1
> vary = 1:8
> vary
[1] 1 2 3 4 5 6 7 8
> data2_new = cbind(data2,vary) # We add the variable vary
> data2_new
sex civil_status number_childs vary
f c 0 1
h c 0 2
...
f c 1 8
> obsz = data.frame(sex="f", civil_status="c", number_childs=2)
> data2_new2 = rbind(data2, obsz) # We add the observation obsz
> data2_new2
f c 0
h c 0
...
f c 1
f c 2
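The same two functions also work on plain vectors, in which case the result is a matrix. A small sketch (a, b, by_rows and by_cols are illustrative names):

```r
a <- c(1, 2, 3)
b <- c(4, 5, 6)

by_rows <- rbind(a, b)   # each vector becomes a row: 2 rows, 3 columns
by_cols <- cbind(a, b)   # each vector becomes a column: 3 rows, 2 columns

print(by_rows)
print(dim(by_rows))
print(by_cols)
print(dim(by_cols))
```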
Apply a function on the rows or columns of a dataset
- apply(X, MARGIN, FUN) : applies a function to the rows or columns of a dataset
  - X : matrix or dataset
  - MARGIN : MARGIN=1 to apply the function to the rows of X, MARGIN=2 to apply it to the columns of X
  - FUN : the function to be applied
> X = cbind(data1$weight, data1$height)

> X
[,1] [,2]
[1,] 60 170
[2,] 57 169
[3,] 51 172
[4,] 55 174
> apply(X, 2, mean)
[1] 64.51515 174.06061
> # Mean of the first column of X: 64.51515
> # Mean of the second column of X: 174.06061
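The example above works on columns (margin 2). The sketch below shows both margins on a small made-up matrix M built from the first two observations, so the results can be checked by hand.

```r
# Assumption: M holds only the first two rows shown above (weight, height)
M <- cbind(c(60, 57), c(170, 169))

row_sums  <- apply(M, 1, sum)    # one value per row:    230 226
col_means <- apply(M, 2, mean)   # one value per column: 58.5 169.5

print(row_sums)
print(col_means)
```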
Plots
The relationship between two quantitative variables
- plot(x,y,...) : draws a scatter plot
  - xlab= : label of the x-axis (a character string)
  - ylab= : same for the y-axis
  - main= : title
  - type= : plot type: "p" for points (default), "l" for lines, "b" for points and lines, "h" for vertical bars, "n" for nothing (none)
  - pch= : point shape: 1 = circle, 2 = triangle, etc.
  - cex= : point size (default = 1)
  - lty= : line type: 1 = solid, 2 = dashed, etc.
  - lwd= : line width (default = 1)
  - col= : colour: 1 = black, 2 = red, etc.
> plot(data1$weight,data1$height,
+ pch = 1, cex = 1.5, col = 2, main = "Weight and height",
+ xlab = "weight", ylab = "height")
[Figure: scatter plot "Weight and height"; x-axis: weight (50 to 80), y-axis: height (150 to 200)]
One quantitative variable
- hist(x,...) : draws a histogram
  - main= : title
> hist(data1$weight, main = "Histogram of weight",
+ xlab = "weight")
[Figure: histogram "Histogram of weight"; x-axis: weight (50 to 90), y-axis: Frequency (0 to 15)]
- boxplot(x,...) : draws a box plot
  - main= : title
> boxplot(data1$weight,main="Box-plot of weight",
+ ylab = "weight")
[Figure: box plot "Box-plot of weight"; y-axis: weight (50 to 80)]
One qualitative variable
- barplot(x,...) : draws a bar plot from the contingency table x (as input)
  - main= : title
> t1 = table(data2$civil_status)
> t1
c m v
4 3 1
> barplot(t1, main="Bar plot of civil status",
+ xlab = "Civil status")
[Figure: bar plot "Bar plot of civil status"; x-axis: Civil status (c, m, v), y-axis: counts (0 to 4)]
Two qualitative variables
- mosaicplot(x,...) : draws a mosaic plot from the contingency table x
  - main= : title
> t2 = table(data2$civil_status,
+ data2$sex)
> t2
  f h
c 2 2
m 2 1
v 1 0
> mosaicplot(t2, main=" ",
+ xlab="Civil_status",
+ ylab="Sex")
[Figure: mosaic plot; x-axis: Civil_status (c, m, v), y-axis: Sex (f, h)]
Basics of programming
How to write your own function
- function( arglist ){expr}
  - arglist : a list of inputs
  - expr : the commands to be executed
Write a function to compute the area of a triangle from its base and height
> my.function = function(l, h)
+ # The inputs are the base l and the height h of the triangle
+ {
+ A = (l*h)/2 # The area
+ return(A) # The function returns the output
+ }
> my.function(3,4)
[1] 6
> # Area of a triangle with base 3
> # and height 4.
> my.function(5,3)
[1] 7.5
> # Area of a triangle with base 5
> # and height 3.
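A possible variant of the same function, given here only as a sketch: the name triangle_area and the input check are additions for illustration, not part of the slides. It rejects inputs that are not positive numbers and, because R arithmetic is element-wise, it also accepts vectors.

```r
# Hypothetical variant of my.function with a basic input check
triangle_area <- function(b, h) {
  # Stop with an error unless both inputs are positive numbers
  stopifnot(is.numeric(b), is.numeric(h), all(b > 0), all(h > 0))
  (b * h) / 2   # the value of the last expression is returned
}

a1 <- triangle_area(3, 4)                 # 6
a2 <- triangle_area(c(3, 5), c(4, 3))     # 6.0 7.5 (one area per pair)
print(a1)
print(a2)
```

Calling triangle_area(-1, 2) stops with an error instead of silently returning a negative area.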
