Econometrics

Econometrics - R Basics
Catalin Starica
26th September 2017
First R session
Installation

- R is an open-source programming language and software environment for statistical computing
- Available at CRAN (Comprehensive R Archive Network): http://cran.r-project.org/
- An additional, user-friendly interface: RStudio
Documentation

References (1/2)
- http://cran.r-project.org/manuals.html
- See An Introduction to R: http://cran.r-project.org/doc/manuals/R-intro.html
- See the FAQ(s): http://cran.r-project.org/doc/FAQ/R-FAQ.html
- R: Getting started: http://wiki.math.yorku.ca/index.php/R:_Getting_started
- Introduction to R (Vincent Zoonekynd): http://zoonek2.free.fr/UNIX/48_R/all.html
- How to make simple graphs: http://www.harding.edu/fmccown/r/

References (2/2)
- In French:
  - R pour les débutants (Emmanuel Paradis): http://cran.r-project.org/doc/contrib/Paradis-rdebuts_fr.pdf
  - Introduction au système R (Yves Brostaux): http://cran.r-project.org/doc/contrib/Brostaux-Introduction-au-R.zip
- In English:
  - simpleR - Using R for Introductory Statistics (John Verzani): http://cran.r-project.org/doc/contrib/Verzani-SimpleR.pdf
  - Book: The R software. Fundamentals of Programming and Statistical Analysis (Lafaye de Micheaux et al., Springer, 2011)
First R session

- How to open R
  Click on the R icon and R is ready to use.
- The R console
  Here you can execute a command immediately: type an expression, press Enter, and the output is printed.
- Example (everything after the symbol # is a comment and is not executed):
> 2*(-4.5) # Note that the decimal separator is a point (.)
[1] -9
> 3^2 # 3 to the power of 2
[1] 9
> sqrt(4) # square root of 4
[1] 2
> log(1) # natural logarithm (base e) of 1
[1] 0

- How to close R
  Type the command q() or use File -> Quit Session... and choose not to save a workspace image.
- How to set a working directory (WD):
  Use File -> Set working directory and choose a directory.
  Check: type getwd() to show the current working directory
> getwd()
[1] "M:/TravailR"

- R script:
  You can create an R script to store a sequence of commands
  - Create a new script: File -> New script.
  - Save a script: File -> Save as..., select a directory, and name the script (e.g. script1.R)
  - Save an existing script: File -> Save
  - Open an existing script: File -> Open file, then choose the script
  - Execute a line: right-click and choose Execute, or press Ctrl + Enter (Cmd + Enter on a Mac)
Define, list and show objects

Exercise 1
- Launch R and define a working directory
- Create a new script with the name script1.R
- Type all the commands shown below
- Run the script line by line
- Save the script, and then close R
- Reopen the script: all the code can be reused

n = 15 # Define the object named n
n # Show it
n = 5 # Replace n with a new object
n
x = 1
X = 10 # There is a distinction between
# capital and lower-case letters
x
X
r = log(X) + n
r
Functions and help

- The basic version of R contains a large number of functions
- You can extend the set of available functions by installing additional packages, or you can write your own functions
- General syntax:
  function_name(parameter1=value1, par2=val2,...)
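As a small illustration of the general syntax, the sketch below calls two built-in functions with named parameters. The vector v and its values are made up for this example.

```r
# Hypothetical data: four values and one missing value (NA)
v = c(2, 4, 6, 8, NA)

# Parameters are passed as name=value pairs;
# na.rm=TRUE tells mean() to ignore missing values
m = mean(x = v, na.rm = TRUE)
m

# round() takes the number of decimal places in the parameter digits
r = round(x = 3.14159, digits = 2)
r
```

When the parameters are named, their order does not matter: round(digits = 2, x = 3.14159) gives the same result.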

How to get help


- If you need help to understand a function:
  help(function_name)
> help(mean)
- To obtain a list of functions associated with a keyword:
  help.search(key_word)
> help.search("median")
Logical operators

a==b      TRUE if a is equal to b, FALSE otherwise
!(a==b)   TRUE if a is different from b, FALSE otherwise
a!=b      TRUE if a is different from b, FALSE otherwise
a<b       TRUE if a is less than b, FALSE otherwise
a>b       TRUE if a is greater than b, FALSE otherwise
a<=b      TRUE if a is less than or equal to b, FALSE otherwise
a>=b      TRUE if a is greater than or equal to b, FALSE otherwise
a|b       TRUE if a, b, or both are TRUE, FALSE otherwise
a&b       TRUE if both a and b are TRUE, FALSE otherwise

Example

> x=3
> y=4
> z = c(-1,2)
> x>3
[1] FALSE
> x>3 & y==4
[1] FALSE
> z>0
[1] FALSE TRUE
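
The operators from the table that the example above does not use can be tried in the same way. This is a minimal sketch; the objects a and b are hypothetical.

```r
a = 3
b = 4

a != b               # TRUE: a is different from b
!(a == b)            # TRUE: same result as a != b
a <= b               # TRUE: a is less than or equal to b
(a > 3) | (b == 4)   # TRUE: at least one of the two conditions is TRUE
(a > 3) & (b == 4)   # FALSE: the first condition is FALSE
```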
Installing a package

- Packages: additional libraries of functions
- How to install a package (to do only once):
  Example, how to install the package tseries:
  Packages -> Install package(s) -> Switzerland (CRAN mirror) -> tseries
  With RStudio it is even easier.
- How to load a package (to do in every session):
  library(tseries)
- How to list the functions inside a package:
  help(package=tseries)
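
A package can also be installed from the console with install.packages(). The sketch below wraps the two steps in a helper; the name load_or_install is hypothetical (it is not part of R), and the example uses the package stats, which ships with R, so that nothing is downloaded.

```r
# Hypothetical helper (not part of R): install a package only if it is
# missing, then load it. Returns TRUE if the package could be loaded.
load_or_install = function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)   # needed only once per computer
  }
  # character.only=TRUE because pkg holds the package name as a string
  library(pkg, character.only = TRUE, logical.return = TRUE)
}

ok = load_or_install("stats")   # "stats" ships with R: nothing is installed
ok
```

For the package used in these slides, the call would be load_or_install("tseries"), which needs an internet connection the first time.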
Most important objects

- Vectors (vector):
  Formed by a set of elements (numbers, characters)
  3 6 5
- Datasets (data frame):
  Formed by one or more vectors of the same length
  The rule: columns ↔ variables / rows ↔ observations
sex weight height
m 60 170
f 57 169
f 51 172
f 55 174
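
A minimal sketch of how these two objects could be created by hand, using the values shown above. The object names v and d are hypothetical.

```r
# A vector: a set of elements of the same type
v = c(3, 6, 5)
v

# A data frame: vectors of the same length, one per variable (column)
d = data.frame(sex    = c("m", "f", "f", "f"),
               weight = c(60, 57, 51, 55),
               height = c(170, 169, 172, 174))
d   # 4 observations (rows), 3 variables (columns)
```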

- Time series (time series):
  Series of observations indexed by time
Price
2007-01-03 89.79
2007-02-01 84.42
2007-03-01 85.62
2007-04-02 92.84
2007-05-01 97.21
2007-06-01 95.98
2007-07-02 100.90
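
One way to build a small time series by hand is the base R function ts(). This is only a sketch: the regular monthly index starting in January 2007 is an assumption (the table above shows specific trading days), and the object shown above may be of a different class.

```r
# Prices from the table above; ts() attaches a regular monthly index
price = ts(c(89.79, 84.42, 85.62, 92.84, 97.21, 95.98, 100.90),
           start = c(2007, 1), frequency = 12)
price

start(price)   # first period: year 2007, month 1
end(price)     # last period: year 2007, month 7
```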
Data
How to generate data

- c(...) : defines a vector by combining elements
- FROM:TO : defines the sequence FROM, FROM+1, ..., TO
- rep(x, n) : repeats x n times

> c(1,7,0,2)
[1] 1 7 0 2
> 2:7
[1] 2 3 4 5 6 7
> rep(8, times=3)
[1] 8 8 8
How to read data

How to import a dataset


- read.table(file, header=TRUE) : imports a .txt file; the result is a dataset
  - file : the name of the file, in quotes (the file must be in the working directory, otherwise give its full path)
  - header=TRUE : the first line of the file contains the names of the variables (otherwise use header=FALSE)
- read.csv(file, header=TRUE) : imports a .csv file; the result is a dataset
- names(x) : gives the names of the variables (columns) of the dataset x

Download example1.txt and example2.csv into the working directory

> data1 = read.table("example1.txt", header=TRUE)
> data1 # Show data1
sex weight height
m 60 170
f 57 169
> names(data1)
[1] "sex" "weight" "height"
> data2 = read.csv("example2.csv", header=TRUE)
> data2 # Show data2
sex civil_status number_childs
f c 0
h c 0
> names(data2)
[1] "sex" "civil_status" "number_childs"
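
If the example files are not at hand, the import functions can be tried on a small file created on the spot. In this sketch the file names example_small.txt and example_small.csv are hypothetical, and the files are written to a temporary folder rather than the working directory.

```r
# Hypothetical file, written to a temporary folder for this example
f1 = file.path(tempdir(), "example_small.txt")
writeLines(c("sex weight height",
             "m 60 170",
             "f 57 169"), f1)

d1 = read.table(f1, header = TRUE)   # first line = names of the variables
d1
names(d1)

# The same data as a .csv file (values separated by commas)
f2 = file.path(tempdir(), "example_small.csv")
write.csv(d1, f2, row.names = FALSE)
d2 = read.csv(f2, header = TRUE)
names(d2)
```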

- Financial time series: get.hist.quote(...) loads a time series of financial data
  - instrument : stock name
  - start : start date
  - end : end date
  - quote : data type
  - provider : data provider (e.g. Yahoo)
  - compression : reporting frequency; "d" for daily, "w" for weekly, and "m" for monthly
  (You need to load the package tseries)

Install the package tseries


> library(tseries)
> ibm=get.hist.quote(instrument = "IBM",start="2007-01-01",
+ end="2008-09-12", quote = "AdjClose",provider = "yahoo",
+ compression = "m")
> ibm
AdjClose
2007-01-03 89.79
2007-02-01 84.42
2007-03-01 85.62
2007-04-02 92.84
2007-05-01 97.21
2007-06-01 95.98
How to extract information from vectors


- x[n] : the n-th element of the vector x.
> x=c(3,6,5) # define the vector x
> x
[1] 3 6 5
> x[2] # the second element of x
[1] 6
> x[c(2,3)] # the second and third element of x
[1] 6 5
> x>4
[1] FALSE TRUE TRUE
> x[x>4] # elements greater than 4
[1] 6 5

How to extract information from data frames

- t$variable_name : returns the variable (column) named variable_name
- t[i,j] : returns the element in the i-th row and j-th column of the dataset t; the result is a single value
- t[i,] : returns the whole i-th row of the dataset t; the result is a one-row dataset
- t[,j] : returns the whole j-th column of the dataset t; the result is a vector

Example
> data1
sex weight height
m 60 170
f 57 169
f 51 172
f 55 174
> data1$weight # variable weight of data1
[1] 60 57 51 55 50 50 48 72 52 64 53 72 61
[35] 52 57 53 55 66 65 75 50 53 55 55 72 75
> data1[2,3] # element in the second row, third column
[1] 169
> data1[2,] # second row of data1
sex weight height
f 57 169
> data1[,2] # second column of data1, same as data1$weight
[1] 60 57 51 55 50 50 48 72 52 64 53 72 61
[35] 52 57 53 55 66 65 75 50 53 55 55 72 75
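
The same extractions can be reproduced on a small dataset typed in by hand. This is a sketch using the first four rows shown above; the object name d is hypothetical.

```r
d = data.frame(sex    = c("m", "f", "f", "f"),
               weight = c(60, 57, 51, 55),
               height = c(170, 169, 172, 174))

d$weight             # the variable weight: 60 57 51 55
d[2, 3]              # second row, third column: 169
d[2, ]               # the whole second row (a one-row dataset)
d[, 2]               # second column, same as d$weight
d[d$weight > 55, ]   # only the rows with a weight greater than 55
```

The last line combines row extraction with a logical condition, in the same way as x[x>4] does for vectors.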

How to extract information from time series

- st[i] : returns the i-th element of the time series st
- as.vector(st) : converts st to a vector, so that you can use all the functions made for vectors

> ibm
AdjClose
2007-01-03 89.79
2007-02-01 84.42
2007-03-01 85.62
> ibm[3]
2007-03-01
85.62
> ibm_vect = as.vector(ibm)
> ibm_vect
[1] 89.79 84.42 85.62 92.84 97.21 95.98 100.90
[15] 106.14 111.27 119.80 109.72 118.46 113.12 110.55

Information about vectors


- length(x) : gives the number of elements in the vector
- rev(x) : reverses the order of the elements
- sort(x) : sorts the elements (ascending order)
- unique(x) : removes duplicate elements

> data1$weight
[1] 60 57 51 55 50 50 48 72 52 64 53 72 61 78...
[35] ...52 57 53 55 66 65 75 50 53 55 55 72 75 73
> length(data1$weight)
[1] 66
> rev(data1$weight)
[1] 74 47 65 85 86 82 73 85 67 60 72 73 47 62...
[35] ...70 77 62 70 72 70 73 53 80 74 50 49 62 74
> sort(data1$weight)
[1] 47 47 48 49 50 50 50 50 51 51 51 52 52 53...
[35] ...66 67 68 70 70 70 71 71 72 72 72 72 72 73
> unique(sort(data1$weight))
[1] 47 48 49 50 51 52 53 55 57 60 61 62 64 65...
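
Because the output above is truncated, the effect of each function is easier to check on a short vector. A minimal sketch, with a hypothetical vector w made of the first six weights:

```r
w = c(60, 57, 51, 55, 50, 50)

length(w)                    # 6
rev(w)                       # 50 50 55 51 57 60
sort(w)                      # 50 50 51 55 57 60
sort(w, decreasing = TRUE)   # 60 57 55 51 50 50
unique(w)                    # 60 57 51 55 50
```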

Information about datasets


- dim(x) : gives the dimensions.
- nrow(x) : gives the number of rows.
- ncol(x) : gives the number of columns.
- str(x) : describes the type of each variable.
- summary(x) : gives basic summary statistics (only a few!)
- table(x) : creates a contingency table
- prop.table(table(x)) : converts the counts of the contingency table into proportions

> data1
sex weight height
1 m 60 170
2 f 57 169
...
> names(data1)
[1] "sex" "weight" "height"
> dim(data1) # 66 rows, 3 columns
[1] 66 3
> nrow(data1) # 66 rows (observations)
[1] 66
> ncol(data1) # 3 columns (variables)
[1] 3
> str(data1)
'data.frame': 66 obs. of 3 variables:
$ sex : Factor w/ 2 levels "f","m": 2 1 1 1 1 1 1 2 1 2 ...
$ weight : int 60 57 51 55 50 50 48 72 52 64 ...
$ height: int 170 169 172 174 168 161 162 189 160 175 ...
> summary(data1)
 sex       weight          height
 f:25   Min.   :47.00   Min.   :150.0
 m:41   1st Qu.:53.00   1st Qu.:168.0
        Median :65.50   Median :174.5
        Mean   :64.52   Mean   :174.1
        3rd Qu.:73.00   3rd Qu.:180.0
        Max.   :86.00   Max.   :200.0
> table(data1$sex)
 f  m
25 41
> table(data1$sex, data1$weight)
    47 48 49 50 51 52 53 55 57 60 61 62 64 65 66
  f  2  1  1  4  3  2  5  2  2  0  0  1  0  0  1
  m  0  0  0  0  0  0  0  2  0  2  1  2  1  2  1
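
The function prop.table() is listed above but not shown in the output. A minimal sketch on a hypothetical vector of four observations:

```r
sex = c("m", "f", "f", "f")

tab = table(sex)   # counts: f = 3, m = 1
tab

prop.table(tab)    # proportions: f = 0.75, m = 0.25
```

The proportions always sum to 1, so they can be read as relative frequencies.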
Math functions, object combination
Simple math functions

- sum(x) : gives the sum of the elements
- mean(x) : gives the mean of the elements
- median(x) : gives the median of the elements
- sqrt(x) : gives the square root of each element
- log(x) : gives the natural logarithm of each element
> data1$height
[1] 170 169 172 174 168 161 162 189 160
> sum(data1$height)
[1] 11488
> mean(data1$height)
[1] 174.0606
> median(data1$height)
[1] 174.5
> sqrt(data1$height)
[1] 13.03840 13.00000 13.11488 13.19091
> log(data1$height)
[1] 5.135798 5.129899 5.147494 5.159055

- var(x) : gives the (sample) variance
- sd(x) : gives the (sample) standard deviation
- min(x) : gives the minimum
- max(x) : gives the maximum
- abs(x) : gives the absolute value of each element

> x = c(-1, 2, 5, -7)


> x
[1] -1 2 5 -7
> var(x)
[1] 26.25
> sd(x)
[1] 5.123475
> min(x)
[1] -7
> max(x)
[1] 5
> abs(x)
[1] 1 2 5 7
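
Note that var() and sd() use the sample formulas, with n - 1 in the denominator. The sketch below recomputes the variance of x by hand to make this visible; the names n and v are hypothetical.

```r
x = c(-1, 2, 5, -7)
n = length(x)

# Sample variance: squared deviations from the mean, divided by n - 1
v = sum((x - mean(x))^2) / (n - 1)
v         # 26.25, the same value as var(x)

sqrt(v)   # the standard deviation, the same value as sd(x)
```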

- cumsum(x) : gives the cumulative sum
- diff(x) : gives the difference between each element and the previous one
- union(x,y) : gives the union of the two sets
- intersect(x,y) : gives the intersection of the two sets

> x = c(-1, 2, 5, -7)


> x
[1] -1 2 5 -7
> y = c(5, 1, 3, 6)
> y
[1] 5 1 3 6
> cumsum(y)
[1] 5 6 9 15
> union(x,y)
[1] -1 2 5 -7 1 3 6
> diff(y)
[1] -4 2 3
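
The example above does not show intersect(). A short sketch with the same two vectors, also applying cumsum() and diff() to x instead of y:

```r
x = c(-1, 2, 5, -7)
y = c(5, 1, 3, 6)

intersect(x, y)   # 5: the only element present in both vectors
cumsum(x)         # -1 1 6 -1: running total
diff(x)           # 3 3 -12: each element minus the previous one
```

The result of diff() has one element fewer than its input, because the first element has no predecessor.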

- sample(x, n) : draws a random sample of size n from the elements of x; add replace=TRUE to sample with replacement
- scale(x) : standardises the data (subtracts the mean and divides by the standard deviation)

> x = c(-1, 2, 5, -7)


> y = c(5, 1, 3, 6)
> sample(y, 3)
[1] 3 1 6
> sample(y, 6, replace=TRUE)
[1] 5 5 6 5 3 6
> x2 = scale(x)
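
The result of scale() is not printed above. The sketch below checks that the standardised values have mean 0 and standard deviation 1, and shows how set.seed() can make a random draw reproducible. The seed value 1 and the names x3, s1 and s2 are arbitrary choices for this example.

```r
x = c(-1, 2, 5, -7)

# scale() returns a one-column matrix; as.vector() turns it into a vector
x2 = as.vector(scale(x))
mean(x2)   # 0 (up to rounding error)
sd(x2)     # 1

# The same standardisation computed by hand
x3 = (x - mean(x)) / sd(x)

# sample() gives a different result at each call unless the seed is fixed
set.seed(1)
s1 = sample(x, 3)
set.seed(1)
s2 = sample(x, 3)   # same draw as s1
```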
Objects combination

- rbind(x,y) : combines vectors or datasets by rows
- cbind(x,y) : combines vectors or datasets by columns

> data2
sex civil_status number_childs
f c 0
h c 0
...
f c 1

> vary = 1:8
> vary
[1] 1 2 3 4 5 6 7 8
> data2_new = cbind(data2,vary) # We add the variable vary
> data2_new
sex civil_status number_childs vary
f c 0 1
h c 0 2
...
f c 1 8
> # A one-row dataset keeps number_childs numeric
> obsz = data.frame(sex = "f", civil_status = "c", number_childs = 2)
> data2_new2 = rbind(data2, obsz) # We add the observation obsz
> data2_new2
sex civil_status number_childs
f c 0
h c 0
...
f c 1
f c 2
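
A self-contained sketch of both operations on a tiny dataset. The names d, d_col, new_obs and d_row are hypothetical, and the new observation is built as a one-row dataset so that each value keeps its type.

```r
d = data.frame(sex = c("f", "m"), number_childs = c(0, 1))

# cbind: add a variable (column)
vary = 1:2
d_col = cbind(d, vary)
dim(d_col)   # 2 rows, 3 columns

# rbind: add an observation (row) with the same variable names
new_obs = data.frame(sex = "f", number_childs = 2)
d_row = rbind(d, new_obs)
dim(d_row)   # 3 rows, 2 columns
```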
Apply a function on the rows or columns of a dataset

- apply(X, MARGIN, FUN) : applies a function to the rows or columns of a dataset
  - X : matrix or dataset
  - MARGIN : MARGIN=1 to apply the function to the rows of X, MARGIN=2 to apply it to the columns of X
  - FUN : the function to be applied

> X = cbind(data1$weight, data1$height)


> X
[,1] [,2]
[1,] 60 170
[2,] 57 169
[3,] 51 172
[4,] 55 174
> apply(X, 2, mean)
[1] 64.51515 174.06061
> # Mean of the first column of X: 64.51515
> # Mean of the second column of X: 174.06061
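
The example above only uses the columns. The sketch below builds a small matrix from the four rows printed above and applies functions to both the rows and the columns.

```r
X = cbind(c(60, 57, 51, 55), c(170, 169, 172, 174))

apply(X, 2, mean)   # column means: 55.75 171.25
apply(X, 2, max)    # column maxima: 60 174
apply(X, 1, sum)    # row sums: 230 226 223 229
```

With the second argument equal to 1 the result has one value per row, with 2 one value per column.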
Plots

The relationship between two quantitative variables

- plot(x,y,...) : draws a scatter plot
  - xlab= : label of the x-axis, must be a character string
  - ylab= : same for the y-axis
  - main= : title of the plot
  - type= : plot type: "p" for points (default), "l" for lines, "b" for both points and lines, "h" for vertical bars, "n" for nothing (no plotting)
  - pch= : point shape: 1 = circle, 2 = triangle, etc.
  - cex= : point size (default = 1)
  - lty= : line type: 1 = solid, 2 = dashed, 3 = dotted, etc.
  - lwd= : line width (default = 1)
  - col= : colour: 1 = black, 2 = red, etc.

> plot(data1$weight,data1$height,
+ pch = 1, cex = 1.5, col = 2, main = "Weight and height",
+ xlab = "weight", ylab = "height")

[Figure: scatter plot "Weight and height"; x-axis: weight (50 to 80), y-axis: height (150 to 200)]

One quantitative variable


- hist(x,...) : draws a histogram
  - xlab= : label of the x-axis, must be a character string
  - main= : title
> hist(data1$weight, main = "Histogram of weight",
+ xlab = "weight")

[Figure: histogram of the variable weight; x-axis: weight (50 to 90), y-axis: Frequency (0 to 15)]

- boxplot(x,...) : draws a box plot
  - xlab= : label of the x-axis, must be a character string
  - ylab= : same for the y-axis
  - main= : title
> boxplot(data1$weight,main="Box-plot of weight",
+ ylab = "weight")

[Figure: box plot of the variable weight; y-axis: weight (50 to 80)]

One qualitative variable

- barplot(x,...) : draws a bar plot from the contingency table x (as input)
  - xlab= : label of the x-axis, must be a character string
  - ylab= : same for the y-axis
  - main= : title

> t1 = table(data2$civil_status)
> t1
c m v
4 3 1
> barplot(t1, main="Bar plot of civil status",
+ xlab = "Civil status")

[Figure: bar plot of civil status; x-axis: Civil status (c, m, v), y-axis: counts (0 to 4)]

Two qualitative variables

- mosaicplot(x,...) : draws a mosaic plot from the contingency table x
  - xlab= : label of the x-axis, must be a character string
  - ylab= : same for the y-axis
  - main= : title

> t2 = table(data2$civil_status,
+ data2$sex)
> t2
    f h
  c 2 2
  m 2 1
  v 1 0
> mosaicplot(t2, main=" ",
+ xlab="Civil_status",
+ ylab="Sex")

[Figure: mosaic plot; x-axis: Civil status (c, m, v), y-axis: Sex (f, h)]
Basics of programming

How to write your own function

- function(arglist){expr}
  - arglist : a list of inputs (arguments)
  - expr : the commands to be executed

Write a function to compute the area of a triangle from its base and height
> # The inputs are the base (l) and the height (h) of the triangle
> my.function = function(l, h)
+ {
+ A = (l*h)/2 # The area
+ return(A) # The function returns the output
+ }
> my.function(3,4)
[1] 6
> # Area of a triangle with base 3 and height 4.
> my.function(5,3)
[1] 7.5
> # Area of a triangle with base 5 and height 3.
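
As a possible variant (a sketch, not part of the original slides), the same function can be written with descriptive argument names and a basic check of the inputs. The name triangle_area is hypothetical.

```r
# Hypothetical variant of the function above, with a check of the inputs
triangle_area = function(base, height) {
  # Stop with an error if one of the inputs is negative
  stopifnot(base >= 0, height >= 0)
  (base * height) / 2   # the value of the last expression is returned
}

triangle_area(3, 4)                   # 6
triangle_area(base = 5, height = 3)   # 7.5
```

Without an explicit return(), an R function returns the value of the last expression it evaluates.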
