
ECON6067

Computation and Analysis of Economic Data

R (I)

Karen Xiaoting Mai

Fall 2022
R and RStudio

▶ R: a programming language for statistical computing and
graphics. Open source and free
▶ RStudio: an Integrated Development Environment (IDE) for R
▶ Install R and then RStudio
▶ R
▶ https://www.r-project.org
▶ RStudio (choose the Free version)
▶ https://www.rstudio.com/products/rstudio/download/
Interface

▶ Console
▶ Code Editor
▶ Environment: what RStudio has loaded in memory
▶ Bottom right
▶ Files: working directory
▶ Plots: view and save plots
▶ Packages: what packages have been installed, what packages
have been loaded in memory
▶ Help
Interface

▶ Run code directly in the Console: type it and press Enter
▶ Run code from code editor:
▶ “Run” button
▶ Ctrl (Command) + Enter
▶ R is case-sensitive
▶ Adding comments:
▶ start the line with “#”
▶ or select the code and press Ctrl (Command) + Shift + C
Change Working Directory

▶ Query
▶ getwd()
▶ Change your working directory
▶ “Session” =>
▶ “Set Working Directory”
▶ “To Source File Location”: set to be the same as the opened
script
▶ “To File Pane Location”
▶ setwd("~/Desktop/R Files")
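The query-and-change pattern above can be tried safely with a minimal sketch like the one below. It assumes nothing about your folders: `tempdir()` is used only because it is a directory that always exists, and `old_wd` is an illustrative name.

```r
# Remember the current working directory so it can be restored afterwards.
old_wd <- getwd()

# Change to a folder that is guaranteed to exist (R's session temp folder).
setwd(tempdir())
getwd()          # now reports the temporary folder

# Switch back to where we started.
setwd(old_wd)
```

In your own scripts, replace `tempdir()` with the path of your project folder.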
Getting Help

▶ help()
▶ Help on functions
▶ help(plot)
▶ Help on dataset
▶ help(cars)
▶ Search through R documentation
▶ ??plot
▶ Examples:
▶ example("lm")
Objects

▶ Types of variables
▶ Numeric
▶ Integer
▶ Character
▶ Factor
▶ Logical
▶ Create objects
Indexing

▶ a[2] <- 8
▶ a[1:2]
▶ a[c(1,2)]
▶ a[c(TRUE, TRUE, FALSE)]
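As a small illustration of the variable types and the indexing commands above, the sketch below first creates a vector `a` (its values are made up for the example) and then applies the same indexing operations to it.

```r
# Create objects with the assignment operator <-
a <- c(5, 3, 9)                        # numeric vector
n <- 2L                                # integer (note the L suffix)
s <- "hello"                           # character
f <- factor(c("low", "high", "low"))   # factor
l <- TRUE                              # logical

# class() reports the type of each object
class(a); class(n); class(s); class(f); class(l)

a[2] <- 8                  # replace the second element: a is now 5, 8, 9
a[1:2]                     # first two elements
a[c(1, 2)]                 # the same, using a vector of positions
a[c(TRUE, TRUE, FALSE)]    # the same, using a logical mask
```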
Dataframe

▶ Stick vectors together to form matrices


▶ x <- cbind(a,b): sticks columns together
▶ y <- rbind(a,b): sticks rows together
▶ Turn the matrix into a dataframe
▶ x <- as.data.frame(x)
▶ Columns of a data frame can be accessed by name: use $ followed
by the variable name instead of numeric indexing
▶ x$a
▶ x$b[2]
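The steps above can be sketched as follows. The vectors `a` and `b` hold made-up values and exist only for this example.

```r
a <- c(1, 2, 3)
b <- c(10, 20, 30)

x <- cbind(a, b)   # 3 x 2 matrix with columns named a and b
y <- rbind(a, b)   # 2 x 3 matrix with rows named a and b
dim(x); dim(y)

# Turn the matrix into a data frame so that columns can be accessed by name.
x <- as.data.frame(x)
x$a          # the whole column a
x$b[2]       # second element of column b
x[2, "b"]    # the same element, using row and column indexing
```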
Install and Uninstall Packages

▶ Install
▶ “Packages” => “Install”
▶ install.packages("foreign")
▶ Load
▶ check the box in front of package name
▶ library(foreign)
▶ Update packages: install again, or use update.packages()
Importing Data
▶ Menu:
▶ “File” => “Import Dataset”
▶ “Environment” => “Import Dataset”
▶ Code:
▶ data sets in loaded packages:
▶ data(cars)
▶ csv:
▶ dfname <- read.csv("filename.csv")
▶ Excel:
▶ library(readxl)
▶ dfname <- read_excel("filename", sheet = "Sheetname")
▶ If the first row should not be used as variable names:
dfname <- read_excel("filename", sheet = "Sheetname",
col_names = FALSE)
▶ dta:
▶ packages “foreign” / “readstata13” depending on Stata version
▶ dfname <- read.dta("filename.dta")
▶ dfname <- read.dta13("filename.dta")
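To try the csv import without any external file, one possible sketch is to write a tiny data set to a temporary file first and then read it back. The file name `example.csv` and the data frame `toy` are invented for the illustration; only base R functions are used.

```r
# Write a small example file so that the import can be run anywhere.
csv_path <- file.path(tempdir(), "example.csv")
toy <- data.frame(id = 1:3, price = c(4.5, 6.0, 5.2))
write.csv(toy, csv_path, row.names = FALSE)

# Import it the same way as any other csv file.
dfname <- read.csv(csv_path)
str(dfname)    # variable names and types
head(dfname)   # first rows
```

The Excel and Stata readers listed above follow the same pattern, but they need their packages to be installed and loaded first.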
Removing Objects

▶ List objects in Environment


▶ ls()
▶ Remove objects from Environment: rm() or remove()
▶ rm(pwt100_1,a)
▶ Remove all objects with names starting with the same letters
▶ rm(list = ls(pattern = "^pwt"))
▶ Remove all
▶ rm(list=ls())
▶ rm(list=ls(all.names=TRUE)): also removes hidden objects (names starting with “.”)
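The sketch below walks through the removal commands in a separate environment, so that running it does not wipe your real workspace. The environment `env` and the object names in it are made up for the example; in everyday use you would drop the `envir = env` argument and work in the global Environment directly.

```r
# Work in a separate environment so the example leaves the workspace alone.
env <- new.env()
assign("pwt100", 1, envir = env)
assign("pwt100_1", 2, envir = env)
assign("a", 3, envir = env)
assign("other", 4, envir = env)

ls(env)                                             # list all four objects
rm(list = c("pwt100_1", "a"), envir = env)          # remove two named objects
ls(env)
rm(list = ls(env, pattern = "^pwt"), envir = env)   # names starting with "pwt"
ls(env)
rm(list = ls(env, all.names = TRUE), envir = env)   # remove everything
length(ls(env))
```

Note that `pattern` is a regular expression: without the leading `^` it matches "pwt" anywhere in the name, not only at the start.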
Renaming

▶ Display names of variables in data set


▶ names(pwt100)
▶ Rename variables
▶ names(dfname)[3] <- "newname"
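A minimal sketch of renaming, using the built-in `mtcars` data instead of the course data sets (the subset `df` and the new name `weight` are chosen only for illustration):

```r
# Take a few columns of a built-in data set.
df <- head(mtcars[, c("mpg", "cyl", "wt")])
names(df)                    # current variable names

# Rename the third variable.
names(df)[3] <- "weight"
names(df)
```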
Summary Statistics

▶ mean(), sd(), var(), ...


▶ summary()
▶ summarise (or: summarize) (in “dplyr”)
▶ stargazer (in “stargazer”)
▶ Produces well-formatted summary statistics and regression
tables
▶ Feed it with a data frame
▶ Type:
▶ “latex” (default): LaTeX code
▶ “html”: HTML/CSS code
▶ “text”: ASCII text
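The base R functions from this slide can be tried on the built-in `mtcars` data, as in the sketch below. The dplyr and stargazer functions are left out because they need extra packages to be installed.

```r
x <- mtcars$mpg

mean(x)    # sample mean
sd(x)      # sample standard deviation
var(x)     # sample variance (equal to sd(x)^2)

summary(x)                            # min, quartiles, median, mean, max
summary(mtcars[, c("mpg", "wt")])     # the same for several variables
```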
Summary Statistics

▶ table(): cross tabulation


▶ table(auto$rep78)
▶ table(auto$rep78,auto$foreign)
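Since the `auto` data set may not be at hand, the same two calls are sketched here with the built-in `mtcars` data, where `cyl` is the number of cylinders and `am` is the transmission type (0/1).

```r
# One-way frequency table
table(mtcars$cyl)

# Two-way cross tabulation: rows are cyl, columns are am
tab <- table(mtcars$cyl, mtcars$am)
tab

# The cell counts add up to the number of observations
sum(tab)
```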
T-Test

▶ One-sample t-test
▶ t.test(auto$price)
▶ t.test(auto$price-6000): tests whether the mean price equals 6000
▶ Two-sample t-test: difference in means between two groups
▶ t.test(Variable ~ GroupVar, data=dfname, var.equal=FALSE)
▶ Default: allows unequal variances (var.equal = FALSE), i.e., Welch's t-test
▶ Paired t-test
▶ t.test(bpbefore, bpafter, paired=TRUE)
▶ t.test(bpbefore - bpafter)
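The three kinds of t-test can be sketched with built-in data sets, used here as stand-ins for `auto` and the blood-pressure variables: `mtcars` for the one- and two-sample tests and `sleep` for the paired test. The null value 20 and the names `before` and `after` are chosen only for the example.

```r
# One-sample: H0 that mean mpg equals 20, written in two equivalent ways
t1 <- t.test(mtcars$mpg, mu = 20)
t2 <- t.test(mtcars$mpg - 20)

# Two-sample: difference in mean mpg between am = 0 and am = 1
# (unequal variances are allowed by default)
t3 <- t.test(mpg ~ am, data = mtcars)

# Paired: the sleep data records the same 10 subjects under two drugs
before <- sleep$extra[sleep$group == 1]
after  <- sleep$extra[sleep$group == 2]
t4 <- t.test(before, after, paired = TRUE)
t5 <- t.test(before - after)    # same test on the differences

t3
```

The pairs `t1`/`t2` and `t4`/`t5` give the same t statistic and p-value, which is why the slide lists both forms.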
Plot

▶ plot(x, y)
▶ Common arguments
▶ type: type of plot desired
▶ "p" for points
▶ "l" for lines
▶ "b" for both points and lines
▶ xlab, ylab: labels for the x, y axes
▶ main: title for the plot
▶ sub: subtitle for the plot
▶ xlim, ylim
▶ Add straight line to a plot:
▶ abline(intercept, slope)
▶ abline(0,1)
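A short sketch that combines the arguments listed above, using the built-in `cars` data (speed and stopping distance). The labels and axis limits are chosen only for this example.

```r
plot(cars$speed, cars$dist,
     type = "p",                               # points
     xlab = "Speed", ylab = "Stopping distance",
     main = "Stopping distance vs. speed",
     sub  = "Built-in cars data",
     xlim = c(0, 30), ylim = c(0, 130))

# Add the line with intercept 0 and slope 1
abline(0, 1)

# abline() also accepts a fitted regression and draws its line
fit <- lm(dist ~ speed, data = cars)
abline(fit, lty = 2)                           # dashed fitted line
```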
Linear Regression
Specifications

▶ Basic
▶ lm(y~x1+x2+x3, data=dfname)
▶ Without constant
▶ lm(y~ -1+x1+x2+x3, data=dfname)
▶ Include squared term
▶ lm(y~x1+I(x1^2)+x2+x3, data=dfname)
▶ Include interaction term
▶ lm(y~x1*x2+x3, data=dfname): x1, x2 and their interaction x1:x2
▶ lm(y~x1:x2+x3, data=dfname): the interaction only, without the main effects
▶ Include dummies
▶ lm(y~factor(x1)+x2+x3, data=dfname)
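To see what each specification estimates, the sketch below fits them on the built-in `mtcars` data and inspects the coefficient names. The variable choices (`mpg`, `wt`, `hp`, `cyl`) and the model names are illustrative only.

```r
m_basic   <- lm(mpg ~ wt + hp, data = mtcars)            # with constant
m_noconst <- lm(mpg ~ -1 + wt + hp, data = mtcars)       # without constant
m_sq      <- lm(mpg ~ wt + I(wt^2) + hp, data = mtcars)  # squared term
m_star    <- lm(mpg ~ wt * hp, data = mtcars)            # wt, hp and wt:hp
m_colon   <- lm(mpg ~ wt:hp, data = mtcars)              # wt:hp only
m_dummy   <- lm(mpg ~ factor(cyl) + wt, data = mtcars)   # dummies for cyl

names(coef(m_noconst))   # no "(Intercept)"
names(coef(m_star))      # main effects plus the interaction
names(coef(m_colon))     # the interaction without main effects
names(coef(m_dummy))     # one dummy per level, except the base level

summary(m_basic)         # full regression output
```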
Linear Regression
Robust or Clustered Standard Errors

▶ sandwich, lmtest
▶ Heteroscedasticity-robust
▶ coeftest(model10,vcovHC)
▶ Cluster-robust
▶ coeftest(model10,vcovCL,cluster=auto$foreign)
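To show what a heteroscedasticity-robust covariance matrix is, the sketch below computes the basic White (HC0) "sandwich" estimator by hand in base R. This is a swapped-in illustration, not what the commands above run: `vcovHC()` in the sandwich package uses the HC3 variant by default, so its standard errors will differ somewhat from these. The model and the variable names are made up for the example.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)

X <- model.matrix(fit)   # regressor matrix, including the constant
e <- residuals(fit)      # OLS residuals

bread <- solve(crossprod(X))   # (X'X)^(-1)
meat  <- crossprod(X * e)      # X' diag(e^2) X: each row of X scaled by its residual
vcov_hc0 <- bread %*% meat %*% bread

se_robust  <- sqrt(diag(vcov_hc0))   # HC0 robust standard errors
se_classic <- sqrt(diag(vcov(fit)))  # usual OLS standard errors

cbind(estimate = coef(fit), se_classic, se_robust)
```

The coefficient estimates are unchanged; only the standard errors (and hence the t statistics and p-values) differ.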
Linear Regression
F-Test

▶ Package “car”
▶ linearHypothesis(model10,"mpg=0",test="F")
▶ linearHypothesis(model10,c("mpg=0","weight=0"),test="F")
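When the hypothesis is that a set of coefficients is zero, the same F-test can also be obtained in base R by comparing a restricted and an unrestricted model with `anova()`. The sketch below is an alternative to the car commands above, not a replacement for them, and the models use `mtcars` variables chosen only for illustration.

```r
unrestricted <- lm(mpg ~ wt + hp + qsec, data = mtcars)
restricted   <- lm(mpg ~ wt, data = mtcars)   # imposes hp = 0 and qsec = 0

# Joint F-test of the two restrictions
ftest <- anova(restricted, unrestricted)
ftest

# With a single restriction, the F statistic equals the squared t statistic
one_restriction <- lm(mpg ~ wt + hp, data = mtcars)   # imposes qsec = 0
f_single <- anova(one_restriction, unrestricted)$F[2]
t_single <- coef(summary(unrestricted))["qsec", "t value"]
c(F = f_single, t_squared = t_single^2)
```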
Missing Values

▶ Some functions return NA (or report an error) if variables have
missing values
▶ mean(auto$rep78): returns NA
▶ Many have arguments to specify how to deal with missing
values
▶ na.rm = TRUE (e.g., mean(), sd(), sum())
▶ na.action = na.omit (e.g., lm())
▶ mean(auto$rep78, na.rm = TRUE)
▶ is.na(X) is TRUE if X is missing, FALSE if X is not missing
▶ !is.na(X) is FALSE if X is missing, TRUE if X is not missing
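A small sketch of how missing values behave, using a made-up vector `x` and data frame `df` rather than the `auto` data:

```r
x <- c(3, NA, 5, 4)

mean(x)                  # NA, because one value is missing
mean(x, na.rm = TRUE)    # mean of the non-missing values

is.na(x)                 # logical vector: TRUE where x is missing
!is.na(x)                # TRUE where x is not missing
sum(is.na(x))            # number of missing values
x[!is.na(x)]             # keep only the non-missing values

# Model functions such as lm() drop incomplete rows via na.action
df <- data.frame(y = c(1, 2, NA, 5), z = c(2, 4, 6, 8))
fit <- lm(y ~ z, data = df, na.action = na.omit)
nobs(fit)                # observations actually used
```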
R Markdown

▶ Run code as individual chunks or as an entire document


▶ Code, results, narrative text, plots, and tables in a single
document
▶ Render (“Knit”) to different formats (e.g., HTML, PDF, MS
Word, or MS Powerpoint)
