You are on page 1of 2

R-cheatsheet 1

Help Numerical summaries Linear regression


These functions from the mosaic-package uses a for-
? mula syntax. model <- lm() # fit linear model
?? summary(model) # model fit summary
example() favstats() # min/max, median, etc. coef(model) # estimated parameters
apropos() tally() # tabulate data confint(model) # CI for estimates
help.search() mean() anova(model) # F-tests, etc.
median() drop1(model)
sd() # standard deviation rstudent(model) # Studentized residuals
cor() # correlation fitted(model) # fitted values
Packages plotModel(model)# plot regression lines
In order to use functionalities from a certain package,
we need to first install and then load the package: General graphics model <- glm() # generalized linear model

# install package (do once):


gf_boxplot()
install.packages("") Data
gf_point() # scatter plot
gf_histogram()
# load package (do once in every script):
gf_bar() # bar graph # Load data:
require(); library()
read.file(); read.delim(); read.csv()
mplot(HELPrct)# different plots
splom() # matrix of scatter plots # Data information
Formula syntax nrow(); ncol() # data dimensions
Most of the functions that we need for this course uses head() # extract first part of data
a formula syntax: Distributions tail() # extract last part of data
colnames() # column names
goal(y ~ x | z, data = mydata, ...) rownames() # row names
plotDist() # plot theoretical distribution
summary()
where goal may be a function for plotting, calculat- pdist() # find prob. from percentile
ing numerical summaries or making inference. qdist() # find percentile from prob.
# Alter/create data:
For plots: subset() # subset data by condition
Hypothesis tests factor() # create grouping variable
• y is y-axis variable relevel() # change reference level
cut() # cut numeric into intervals
• x is x-axis variable t.test() # t-test round() # rounding numbers
binom.test() # binomial (exact) test c() # concatenate numerics
• z a conditioning variable (separate panels). prop.test() # approximate test seq() # create sequence
fisher.test() # Fisher's exact test with()
For other things:
cor.test() # correlation test aggregate()
‘y ~ x | z’ can usually be read ‘y is modeled by (or chisq.test() # chi-square test margin.table() # sum table entries
depends on) x differently for each z’.
Examples 2

The following are examples of how some of the func- # Make a scatterplot of 'pcs' versus 'age' # Investigate whether we may drop 'age' as an
tions work (based on mosaic’s built-in data set, # coloured by 'sex' and add regression lines: # explanotary variable for 'pcs', when
HELPrct). We assume mosaic is already loaded. In gf_point(pcs~age, col = ~sex, data = HELPrct) %>% # 'substance' is in the linear model too:
some chunks only the code and not the output is gf_lm() %>% mod1 <- lm(pcs ~ age + substance, data = HELPrct)
shown (to see the output, copy-paste the code chunk gf_labs(x = "Age", mod2 <- lm(pcs ~ substance, data = HELPrct)
of interest into your console). y = "Physical score", anova(mod1, mod2)
title = "My first scatter plot")
# Create a contingency table for 'sex' and Analysis of Variance Table
# 'substance':
tally(sex ~ substance, data = HELPrct) My first scatter plot Model 1: pcs ~ age + substance
Model 2: pcs ~ substance

Physical score
substance 60 sex Res.Df RSS Df Sum of Sq F Pr(>F)
sex alcohol cocaine heroin female 1 449 47517
40
female 36 41 30 male 2 450 50139 -1 -2623 24.8 9.2e-07
male 141 111 94 20

20 30 40 50 60
Illustration of how the functions pdist and qdist
# Calculate mean 'age' for men and women: Age works:
mean(age ~ sex, data = HELPrct) # Calculate the 95th percentile for the
Note: gf point creates the scatter plot, gf lm adds
regression lines and gf labs adds a title and change # standard normal distribution (i.e., mean = 0
female male # and standard deviation = 1):
36.25 35.47 axis labels.
qdist("norm", p = 0.95, mean = 0, sd = 1)

# 'favstats' can be used to retrieve different # Use an exact binomial test to test whether [1] 1.645
# summaries of the data (here for 'age' # the proportion of women is 50 %:
0.4
# separated by sex) : binom.test(~sex, p = 0.5, data = HELPrct)
favstats(age ~ sex, data = HELPrct) 0.3 probability

density
0.2 A: 0.950
sex min Q1 median Q3 max mean sd B: 0.050
1 female 21 31 35 40.5 58 36.25 7.585 # Use a t-test to test whether the mean age of 0.1

2 male 19 30 35 40.0 60 35.47 7.750 # men and women are the same:
0.0
n missing t.test(age ~ sex, data = HELPrct) −2 0 2
1 107 0 # Calculate the probability of getting a value
2 346 0 # less than -1.5 for the standard normal
# Use a chi-square test to test for # distribution:
# Boxplot of 'age' for each substance with # independence between 'homeless' and 'sex': pdist("norm", q = -1.5, mean = 0, sd = 1)
# different panels for men and women: tab <- tally(homeless ~ sex, data = HELPrct)
gf_boxplot(age ~ substance | sex, data = HELPrct) [1] 0.06681
chisq.test(tab)
0.4
female male
60
# Use an approximate test to see whether the
0.3 probability
# proportion of homeless is the same for men

density
50
# and women: 0.2 A: 0.067
age

40 B: 0.933
prop.test(homeless ~ sex, data = HELPrct) 0.1
30

20 0.0
alcohol cocaine heroin alcohol cocaine heroin −2 0 2
substance

You might also like