R-Cheatsheet: Help Numerical Summaries Linear Regression

R-cheatsheet 1
Help Numerical summaries Linear regression

These functions from the mosaic-package uses a for-
? mula syntax. model <- lm() # fit linear model
?? summary(model) # model fit summary
example() favstats() # min/max, median, etc. coef(model) # estimated parameters
apropos() tally() # tabulate data confint(model) # CI for estimates
help.search() mean() anova(model) # F-tests, etc.
median() drop1(model)
sd() # standard deviation rstudent(model) # Studentized residuals
cor() # correlation fitted(model) # fitted values
Packages plotModel(model)# plot regression lines
In order to use functionalities from a certain package,
we need to first install and then load the package: General graphics model <- glm() # generalized linear model
# install package (do once):

gf_boxplot()
install.packages("") Data
gf_point() # scatter plot
gf_histogram()
# load package (do once in every script):
gf_bar() # bar graph # Load data:
require(); library()
read.file(); read.delim(); read.csv()
mplot(HELPrct)# different plots
splom() # matrix of scatter plots # Data information
Formula syntax nrow(); ncol() # data dimensions
Most of the functions that we need for this course uses head() # extract first part of data
a formula syntax: Distributions tail() # extract last part of data
colnames() # column names
goal(y ~ x | z, data = mydata, ...) rownames() # row names
plotDist() # plot theoretical distribution
summary()
where goal may be a function for plotting, calculat- pdist() # find prob. from percentile
ing numerical summaries or making inference. qdist() # find percentile from prob.
# Alter/create data:
For plots: subset() # subset data by condition
Hypothesis tests factor() # create grouping variable
• y is y-axis variable relevel() # change reference level
cut() # cut numeric into intervals
• x is x-axis variable t.test() # t-test round() # rounding numbers
binom.test() # binomial (exact) test c() # concatenate numerics
• z a conditioning variable (separate panels). prop.test() # approximate test seq() # create sequence
fisher.test() # Fisher's exact test with()
For other things:
cor.test() # correlation test aggregate()
‘y ~ x | z’ can usually be read ‘y is modeled by (or chisq.test() # chi-square test margin.table() # sum table entries
depends on) x differently for each z’.
Examples 2
The following are examples of how some of the func- # Make a scatterplot of 'pcs' versus 'age' # Investigate whether we may drop 'age' as an
tions work (based on mosaic’s built-in data set, # coloured by 'sex' and add regression lines: # explanotary variable for 'pcs', when
HELPrct). We assume mosaic is already loaded. In gf_point(pcs~age, col = ~sex, data = HELPrct) %>% # 'substance' is in the linear model too:
some chunks only the code and not the output is gf_lm() %>% mod1 <- lm(pcs ~ age + substance, data = HELPrct)
shown (to see the output, copy-paste the code chunk gf_labs(x = "Age", mod2 <- lm(pcs ~ substance, data = HELPrct)
of interest into your console). y = "Physical score", anova(mod1, mod2)
title = "My first scatter plot")
# Create a contingency table for 'sex' and Analysis of Variance Table
# 'substance':
tally(sex ~ substance, data = HELPrct) My first scatter plot Model 1: pcs ~ age + substance
Model 2: pcs ~ substance
Physical score
substance 60 sex Res.Df RSS Df Sum of Sq F Pr(>F)
sex alcohol cocaine heroin female 1 449 47517
40
female 36 41 30 male 2 450 50139 -1 -2623 24.8 9.2e-07
male 141 111 94 20
20 30 40 50 60
Illustration of how the functions pdist and qdist
# Calculate mean 'age' for men and women: Age works:
mean(age ~ sex, data = HELPrct) # Calculate the 95th percentile for the
Note: gf point creates the scatter plot, gf lm adds
regression lines and gf labs adds a title and change # standard normal distribution (i.e., mean = 0
female male # and standard deviation = 1):
36.25 35.47 axis labels.
qdist("norm", p = 0.95, mean = 0, sd = 1)
# 'favstats' can be used to retrieve different # Use an exact binomial test to test whether [1] 1.645
# summaries of the data (here for 'age' # the proportion of women is 50 %:
0.4
# separated by sex) : binom.test(~sex, p = 0.5, data = HELPrct)
favstats(age ~ sex, data = HELPrct) 0.3 probability
density
0.2 A: 0.950
sex min Q1 median Q3 max mean sd B: 0.050
1 female 21 31 35 40.5 58 36.25 7.585 # Use a t-test to test whether the mean age of 0.1
2 male 19 30 35 40.0 60 35.47 7.750 # men and women are the same:
0.0
n missing t.test(age ~ sex, data = HELPrct) −2 0 2
1 107 0 # Calculate the probability of getting a value
2 346 0 # less than -1.5 for the standard normal
# Use a chi-square test to test for # distribution:
# Boxplot of 'age' for each substance with # independence between 'homeless' and 'sex': pdist("norm", q = -1.5, mean = 0, sd = 1)
# different panels for men and women: tab <- tally(homeless ~ sex, data = HELPrct)
gf_boxplot(age ~ substance | sex, data = HELPrct) [1] 0.06681
chisq.test(tab)
0.4
female male
60
# Use an approximate test to see whether the
0.3 probability
# proportion of homeless is the same for men
density
50
# and women: 0.2 A: 0.067
age
40 B: 0.933
prop.test(homeless ~ sex, data = HELPrct) 0.1
30
20 0.0
alcohol cocaine heroin alcohol cocaine heroin −2 0 2
substance

R-Cheatsheet: Help Numerical Summaries Linear Regression

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

R-Cheatsheet: Help Numerical Summaries Linear Regression

Uploaded by

Copyright:

Available Formats

R-cheatsheet 1

Help Numerical summaries Linear regression

# install package (do once):

You might also like