
 Introduction to Linear Regression Analysis, 5th Edition


APPENDIX E

INTRODUCTION TO R TO PERFORM LINEAR REGRESSION ANALYSIS
R is a popular statistical software package, primarily because it is freely available
at www.r-project.org. As a result, many instructors as well as
many of the more sophisticated statistical practitioners are switching to it. We
have found that using R makes sense with graduate students who are already
familiar with statistical methodology, especially those students with some
experience using more sophisticated statistical software packages such as SAS.
We personally recommend using less sophisticated and fully supported statistical
software packages such as Minitab and SAS-JMP for undergraduates and those
new to formal statistical analysis. However, we realize that some instructors
prefer to use R even for these less experienced students. As a result, we created
this appendix to introduce some of the basics of R.

E.1 Basic Background on R


According to the project's webpage:

1. The R Foundation is a not-for-profit organization working in the
   public interest. It has been founded by the members of the R
   Development Core Team in order to

   1. Provide support for the R project and other innovations in
      statistical computing. We believe that R has become a mature
      and valuable tool and we would like to ensure its continued
      development and the development of future innovations in
      software for statistical and computational research.
   2. Provide a reference point for individuals, institutions or
      commercial enterprises that want to support or interact with
      the R development community.
   3. Hold and administer the copyright of R software and
      documentation.

2. R is an official part of the Free Software Foundation's GNU project,
   and the R Foundation has similar goals to other open source
   software foundations like the Apache Foundation or the GNOME
   Foundation.
3. Among the goals of the R Foundation are the support of continued
   development of R, the exploration of new methodology, teaching
   and training of statistical computing and the organization of
   meetings and conferences with a statistical computing orientation.
   We hope to attract sufficient funding to make these goals realities.

R is a very sophisticated statistical software environment, even though it is freely
available. The contributors include many of the top researchers in statistical
computing. In many ways, it reflects the very latest statistical methodologies. On
the other hand, the contributors truly form a community that is quite fluid. It can
take quite a bit of work to keep current with the latest features of R. The help
documentation with the basic releases is really of limited value. Of course, in
many ways, you get what you pay for!

R itself is a high-level programming language. Most of its commands are prewritten
functions. It can also run loops and call routines written in other languages, for
example, C. Since it is primarily a programming language, it often presents
challenges to novice users.
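As a small illustration of both points, the sketch below writes a user-defined function with an explicit for loop and compares it with the equivalent pre-written, vectorized functions. The function name sum_of_squares is hypothetical, chosen only for this example; sum() and the ^ operator are part of base R.

```r
# A user-defined function that accumulates a sum of squares with an explicit loop.
# (sum_of_squares is a hypothetical name used only for illustration.)
sum_of_squares <- function(x) {
  total <- 0
  for (value in x) {
    total <- total + value^2
  }
  total
}

x <- c(4.6, 9.2, 17.5)
loop_result <- sum_of_squares(x)

# The same computation with R's pre-written, vectorized functions.
builtin_result <- sum(x^2)

print(c(loop = loop_result, builtin = builtin_result))
```

In everyday R code the vectorized form is usually preferred; it is shorter and typically faster than an explicit loop.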

E.2 Basic Data Entry


The best way to understand R is through examples. We present here some of the
R code used throughout the text. We can illustrate many of the basic features of
data entry and data manipulation with the vapor pressure data set in
Exercise 5.2. The data are:

Temp   vp
273    4.6
283    9.2
293    17.5
303    31.8
313    55.3
323    92.5
333    149.4
343    233.7
353    355.1
363    525.8
373    760.0


The brute force way to enter the data uses the c() function:

temp <- c(273, 283, 293, 303, 313, 323, 333, 343, 353, 363, 373)
vp <- c(4.6, 9.2, 17.5, 31.8, 55.3, 92.5, 149.4, 233.7, 355.1, 525.8, 760.0)

To check your data entry, you can use the print() function. In our case,

print(temp)
print(vp)

The resulting output is:

> print(temp)
[1] 273 283 293 303 313 323 333 343 353 363 373
> print(vp)
[1]   4.6   9.2  17.5  31.8  55.3  92.5 149.4 233.7 355.1 525.8 760.0
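Vectors entered with c() can also be collected into a single data frame, the same kind of object that read.table() returns, using the data.frame() function. A minimal sketch; the object name vapor_df is hypothetical, chosen so that it does not clash with the vapor object read from a file below.

```r
temp <- c(273, 283, 293, 303, 313, 323, 333, 343, 353, 363, 373)
vp <- c(4.6, 9.2, 17.5, 31.8, 55.3, 92.5, 149.4, 233.7, 355.1, 525.8, 760.0)

# Bundle the two vectors as named columns of one data frame.
vapor_df <- data.frame(temp = temp, vp = vp)

str(vapor_df)       # structure: 11 observations of 2 variables
print(vapor_df$vp)  # a column is then accessed with the $ operator
```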

For small data sets, the brute force approach works well. For larger data sets, we
recommend using the read.table() function. You can create a text file with the
data in columns. Generally, the first row is a “header” giving the variable names.
The read.table() function works well for this type of file. Let vapor.txt be such a
file for the vapor pressure data. The first step is to change the working directory
for R to the directory that contains the data file. You can do this under the File
menu or with the setwd() function. The following command reads the data file and
places the data into the object vapor; header=TRUE tells R that the first row holds
the variable names, and sep="" means that the columns are separated by white space.

vapor <- read.table("vapor.txt", header=TRUE, sep="")

To check the contents of vapor, we can use the print() function. The resulting
output is:

> print(vapor)
   temp    vp
1   273   4.6
2   283   9.2
3   293  17.5
4   303  31.8
5   313  55.3
6   323  92.5
7   333 149.4
8   343 233.7
9   353 355.1
10  363 525.8
11  373 760.0
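These steps can be tried end to end without preparing a file by hand. The sketch below writes a small vapor.txt into a temporary directory (an assumption made only so the example is self-contained), switches the working directory there with setwd(), and reads the file back with read.table().

```r
# Write a whitespace-separated file with a header row (illustration only).
data_dir <- tempdir()
lines <- c("temp vp",
           "273 4.6", "283 9.2", "293 17.5", "303 31.8", "313 55.3",
           "323 92.5", "333 149.4", "343 233.7", "353 355.1",
           "363 525.8", "373 760.0")
writeLines(lines, file.path(data_dir, "vapor.txt"))

# setwd() returns the previous working directory, so it can be restored later.
old_dir <- setwd(data_dir)
vapor <- read.table("vapor.txt", header = TRUE, sep = "")
setwd(old_dir)

str(vapor)  # a data frame with 11 rows and the columns temp and vp
```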

If we read the data from a file, then we cannot refer to the temperatures simply as
temp, even though temp was the name of the column in the original data file; rather,
we must also specify the object that contains it, using the $ operator. The following
command prints the temp column of the vapor object.

> print(vapor$temp)
[1] 273 283 293 303 313 323 333 343 353 363 373
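Besides the $ operator, base R offers other ways to reach a column by name. A brief sketch; the data frame is rebuilt here so the example runs on its own, and the names t1, t2, and t3 are only for illustration.

```r
vapor <- data.frame(
  temp = c(273, 283, 293, 303, 313, 323, 333, 343, 353, 363, 373),
  vp   = c(4.6, 9.2, 17.5, 31.8, 55.3, 92.5, 149.4, 233.7, 355.1, 525.8, 760.0)
)

t1 <- vapor$temp         # the $ operator used in the text
t2 <- vapor[["temp"]]    # double brackets with the column name as a string
t3 <- with(vapor, temp)  # evaluate an expression inside the data frame

# with() also avoids repeating "vapor$" when transforming a column.
inv_temp <- with(vapor, 1 / temp)

print(identical(t1, t2) && identical(t2, t3))
```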
Basic physical chemistry suggests modeling the natural log of the vapor pressure
as a linear function of the inverse of the temperature. The following commands
create the inverse of the temperatures and then print them.

> inv_temp <- 1/vapor$temp
> print(inv_temp)
[1] 0.003663004 0.003533569 0.003412969 0.003300330 0.003194888 0.003095975
[7] 0.003003003 0.002915452 0.002832861 0.002754821 0.002680965

The log() function generates the natural log. The following commands create the
natural log of the vapor pressures and then print them.

> log_vp <- log(vapor$vp)
> print(log_vp)
[1] 1.526056 2.219203 2.862201 3.459466 4.012773 4.527209 5.006627 5.454038
[9] 5.872399 6.264921 6.633318

Another useful command for regression analysis is the sqrt() function, which
works exactly like the log() function.
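With the transformed variables in hand, the model suggested by the physical chemistry discussion above (the natural log of vapor pressure as a linear function of inverse temperature) can be fit with R's lm() function. A minimal sketch, with the data rebuilt so the code runs on its own; the second fit shows the equivalent formula with the transformations written inline, where I() is needed so that "/" is read as arithmetic rather than formula syntax.

```r
# Vapor pressure data, entered as in this appendix.
vapor <- data.frame(
  temp = c(273, 283, 293, 303, 313, 323, 333, 343, 353, 363, 373),
  vp   = c(4.6, 9.2, 17.5, 31.8, 55.3, 92.5, 149.4, 233.7, 355.1, 525.8, 760.0)
)

# Transformed variables, created as in the text.
inv_temp <- 1 / vapor$temp
log_vp <- log(vapor$vp)

# Simple linear regression of log(vp) on 1/temp.
fit <- lm(log_vp ~ inv_temp)
print(summary(fit))  # coefficients, standard errors, R-squared

# The same model with the transformations inside the formula.
fit2 <- lm(log(vp) ~ I(1 / temp), data = vapor)
print(coef(fit2))
```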

R does generate plots, but it takes a great deal of work to make good-looking
plots. The basic plot function is plot(x,y), where x is the object on the x-axis and
y is the object on the y-axis. The following command generates the scatter plot of
vapor pressure against temperature for the vapor pressure data.

> plot(vapor$temp,vapor$vp)
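Axis labels and a title go a long way toward a presentable plot. A sketch using base graphics and the formula form of plot(), where vp ~ temp puts vp on the y-axis; the label text and the file name vapor_plot.pdf are assumptions, and the plot is written to a PDF file in a temporary directory (via pdf() and dev.off()) rather than to the screen.

```r
vapor <- data.frame(
  temp = c(273, 283, 293, 303, 313, 323, 333, 343, 353, 363, 373),
  vp   = c(4.6, 9.2, 17.5, 31.8, 55.3, 92.5, 149.4, 233.7, 355.1, 525.8, 760.0)
)

# Open a PDF graphics device, draw the labeled scatter plot, then close it.
plot_file <- file.path(tempdir(), "vapor_plot.pdf")
pdf(plot_file)
plot(vp ~ temp, data = vapor,
     xlab = "Temperature", ylab = "Vapor pressure",
     main = "Vapor pressure data")
invisible(dev.off())
```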

The write.table() function generates an output data file that is useful for using
other plotting software. The following code appends the inverse temperatures and
the natural logs of the vapor pressures to the original data to form a new object
vapor2 and then creates the output data file vapor_output.txt.

> vapor2 <- cbind(vapor,inv_temp,log_vp)
> write.table(vapor2,"vapor_output.txt")
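To confirm that the output file can be read back, by other software or by R itself, you can reload it with read.table(). A sketch under two assumptions not taken from the text: the file goes to a temporary directory, and row.names = FALSE keeps R's row numbers out of the file, since some other programs would read them as an extra column.

```r
vapor <- data.frame(
  temp = c(273, 283, 293, 303, 313, 323, 333, 343, 353, 363, 373),
  vp   = c(4.6, 9.2, 17.5, 31.8, 55.3, 92.5, 149.4, 233.7, 355.1, 525.8, 760.0)
)
inv_temp <- 1 / vapor$temp
log_vp <- log(vapor$vp)
vapor2 <- cbind(vapor, inv_temp, log_vp)

# Write the combined data without row names, then read it back.
out_file <- file.path(tempdir(), "vapor_output.txt")
write.table(vapor2, out_file, row.names = FALSE)

vapor_back <- read.table(out_file, header = TRUE)
print(head(vapor_back, 3))
```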

E.3 Brief Comments on Other Functionality in R


R does a very nice job manipulating matrices. This textbook, however, uses
statistical software to perform the matrix calculations “under the hood,” so to
speak. The text does show the matrix formulations of the procedures we discuss.
However, we do not expect students to perform these calculations directly. As a
result, we consider an introduction to the matrix manipulations within R beyond
our scope. As appropriate, the text does give the basic R code to perform
analyses. We leave it to the course instructor to present the details of the matrix
manipulations within R.

E.4 R Commander
R Commander is an add-on package to R. It also is freely available. It provides
an easy-to-use user interface, much like Minitab and JMP, to the parent R
product. R Commander makes it much more convenient to use R; however, it
does not provide much flexibility in its analysis. For example, R Commander
does not allow the user to use the externally studentized residual for the residual
plots. R Commander is a good way for users to get familiar with R. Ultimately,
however, we recommend the use of the parent R product.
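For comparison, the parent R product computes externally studentized residuals directly with the rstudent() function from the stats package. A minimal sketch, assuming the log(vp) on 1/temp model for the vapor pressure data; the axis labels and the output file rstudent_plot.pdf in a temporary directory are illustrative choices.

```r
vapor <- data.frame(
  temp = c(273, 283, 293, 303, 313, 323, 333, 343, 353, 363, 373),
  vp   = c(4.6, 9.2, 17.5, 31.8, 55.3, 92.5, 149.4, 233.7, 355.1, 525.8, 760.0)
)
fit <- lm(log(vp) ~ I(1 / temp), data = vapor)

# Externally studentized (R-student) residuals: each residual is scaled
# by a residual standard deviation estimated with that observation deleted.
r_student <- rstudent(fit)

# Residual plot: R-student residuals against fitted values.
plot_file <- file.path(tempdir(), "rstudent_plot.pdf")
pdf(plot_file)
plot(fitted(fit), r_student,
     xlab = "Fitted values", ylab = "Externally studentized residuals")
abline(h = 0, lty = 2)
invisible(dev.off())
```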
