stat.umn.edu/geyer/5601/examp/intro.html
What is R?
If you have the time and want to know more about R, the
Introduction to R
that comes with the R software is the first thing to
read, but it is way more than you need to know for this course.
What is Rweb?
Free software is amazing. Creative programmers
can use it to do anything they can think of.
There's no vendor controlling use of the software to protect their profits.
Prof. Jeff Banfield at Montana State University put R on the web. You
can run simple R commands from any computer connected to the internet.
A similar program could be easily done for S-PLUS but would be illegal
because the vendor couldn't profit from it.
There are two "interfaces" to Rweb. The simple one, found by clicking on
the Rweb link on the main Rweb page, is the only one we will explain.
It has the virtue of being embeddable in web pages to make examples.
To see how the example works, just click the "Submit" button.
When you have seen the example, click the "Back" button on your web
browser to return to this page.
For now, don't bother with what the example does. Just notice that
it does some calculations on some data and draws a picture.
Rweb is just R. You type R statements into a web form. You submit
them. They get executed on the server. The results get stuffed into
a web page sent back to your computer. So Rweb is just R run over the web.
So mostly we will use "R" and "Rweb" interchangeably.
One important difference between Rweb and R is that the server remembers
nothing between Rweb submissions.
The entire calculation you want done must be submitted
to Rweb in one web form.
R run on your own computer
does remember. You can build up a complicated analysis a little bit at
a time.
Thus Rweb is fairly useless for really complicated problems, but is fine
for (simple) coursework.
Like all other computer languages, R has variables, which are referred to
by variable names. Variable names may contain letters, digits, the dot
( . ), and the underscore ( _ ), and cannot begin with a digit or an
underscore. Names are case sensitive,
thus fred, Fred, and FRED refer to
different variables.
fred <- 4
or
sally <- 2 + 2
or
In order to see any results from R, you have to execute a command that
makes output, the most common being print and plot.
sally
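A minimal sketch of assignment followed by output, using the variable names from above. At the top level of an R session, typing a variable name by itself also prints its value.

```r
fred <- 4          # assign the value 4 to fred
sally <- 2 + 2     # the right-hand side is evaluated, then assigned

print(sally)       # explicit output
sally              # at top level, a bare name prints too
```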
Vectors
Not all R variable values are single numbers (in fact most aren't).
Most R variables are vectors, which is R's name for a list of objects
of the same type
(often numbers but character variables and other types are possible).
There are many ways to create vectors in R. Many functions and operators
return vector values if given vector values as arguments. Here we will only
look at a few ways to create a vector and a few functions and operators
that work vectorwise.
The c Function
The R function c
(on-line help)
"combines" or "collects" all its arguments into one vector,
for example
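A minimal sketch of such a call. The variable names and the values are made up for illustration.

```r
x <- c(1.5, 2, 7, 42)        # four numbers combined into one numeric vector
x
length(x)                    # number of elements in the vector

y <- c(x, 100, x)            # arguments may themselves be vectors
y
colors <- c("red", "green")  # character vectors work the same way
colors
```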
http://www.stat.umn.edu/geyer/5601/examp/blurfle.txt
This has the result that all of the variables must be vectors
of the same length.
This can usually be arranged somehow
(if necessary pad the variables that are too short with
NA values).
The example blurfle.txt creates
three variables, color, x, and y,
and prints them out.
R Data Entry
The same issues from the preceding section apply when you are not using
Rweb but using R on your own computer. The pattern is a little different.
Suppose you are in R. The following commands
X
mimic what Rweb does when it loads the data in the URL of the preceding
section. After that the statements in the Rweb form in the preceding
section work the same in your computer as they do on Rweb.
(In Rweb you can actually see the two statements shown just above in
each Rweb output page, but the read.table in Rweb does
not read from the web but from an already downloaded copy of the file.)
You can also download the file yourself to your own computer before starting
R and then just do
X
This assumes the file "blurfle.txt" has been downloaded to a directory or
folder where R will look for it (the current working directory under Linux,
the user's Documents folder by default under Windows, or
the user's home directory under MacOS X; these can be changed using
the menus of the R GUI app).
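The read-from-a-local-file pattern can be sketched as follows. Everything specific here is an assumption made for illustration: the file contents are invented (they are not the real blurfle.txt), the file is written to a temporary location so the sketch is self-contained, and header = TRUE assumes the first line of the file holds the variable names.

```r
# Write a small whitespace-separated file (made-up data, not the real blurfle.txt)
datafile <- tempfile(fileext = ".txt")
writeLines(c("color x y",
             "red 1 2.5",
             "blue 2 NA",
             "green 3 4.0"),
           datafile)

# Read it into a data frame; header = TRUE takes variable names from line 1
X <- read.table(datafile, header = TRUE)
X
names(X)     # "color" "x" "y"
X$x          # one variable, extracted as a vector
```

To read a file from the working directory instead, the temporary file name would be replaced by the quoted file name, for example "blurfle.txt".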
Or you can create your own data file with your own data in it.
This has the result that all of the variables must be vectors
of the same length.
This can usually be arranged somehow
(if necessary pad the variables that are too short with
NA values).
You cannot use Microsoft Word or any similar product to create data files
for entry into R. Just say no. It cannot be made to work. You will only
frustrate yourself if you try.
If you are careful to save in plain text format, you can use Microsoft Notepad.
http://www.stat.umn.edu/geyer/5601/examp/blurfle.csv
Now
X
reads the data into R.
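A minimal sketch of reading comma-separated data, assuming the read.csv function and a header line. The data are invented and written to a temporary file so the sketch runs on its own.

```r
# Made-up CSV data; the empty field on the "blue" line is a missing value
csvfile <- tempfile(fileext = ".csv")
writeLines(c("color,x,y",
             "red,1,2.5",
             "blue,2,",
             "green,3,4.0"),
           csvfile)

X <- read.csv(csvfile)   # header = TRUE is the default for read.csv
X
```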
Indexing Vectors
Integer Indexing
The simplest form of indexing uses positive integers in the range from
one to the length of the vector. For example
do what is obvious (after you get used to vector indexing). Not quite so
obvious is that subscripts work the same way on the other side of the
assignment operator.
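A minimal sketch of positive integer indexing, on both sides of the assignment operator. The vector bob holds made-up values.

```r
bob <- c(10, 20, 30, 40, 50)   # made-up data

bob[3]          # third element: 30
bob[c(1, 5)]    # first and fifth elements: 10 50
bob[2:4]        # second through fourth: 20 30 40

bob[2] <- 42    # a subscript on the left replaces that element
bob             # 10 42 30 40 50
```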
Negative index values indicate "everything but" the elements at those positions.
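A minimal sketch of negative indexing, again with made-up values.

```r
bob <- c(10, 42, 30, 40, 50)   # made-up data

first_dropped <- bob[-1]       # everything but the first element
first_dropped                  # 42 30 40 50

ends_dropped <- bob[-c(1, 5)]  # everything but the first and fifth
ends_dropped                   # 42 30 40
```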
Logical Indexing
bob[bob != 42]
The result of
bob != 42
is a logical vector (all elements having values TRUE
or FALSE). Indexing with such a vector picks out the elements
for which the index is TRUE.
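A minimal sketch of the two steps, with made-up values for bob.

```r
bob <- c(10, 42, 30, 42, 50)   # made-up data

keep <- bob != 42              # logical vector, FALSE where bob equals 42
keep                           # TRUE FALSE TRUE FALSE TRUE

bob[keep]                      # 10 30 50
bob[bob != 42]                 # the same thing in one step
```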
That's all for this web page. If you need to know more, see
Index vectors; selecting and modifying subsets of a data set in
the Introduction to R document, but don't
look at it your first time through this.
Functions
Built-in Functions
We've already mentioned a few R functions. There are lots and lots of
others. By "built-in" functions, we mean those that you don't have
to do anything special to use. Strictly speaking, R doesn't have any
"built-in" functions. Any function is like any other function.
None is more special than any other. But several "packages" called
base, datasets, graphics, grDevices, methods, stats, and utils
are automatically available with no special effort.
Arguments
plot(x, y)
Named Arguments
The named arguments here, main, xlab
and ylab can appear in any order so long as they
are after the unnamed arguments.
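A minimal sketch of unnamed arguments followed by named arguments in two different orders. The data and the label text are made up for illustration.

```r
x <- 1:10            # made-up data
y <- x^2

# Unnamed arguments first, then named arguments in any order
plot(x, y, main = "A made-up example", xlab = "x value", ylab = "x squared")
plot(x, y, ylab = "x squared", main = "A made-up example", xlab = "x value")
```

Both calls draw the same picture.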
If you actually know the order of all the arguments, then you don't
need the name. For example, the three expressions
Your choice.
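As an illustration with the seq function (chosen here for convenience, and not necessarily the example the text had in mind), the following three calls give the same result. The variable names are hypothetical.

```r
a  <- seq(1, 10, 2)                     # positional: from, to, by
b  <- seq(from = 1, to = 10, by = 2)    # every argument named
c3 <- seq(by = 2, to = 10, from = 1)    # named, in a different order

a                                       # 1 3 5 7 9
identical(a, b) && identical(b, c3)     # TRUE
```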
Packages
library(exactRankTests)
Defining Functions
As the assignment suggests, an R function is just an R object
like any other R object. As such, it can be assigned a variable name
or used in any other way an R object can be used.
In this example, trim is an R variable
that happens to be a function and x is an R variable
that happens to be a numeric vector.
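A minimal sketch of such a definition. The body of trim below is hypothetical (it drops the smallest and largest values), and the data are made up; only the names trim and x come from the text.

```r
# Hypothetical body: sort the data, then drop the first and last values
trim <- function(x) {
    sorted <- sort(x)
    sorted[-c(1, length(sorted))]   # value of the last expression is returned
}

x <- c(3.2, 100, 1.7, 4.4, 2.9)     # made-up numeric vector
trim(x)                             # 2.9 3.2 4.4

is.function(trim)   # TRUE: trim is an ordinary R object
trim2 <- trim       # so it can be assigned to another name
trim2(x)            # same result as trim(x)
```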
Returned Value
Thus
Local Variables
Local variables are variables defined inside a function.
They exist only inside the function and have no influence on anything
outside the function.
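A minimal sketch of a local variable that shares its name with a variable outside the function. All names here are hypothetical.

```r
total <- 100                 # a variable outside the function

add_up <- function(v) {
    total <- sum(v)          # creates a local variable named total
    total
}

add_up(c(1, 2, 3))           # 6
total                        # still 100: the outer variable is untouched
```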
Global Variables
This is sometimes very convenient, but can lead to confusing code.
It probably shouldn't be overused.
(Real programmers think global variables
are evil, but they are part of the R way.)
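A minimal sketch of a function that uses a variable defined outside it, which also shows why this can confuse. All names here are hypothetical.

```r
rate <- 0.05                  # defined at top level (global)

interest <- function(balance) {
    balance * rate            # rate is not local, so R looks it up outside
}

interest(200)                 # 10
rate <- 0.10
interest(200)                 # 20: same call, different answer
```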
More on Functions
Any data value, numeric or not, can be NA. This is what
you use for "missing data". Always use NA for this
purpose. Never use 999 or some other code that is actually
a number. Sad experience of many scientists shows this sort of code is
always forgotten at some point and the data analysis thereby ruined.
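A minimal sketch of how NA behaves in a calculation, with made-up data. The na.rm argument of mean tells it to drop missing values before computing.

```r
w <- c(2.5, NA, 4.0, 3.5)     # made-up data with one missing value

is.na(w)                      # FALSE TRUE FALSE FALSE
mean(w)                       # NA: the missing value propagates
mean(w, na.rm = TRUE)         # mean of the three non-missing values
```

Had the missing value been coded as 999, mean(w) would have silently returned a wrong number instead of NA.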
Inf: Infinity
Control Structures
R is a Turing-complete computer language.
Anything you can do with a computer, you can do in R
(if you are a sufficiently clever programmer). For those who don't want
to use a computer except via a WIMP interface
(mice and menus), this may seem irrelevant, but it is very important.
Those who find computer programming frightening may rest assured that the
"programming" we do will be very simple, involving no more than
writing your own functions and the two control structures
described in this section.
For Loops
One thing computers are much better at than people is mindless repetition.
The for control construct
(on-line help)
is the main way mindless repetition is done in R.
The R expression
for (i in 1:100) {
### some R statements that do some work here
}
does the same thing (whatever is done by the R statements inside) 100 times.
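A minimal concrete sketch: adding up the integers from 1 to 100. The variable name total is made up.

```r
total <- 0
for (i in 1:100) {
    total <- total + i     # i takes the values 1, 2, ..., 100 in turn
}
total                      # 5050
```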
Comments
To understand this one needs to know that each time the loop is executed
the variable i takes a different value from the list specified
in the for statement, which in this case is 1:nsim,
the vector of integers 1, 2, …, nsim.
The line following the loop draws the histogram: the sampling distribution
of the sample median for a random sample of size n from a normal
population.
The last line adds the density of the asymptotic normal distribution of
this estimator (the asymptotic normal distribution being given in theory
books).
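A minimal sketch of a simulation matching this description. The values of nsim and n, the name theta.hat, the seed, and the choice of a standard normal population are assumptions made for illustration. For a standard normal population, the asymptotic normal distribution of the sample median has mean 0 and standard deviation sqrt(pi / (2 n)).

```r
set.seed(42)                # for reproducibility (arbitrary seed)
nsim <- 1000                # number of simulated samples (assumed)
n <- 10                     # sample size (assumed)

theta.hat <- double(nsim)   # storage for the simulated medians
for (i in 1:nsim) {
    x <- rnorm(n)                 # random sample from a standard normal
    theta.hat[i] <- median(x)     # i says where to store this result
}

# Histogram of the simulated sampling distribution of the sample median
hist(theta.hat, probability = TRUE)

# Density of the asymptotic normal distribution of the sample median
curve(dnorm(x, sd = sqrt(pi / (2 * n))), add = TRUE)
```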
If
Comments
After the loop finishes, some of the values of the vector tstat
are independent random realizations of the null distribution of the test
statistic when the hypothesis of equality of variances is false (because
sigx and sigy are different).
The code following the loop plots the histogram of the simulation distribution
and the density of the t distribution that the procedure assumes
the test statistic has. Clearly it doesn't.
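A rough sketch of a simulation of this kind, under stated assumptions: the procedure is taken to be the pooled-variance two-sample t test, and the sample sizes, the values of sigx and sigy, the seed, and nsim are all invented. It does not attempt to reproduce the if statement that this section discusses.

```r
set.seed(42)                # for reproducibility (arbitrary seed)
nsim <- 1000                # number of simulations (assumed)
nx <- 10                    # sample sizes (assumed)
ny <- 30
sigx <- 3                   # unequal standard deviations (assumed values)
sigy <- 1

tstat <- double(nsim)
for (i in 1:nsim) {
    x <- rnorm(nx, sd = sigx)
    y <- rnorm(ny, sd = sigy)
    # pooled-variance t test, which assumes the two variances are equal
    tstat[i] <- t.test(x, y, var.equal = TRUE)$statistic
}

# Simulation distribution versus the t density the procedure assumes
hist(tstat, probability = TRUE)
curve(dt(x, df = nx + ny - 2), add = TRUE)
```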
Else
Ifelse
Logical Indexing
Another R way to make decisions that has no analog in conventional
computing languages uses logical indexing (which is described in
a section above). Here's how that works.
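A minimal sketch of a decision made with logical indexing, using made-up data: every negative value is replaced by zero, with no explicit if and no loop.

```r
x <- c(-2, 5, -7, 3, 0)     # made-up data
y <- x                      # work on a copy
y[x < 0] <- 0               # "if x is negative, use zero", all elements at once
y                           # 0 5 0 3 0
```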