
Terrel Harper

D658

R tutorial

1. How do you create a data set?


Creating a data set is very easy in R. Two functions can be used
for this: c() and scan().
Let's say we have the data set 1-5. Pick a variable name; let's call
it S1 for set 1. We can use “S1 <- c(1,2,3,4,5)” to create the data
set. We can also use “S1 = c(1,2,3,4,5)”, which does the same
thing. Either way, S1 now holds 1, 2, 3, 4, 5. For scan(), we type
“S1 = scan()”. R then prompts us for values, and we enter them one
after another. When we are done, it will prompt for one more. Just
press Enter on the empty prompt and that will end the input.
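
The two approaches can be sketched as follows. Because scan() normally waits for typed input, this sketch feeds it the values through its text = argument so it can run unattended. S2 is just an illustrative variable name, not part of the exercise.

```r
# Build the data set 1-5 with c(); both <- and = assign the result to S1
S1 <- c(1, 2, 3, 4, 5)
S1 = c(1, 2, 3, 4, 5)

# scan() usually reads values typed at the prompt;
# text = supplies the same values as a string instead
S2 <- scan(text = "1 2 3 4 5")

print(S1)
print(identical(S1, S2))  # both routes give the same numeric vector
```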

2. How do you construct a frequency table of a data set? How do you
present the data set as a pie chart, as a histogram, and as a stem-
and-leaf plot?
To create a frequency table, first make a data set, S1 =
c(2,2,3,3,4,4,4,5), then pass it to the table() command:
“table(S1)”. It will create a basic frequency table, which shows
    2 3 4 5
    2 2 3 1
The top row is your data values and the bottom row is how often
each one occurs.
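
A minimal sketch of the frequency table, using the same S1. The name freq is only illustrative.

```r
S1 <- c(2, 2, 3, 3, 4, 4, 4, 5)

# table() counts how many times each distinct value occurs
freq <- table(S1)
print(freq)

# Counts are labelled by value, so one count can be looked up by name
print(freq[["4"]])  # 4 appears three times
```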
To make a pie chart, use pie(). This makes a basic pie chart. To get
fancier, use pie(data variable, labels = variable holding the
labels, main = title of the chart):
S1 = c(2,3,4)
L1 = c("label 1", "label 2", "label 3")
pie(S1, labels = L1, main = "name of pie chart")
You have the option of not making a variable for the labels and
writing labels = c("label 1", "label 2", "label 3") inside pie()
instead, but a variable will be cleaner.
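
One point worth a sketch: pie() draws one slice per element of what it is given, so to chart how often each value occurs, the counts from table() can be passed in rather than the raw data. The names counts and slice_labels and the chart title below are illustrative choices.

```r
S1 <- c(2, 2, 3, 3, 4, 4, 4, 5)

# One slice per distinct value, sized by how often it occurs
counts <- table(S1)

# Build one label per slice from the values themselves
slice_labels <- paste("value", names(counts))

pie(counts, labels = slice_labels, main = "Frequency of each value in S1")
```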

The most basic way to make a histogram is to use hist(). hist(S1) will
make a histogram of S1. You can make it fancier by adding main = " "
and xlab = " " to set the title and the x-axis label of the graph.
col = " " will change the color of the bars.
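
As a sketch of those options together (the title, axis label, and color are arbitrary choices for illustration):

```r
S1 <- c(2, 2, 3, 3, 4, 4, 4, 5)

# main = chart title, xlab = x-axis label, col = bar color
h <- hist(S1, main = "Histogram of S1", xlab = "Value", col = "lightblue")

# hist() also returns the bin counts it plotted
print(h$counts)
```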

To make a simple stem-and-leaf plot, use stem(), as in stem(S1).
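
A stem-and-leaf plot is easier to read with two-digit data, so this sketch uses a made-up vector called scores (not from the exercise) alongside S1. stem() prints its plot as text in the console rather than opening a graphics window.

```r
S1 <- c(2, 2, 3, 3, 4, 4, 4, 5)
stem(S1)

# Hypothetical two-digit data: tens become stems, ones become leaves
scores <- c(12, 15, 21, 23, 23, 28, 31, 34, 35, 40)
stem(scores)
```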

3. For a given data set, how do you compute the mean, median,
variance and standard deviation?

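To calculate the mean, use mean(). It may give the answer with
many decimals; wrap it in round() to shorten it. S1 = c(2,3,4)
mean(S1) will give us 3, but if it were a lot of numbers that
didn't divide evenly, you could use round(mean(S1), 2) to round
the result to two decimal places. The trim = argument does
something different: it drops that fraction of the values from
each end of the sorted data before averaging, which reduces the
effect of outliers. Trim only reads from 0 to 0.5; at 0.5 the
result is the median.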
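
The difference between trimming and rounding can be sketched with a small made-up vector x that contains one outlier (x and its values are illustrative, not from the exercise):

```r
x <- c(1, 2, 3, 4, 100)  # hypothetical data with one outlier

print(mean(x))              # plain mean, pulled up by the outlier: 22
print(mean(x, trim = 0.2))  # drops 1 value from each end, averages 2, 3, 4: 3
print(mean(x, trim = 0.5))  # maximum trim, same as the median: 3

# Rounding is a separate step, done with round()
print(round(mean(c(1, 2, 2)), 2))  # 1.67
```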

To calculate the median, just type median(), like median(S1), and
it will find it for you. For the variance, the command is simply
var(), and for the standard deviation, the command is simply sd().
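
A short sketch of the three commands on the earlier data set. Note that var() and sd() compute the sample versions, dividing by n - 1; the by_hand line is included only to show that, and its name is illustrative.

```r
S1 <- c(2, 2, 3, 3, 4, 4, 4, 5)

print(median(S1))  # middle of the sorted data: 3.5
print(var(S1))     # sample variance: 1.125
print(sd(S1))      # sample standard deviation, the square root of var(S1)

# The same variance computed by hand, dividing by n - 1
by_hand <- sum((S1 - mean(S1))^2) / (length(S1) - 1)
print(by_hand)
```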

4. For a given data set, how do you find the critical five numbers
(minimum value, the first quartile, the second quartile (median), the
third quartile, maximum value) of the data set and how do you use a
box plot to present the five numbers?

Using the summary() command, you can actually bring up 6
numbers: the min, first quartile, median, mean, third quartile, and
max. You can even assign those to a variable: Sample =
summary(S1). Since summary() labels which value is which,
you can take out the 5 you're looking for and put those in
another variable. Then simply use boxplot(), as in boxplot(S1),
to create a simple box plot.
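
The steps above can be sketched as follows. The variable five and the plot title are illustrative. The sketch also calls fivenum(), a base R function that returns the five numbers directly; it computes the quartiles slightly differently from summary(), so the first and third quartiles may not match exactly.

```r
S1 <- c(2, 2, 3, 3, 4, 4, 4, 5)

# summary() returns six labelled numbers
Sample <- summary(S1)
print(Sample)

# unclass() turns the summary into a plain named vector;
# keep everything except the mean
five <- unclass(Sample)[c("Min.", "1st Qu.", "Median", "3rd Qu.", "Max.")]
print(five)

# fivenum() gives the five numbers in one call (quartile method differs)
print(fivenum(S1))

# boxplot() works from the data itself and draws the five numbers
boxplot(S1, main = "Box plot of S1", horizontal = TRUE)
```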
