
R LAB - EXPLORING DATA

Q1. #Use the dim() function on mtcars

Answer: dim(mtcars)   # 32 rows, 11 columns

Data Structure
Q2. According to R, what type of variable is am?
Answer: factor

 
Levels
Q3. # Look at the levels of the variable am
Answer: levels(mtcars$am)
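The recorded answers presumably reflect the lab's own workspace. In the built-in datasets::mtcars, am is stored as a numeric 0/1 code, so class() reports "numeric" and levels() returns NULL until the column is converted to a factor. A minimal sketch of checking and converting it in a fresh base-R session (am_class, am_factor and am_levels are illustrative names):

```r
# Start from a fresh copy of the built-in data set
mtcars <- datasets::mtcars
# Illustrative names: am_class, am_factor, am_levels

# In base R, am is stored as a numeric 0/1 code
am_class <- class(mtcars$am)
print(am_class)            # "numeric"

# levels() only applies to factors; on the numeric column it returns NULL
print(levels(mtcars$am))   # NULL

# Convert to a factor to see the levels
am_factor <- factor(mtcars$am)
am_levels <- levels(am_factor)
print(am_levels)           # "0" "1"
```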
 
Recoding Variables
Q #Assign the value of mtcars to the new variable mtcars2
mtcars2 <- mtcars
Q #Assign the label "high" to mpgcategory where mpg is greater than or equal to 20
mtcars2$mpgcategory[mtcars2$mpg >= 20] <- "high"
Q #Assign the label "low" to mpgcategory where mpg is less than 20
mtcars2$mpgcategory[mtcars2$mpg < 20] <- "low"
Q #Assign mpgcategory as factor to mpgfactor
mtcars2$mpgfactor <- as.factor(mtcars2$mpgcategory)
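The two subset assignments above can also be written as a single vectorised ifelse() call. A minimal sketch, assuming base R (mpg_counts is an illustrative name); it should produce the same labels as the step-by-step version:

```r
mtcars2 <- datasets::mtcars
# Illustrative name: mpg_counts

# ifelse() assigns "high" or "low" to every row in one step
mtcars2$mpgcategory <- ifelse(mtcars2$mpg >= 20, "high", "low")
mtcars2$mpgfactor <- factor(mtcars2$mpgcategory)

# Check the recoding: 14 cars are "high" and 18 are "low"
mpg_counts <- table(mtcars2$mpgfactor)
print(mpg_counts)
```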

Examining Frequencies
Q #How many of the cars have a manual transmission?
13
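A minimal sketch for reading this answer off a frequency table, assuming the built-in data set (am_counts and n_manual are illustrative names):

```r
mtcars <- datasets::mtcars
# Illustrative names: am_counts, n_manual

# Frequency table of transmission type (0 = automatic, 1 = manual)
am_counts <- table(mtcars$am)
print(am_counts)

# Count the manual cars directly: TRUE values are summed as 1
n_manual <- sum(mtcars$am == 1)
print(n_manual)  # 13
```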

Cumulative Frequency
Q # What percentage of cars have 3 or 5 gears?
62.5
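The percentage can be computed rather than read off a table. A minimal sketch using table(), prop.table() and cumsum(), assuming the built-in data set (gear_counts, gear_pct and pct_3_or_5 are illustrative names):

```r
mtcars <- datasets::mtcars
# Illustrative names: gear_counts, gear_pct, pct_3_or_5

gear_counts <- table(mtcars$gear)          # 3 gears: 15, 4 gears: 12, 5 gears: 5
gear_pct <- 100 * prop.table(gear_counts)  # percentage of cars in each group
print(cumsum(gear_pct))                    # cumulative percentage

# Share of cars with 3 or 5 gears
pct_3_or_5 <- sum(gear_pct[c("3", "5")])
print(pct_3_or_5)  # 62.5
```

Because 3 and 5 gears are not adjacent categories, the answer is the sum of two percentages rather than a single cumulative value.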

Making a Bar Graph


Q #Assign the frequency of the mtcars variable "am" to a variable called "height"
height <- table(mtcars$am)
Q #Create a barplot of "height"
barplot(height)

Labelling A Bar Graph


Q # vector of bar heights
height <- table(mtcars$am)
Q # Make a vector of the names of the bars called "barnames"
barnames <- c("automatic", "manual")
Q # Label the y axis "number of cars" and label the bars using barnames
barplot(height, ylab = "number of cars", names.arg = barnames)

Interpreting A Bar Graph


Q Based on the bar chart of transmission type that you made in the previous exercise, which type of transmission is most common? (Remember, 0 = automatic, 1 = manual.)
automatic
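Labelling the codes before comparing avoids having to remember that 0 means automatic. A minimal sketch, assuming the built-in data set (am_counts and most_common are illustrative names):

```r
mtcars <- datasets::mtcars
# Illustrative names: am_counts, most_common

# Relabel the 0/1 codes so the table shows readable names
am_counts <- table(factor(mtcars$am, levels = c(0, 1),
                          labels = c("automatic", "manual")))

# which.max() returns the position of the largest count, named by its label
most_common <- names(which.max(am_counts))
print(most_common)  # "automatic"
```
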
Histograms
Q # Make a histogram of the carb variable from the mtcars data set. Set the title to "Carburetors"
hist(mtcars$carb, main = "Carburetors")

Formatting Your Histogram


Q # Arguments to change the y-axis scale to 0 - 20, label the x-axis and colour the bars red
hist(mtcars$carb, main = "Carburetors", ylim = c(0, 20), xlab = "Number of Carburetors", col = "red")

Bar Graph vs. Histogram

Why did we make a bar graph of transmission (mtcars$am), but a histogram of carburetors (mtcars$carb)?

Answer
Because transmission is categorical, while carb is numeric (strictly a discrete count rather than a continuous measurement)

Distributions


Take a look at the distributions in these histograms. Which of the following is correct?

Answer
Graph 1 is left skewed, graph 2 is normally distributed, graph 3 is right skewed.

Mean and Median


Q # Calculate the mean miles per gallon
mean(mtcars$mpg)
Q # Calculate the median miles per gallon
median(mtcars$mpg)
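Printing the two values side by side makes the comparison easy. A minimal sketch, assuming the built-in data set (mpg_mean and mpg_median are illustrative names):

```r
mtcars <- datasets::mtcars
# Illustrative names: mpg_mean, mpg_median

mpg_mean <- mean(mtcars$mpg)      # about 20.09
mpg_median <- median(mtcars$mpg)  # 19.2
print(c(mean = mpg_mean, median = mpg_median))
```

A mean slightly above the median is often a sign of a mild right skew, though this is only a rough heuristic.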

Mode
# Produce a sorted frequency table of `carb` from `mtcars`
sort(table(mtcars$carb), decreasing = TRUE)
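The sorted table shows the most frequent values first, but carb has a tie at the top. A minimal sketch that keeps every value sharing the highest frequency, assuming the built-in data set (carb_counts and carb_modes are illustrative names):

```r
mtcars <- datasets::mtcars
# Illustrative names: carb_counts, carb_modes

carb_counts <- table(mtcars$carb)
# Keep every value whose frequency equals the maximum, so ties are not hidden
carb_modes <- as.numeric(names(carb_counts)[carb_counts == max(carb_counts)])
print(carb_modes)  # 2 and 4 (each appears 10 times)
```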

Range
# Minimum value
x <- min(mtcars$mpg)
# Maximum value
y <- max(mtcars$mpg)
# Calculate the range of mpg using x and y
y - x
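Base R can also produce the range in one line: range() returns the minimum and maximum, and diff() subtracts the first from the second. A minimal sketch (mpg_range is an illustrative name):

```r
mtcars <- datasets::mtcars
# Illustrative name: mpg_range

# range() gives c(min, max); diff() returns max - min
mpg_range <- diff(range(mtcars$mpg))
print(mpg_range)  # 23.5 (33.9 - 10.4)
```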

Quartiles
Q # What is the value of the second quartile (the median) of qsec?
17.7100
Q # What is the value of the first quartile of qsec?
16.8925
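These quartiles can be reproduced with quantile(), which by default (type 7) reports the 0 %, 25 %, 50 %, 75 % and 100 % points. A minimal sketch, assuming the values above refer to qsec in the built-in data set (qsec_quartiles, q1 and q2 are illustrative names):

```r
mtcars <- datasets::mtcars
# Illustrative names: qsec_quartiles, q1, q2

qsec_quartiles <- quantile(mtcars$qsec)
print(qsec_quartiles)

q1 <- unname(qsec_quartiles["25%"])  # first quartile: 16.8925
q2 <- unname(qsec_quartiles["50%"])  # second quartile (median): 17.71
```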

IQR and boxplot


Q # Make a boxplot of qsec
boxplot(mtcars$qsec)
Q # Calculate the interquartile range of qsec
IQR(mtcars$qsec)

IQR outliers
Q # What is the threshold value for an outlier below the first quartile (Q1 - 1.5 * IQR)?
13.88125
Q # What is the threshold value for an outlier above the third quartile (Q3 + 1.5 * IQR)?
21.91125
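The thresholds follow the 1.5 * IQR rule. A minimal sketch that computes both fences for qsec and lists the values outside them, assuming the built-in data set (qsec_q, qsec_iqr, lower_fence, upper_fence and qsec_outliers are illustrative names):

```r
mtcars <- datasets::mtcars
# Illustrative names: qsec_q, qsec_iqr, lower_fence, upper_fence, qsec_outliers

qsec_q <- quantile(mtcars$qsec, c(0.25, 0.75))  # Q1 and Q3
qsec_iqr <- IQR(mtcars$qsec)                    # Q3 - Q1 = 2.0075

lower_fence <- unname(qsec_q[1] - 1.5 * qsec_iqr)  # 13.88125
upper_fence <- unname(qsec_q[2] + 1.5 * qsec_iqr)  # 21.91125

# Observations outside the fences are potential outliers
qsec_outliers <- mtcars$qsec[mtcars$qsec < lower_fence | mtcars$qsec > upper_fence]
print(qsec_outliers)  # 22.9
```

In this data set only one value, 22.9, falls outside the fences.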

Standard Deviation
Q # Find the IQR of horsepower
IQR(mtcars$hp)
Q # Find the standard deviation of horsepower
sd(mtcars$hp)
Q # Find the IQR of miles per gallon
IQR(mtcars$mpg)
Q # Find the standard deviation of miles per gallon
sd(mtcars$mpg)
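sd() returns the sample standard deviation, which divides by n - 1. A minimal sketch that computes it by hand for horsepower and compares it with the built-in function (hp and hp_sd_manual are illustrative names):

```r
mtcars <- datasets::mtcars
# Illustrative names: hp, hp_sd_manual

hp <- mtcars$hp
# Square root of the summed squared deviations divided by (n - 1)
hp_sd_manual <- sqrt(sum((hp - mean(hp))^2) / (length(hp) - 1))
print(c(manual = hp_sd_manual, builtin = sd(hp)))  # both about 68.56
```
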

Mean, Median and Mode

Mean, median and mode are all measures of central tendency. In a perfect normal distribution the mean, median and mode are identical, but when the data are skewed they move apart. In the graph on the right, which of the following statements is most accurate?

The mode is higher than the mean. It makes most sense to use the
median to measure central tendency.

Calculating Z-scores
# Calculate the z-scores of mpg
(mtcars$mpg - mean(mtcars$mpg)) / sd(mtcars$mpg)
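Base R's scale() performs the same centring and scaling. A minimal sketch comparing it with the formula above (z_manual and z_scale are illustrative names):

```r
mtcars <- datasets::mtcars
# Illustrative names: z_manual, z_scale

z_manual <- (mtcars$mpg - mean(mtcars$mpg)) / sd(mtcars$mpg)
# scale() centres and scales by default and returns a one-column matrix
z_scale <- as.vector(scale(mtcars$mpg))
print(all.equal(z_manual, z_scale))  # TRUE
```
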
Distributions And Z-scores

In the distribution shown on the right, what percentage of data will fall between the z-scores of -2 and 2?

About 95 % (the 68-95-99.7 rule; for an exactly normal distribution the value is about 95.4 %)
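The 95 % figure can be checked with pnorm(), the standard normal cumulative distribution function. A minimal sketch (within_2sd is an illustrative name):

```r
# Illustrative name: within_2sd
# Probability mass of a standard normal distribution between z = -2 and z = 2
within_2sd <- pnorm(2) - pnorm(-2)
print(within_2sd)  # about 0.9545
```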

Z-score Outliers

Outside of which boundaries might an observation be considered an outlier?

-3 and 3
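A minimal sketch of flagging observations whose z-score falls outside -3 and 3. z_outliers is a hypothetical helper, not a base R function, and its default limit of 3 is taken from the answer above:

```r
mtcars <- datasets::mtcars

# Hypothetical helper: returns the values whose z-score lies outside [-limit, limit]
z_outliers <- function(x, limit = 3) {
  z <- (x - mean(x)) / sd(x)
  x[abs(z) > limit]
}

print(z_outliers(mtcars$mpg))  # numeric(0)
```

None of the mpg values is more than three standard deviations from the mean, so the call on mtcars$mpg returns an empty vector.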
