Professional Documents
Culture Documents
Tutorial 2 Solution
1. Forced Expiratory Volume (FEV) is an index of pulmonary function that measures the volume of air
expelled after 1 second of constant effort. The dataset FEV.csv on LumiNUS contains measurements
for 654 children aged 3 to 19 years of age. The purpose of the data collection was to study how FEV
is affected by certain other variables. The variables that we shall work with are
Age: Age in years.
FEV: FEV measurement.
Hgt: Height in inches.
height: Height in meters
Sex: 0 = female, 1 = male.
Smoking status: 0 = current non-smoker, 1 = current smoker.
> attach(fev)
1
ST1131
[1] 9
> #information of all the outliers:
> fev[c(index),]
ID Age FEV Hgt Sex Smoke height
321 2142 14 4.84 72 1 0 1.83
452 33041 12 5.22 70 1 0 1.78
464 37241 13 4.88 73 1 0 1.85
517 49541 13 5.08 74 1 0 1.88
609 6144 19 5.10 72 1 0 1.83
624 25941 15 5.79 69 1 0 1.75
632 37441 17 5.63 73 1 0 1.85
648 71141 17 5.64 70 1 0 1.78
649 71142 16 4.87 72 1 1 1.83
There are 9 outliers. Check the information of these 9 outliers, we would see that all these outliers
are males, most (8/9) are non-smokers, and they are rather tall.
(d) Separate histograms for male and female FEV are given in Figure 3.
> female = FEV[which(Sex==0)] #values of FEV of females
> male = FEV[which(Sex==1)] #values of FEV of males
>
> # opar <- par(mfrow=c(1,2)) #arrange a figure which has 1 row, 2 columns
>
> # hist(female, col = 2, freq= FALSE, main = "Histogram of Female FEV", ylim = c(0,0.52))
>
> # hist(male, col = 4, freq= FALSE, main = "Histogram of Male FEV", ylim = c(0,0.52))
>
> # par(opar)
>
The summary could be:
The shapes of the two histograms are quite different. Both are unimodal, but for females, it is
almost symmetrical but for males it is quite right-skewed.
2
ST1131
The median FEV for females is much lower than that for males (2.49 compared to 2.605).
In addition, the variabilty in the male group is higher than the variability in the female group.
The respective IQRs are 1.54 and 1.05.
(e) The scatter plot of FEV vs height (m) where it classifes the data points by gender (more advanced)
is given in Figure 4.
> # plot(height, FEV)
> # to answer the question, this is enough. It gives a simple scatter plot
> cor(FEV, height)
[1] 0.8675619
The computed correlation is quite high, and it is clear from the plot that there is a strong positive
linear association between the two variables overall. The range of FEV for males appears larger
than the range for females, as does the range of heights. The variability of FEV at lower heights
does seem to be slightly less than the variability of FEV at greater heights.
3
ST1131
2. Question of interest: Do male college students follow their school’s teams more closely than female?
The following data was collected in class on Monday morning, after a particularly exciting and impor-
tant basketball game. The question asked was: Did you watch the game on TV last night?
(a) What is the response variable and explanatory variable in this study?
(b) For each gender, find the percentage of watching game (all, part and none).
(c) State the pairs of percentages for comparison. Plot a bar plot which helps to compare the per-
centages found.
(d) Is it fair to say that males were more likely to watch the game than females?
(e) What population that your conclusion above can apply to?
Solution:
> male = c(10, 12, 4)
> female = c(21, 24, 30)
> #create a matrix with 2 rows for males and females:
> x = matrix(c(male, female),nrow = 2, byrow = T); x
(a) Response variable: Game watching (categorical) with three categories: whole, part and none.
Explanatory variable: gender (categorical) with two categories: male, female.
4
ST1131
3. (Question for Group Work) The researchers investigated if the life expectancy of animals depend on
the length of their gestational period. They considered a dataset, Animals.csv, which has observations
on the average longevity (in years) and average gestational period (in days) for 21 animals.
5
ST1131
Solution:
6
ST1131