
2 Labs for MH3511: Week 2

Exercise 2.1.

1. Load the file Lab02car.csv and inspect the content.


2. List all rows with Weight > 3000.
3. List all rows with Weight > 3000 and Horsepower > 200.
4. Extract the origin of cars with Weight > 3000 and Horsepower > 200.

[1]: cars <- read.csv('DataLabs/Lab02car.csv')

[2]: str(cars)

'data.frame': 12 obs. of 7 variables:
 $ Car         : Factor w/ 12 levels "Audi 100LS","Chevrolet Chevette",..: 7 1 11 12 10 9 6 4 5 3 ...
 $ MPG         : num 30 20 29 20 25 26 36.1 22 10 10 ...
 $ Displacement: int 79 114 90 130 113 108 91 146 360 307 ...
 $ Horsepower  : int 70 91 70 102 95 93 60 97 215 200 ...
 $ Weight      : int 2074 2582 1937 3150 2228 2391 1800 2815 4615 4376 ...
 $ Model       : int 71 73 75 76 71 74 78 77 70 70 ...
 $ Origin      : Factor w/ 3 levels "Europe","Japan",..: 1 1 1 1 2 2 2 2 3 3 ...
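The Factor columns in the str() output above reflect the default of older R versions. From R 4.0.0 onward, read.csv() keeps strings as character vectors unless stringsAsFactors=TRUE is passed. The following is only a sketch: it uses a short inline CSV (two rows copied from the outputs in this lab, not the full course file) to show both behaviours.

```r
# Inline CSV text stands in for DataLabs/Lab02car.csv (illustration only).
csv_text <- "Car,Weight,Origin
Volvo 245,3150,Europe
Ford F250,4615,US"

# Explicitly request factors, as in the str() output shown above.
as_factor <- read.csv(text = csv_text, stringsAsFactors = TRUE)

# Explicitly keep strings as character vectors (the default from R 4.0.0).
as_char <- read.csv(text = csv_text, stringsAsFactors = FALSE)

class(as_factor$Origin)  # "factor"
class(as_char$Origin)    # "character"
```

Passing stringsAsFactors explicitly makes a script behave the same way regardless of the R version it runs on.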

[2]: # option 1
subset(cars,Weight>3000)

                Car MPG Displacement Horsepower Weight Model Origin
4         Volvo 245  20          130        102   3150    76 Europe
9         Ford F250  10          360        215   4615    70     US
10        Chevy C20  10          307        200   4376    70     US
11 Pontiac Catalina  16          400        170   4668    75     US

[3]: # option 2
cars[cars$Weight>3000,]

                Car MPG Displacement Horsepower Weight Model Origin
4         Volvo 245  20          130        102   3150    76 Europe
9         Ford F250  10          360        215   4615    70     US
10        Chevy C20  10          307        200   4376    70     US
11 Pontiac Catalina  16          400        170   4668    75     US

[4]: # option 1
subset(cars,(Weight>3000)&(Horsepower>200))

        Car MPG Displacement Horsepower Weight Model Origin
9 Ford F250  10          360        215   4615    70     US

[5]: # option 2
cars[(cars$Weight>3000)&(cars$Horsepower>200),]

        Car MPG Displacement Horsepower Weight Model Origin
9 Ford F250  10          360        215   4615    70     US

[6]: # option 1
subset(cars,(Weight>3000)&(Horsepower>200))['Origin'];
class(subset(cars,(Weight>3000)&(Horsepower>200))['Origin'])

Origin
9 US
’data.frame’
[7]: # option 2
cars[(cars$Weight>3000)&(cars$Horsepower>200),'Origin'];
class(cars[(cars$Weight>3000)&(cars$Horsepower>200),'Origin'])

US Levels: 1. ’Europe’ 2. ’Japan’ 3. ’US’


’factor’
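The two options return different classes because indexing a data frame with a single column name in one pair of brackets keeps a data frame, while row-and-column indexing drops a single column to a plain vector (here a factor). A small sketch of this, assuming a data frame rebuilt by hand from the four heavy cars listed above; the object names are illustrative only:

```r
# Rebuilt by hand from the rows with Weight > 3000 shown earlier.
heavy <- data.frame(
  Car = c("Volvo 245", "Ford F250", "Chevy C20", "Pontiac Catalina"),
  Horsepower = c(102, 215, 200, 170),
  Weight = c(3150, 4615, 4376, 4668),
  Origin = factor(c("Europe", "US", "US", "US"))
)

sel <- heavy$Weight > 3000 & heavy$Horsepower > 200

as_df    <- heavy[sel, ]["Origin"]               # one-column data frame
as_vec   <- heavy[sel, "Origin"]                 # dropped to a factor
kept_df  <- heavy[sel, "Origin", drop = FALSE]   # drop = FALSE keeps the data frame

class(as_df); class(as_vec); class(kept_df)
```

Using drop = FALSE is a common way to keep the result type predictable when the number of selected columns may vary.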
Exercise 2.2.

R can import many types of files. This exercise introduces fixed-width files. The
file Lab02fixed.txt contains five variables, with no delimiter between them.
• columns 1-3 contain the identity of the subject (id)
• column 4 contains the gender of the subject (gender)
• columns 5-7 contain the height of the subject (height)
• columns 8-9 contain the weight of the subject (weight)
• column 10 contains the number of siblings (siblings)
Such data can be imported with read.fwf(), whose widths argument gives the number of
characters occupied by each variable.
1. Load the file Lab02fixed.txt and inspect the content.
2. Assign names to columns.
3. Load the file Lab02test.txt (open the file, and based on its format, choose a suitable
function to load the file).
4. Merge the two dataframes (use merge()) and inspect the result.
5. Suppose the weight of subject 211 was recorded wrongly as 99, instead of the correct
value 77. Modify the merged dataframe to rectify this mistake.
6. Identify all individuals whose heights are greater than 182, and list their test
scores.
7. Find the tallest female in the dataset and obtain her height, weight and test score.

[8]: lab2 <- read.fwf('DataLabs/Lab02fixed.txt', widths=c(3,1,3,2,1))

[9]: head(lab2)

V1 V2 V3 V4 V5
201 M 173 65 1
202 F 158 47 1
203 F 158 43 2
204 F 162 45 2
205 M 169 58 1
206 M 171 63 2

[10]: colnames(lab2) <- c('id', 'gender', 'height', 'weight', 'siblings')

[11]: head(lab2)

 id gender height weight siblings
201      M    173     65        1
202      F    158     47        1
203      F    158     43        2
204      F    162     45        2
205      M    169     58        1
206      M    171     63        2
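The fixed-width import can also be tried without the data file. The sketch below assumes three records reconstructed from the head() output (glued back into the 3-1-3-2-1 layout described in the exercise) and reads them through a text connection; it also passes col.names so that reading and naming happen in one call.

```r
# Three records rebuilt from the head() output, in fixed-width form
# (id: 3 chars, gender: 1, height: 3, weight: 2, siblings: 1).
raw_lines <- c("201M173651", "202F158471", "203F158432")

con <- textConnection(raw_lines)
demo <- read.fwf(con,
                 widths = c(3, 1, 3, 2, 1),
                 col.names = c("id", "gender", "height", "weight", "siblings"))
close(con)  # the connection was opened here, so it is closed here

demo
```

Whether to name columns inside read.fwf() or afterwards with colnames() is a matter of taste; both give the same data frame.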

[12]: # option 1
lab2test <- read.csv('DataLabs/Lab02test.txt', sep=' ')

[13]: head(lab2test)

id test
201 67
202 64
203 67
204 81
205 86
206 69

[14]: # option 2: note the need for header=TRUE
lab2test <- read.table('DataLabs/Lab02test.txt', sep=' ', header=TRUE)

[15]: head(lab2test)

id test
201 67
202 64
203 67
204 81
205 86
206 69

[16]: # merge
lab2merged <- merge(lab2, lab2test, by="id")

[17]: head(lab2merged)

 id gender height weight siblings test
201      M    173     65        1   67
202      F    158     47        1   64
203      F    158     43        2   67
204      F    162     45        2   81
205      M    169     58        1   86
206      M    171     63        2   69
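By default merge() keeps only the rows whose key appears in both data frames. A minimal sketch of that behaviour, assuming two tiny data frames: ids 201 to 203 and their values come from the outputs above, while id 999 and its score are hypothetical and exist only to create a mismatch.

```r
left  <- data.frame(id = c(201, 202, 203), height = c(173, 158, 158))
right <- data.frame(id = c(201, 202, 999), test = c(67, 64, 50))  # id 999 / score 50 are hypothetical

# Default: inner join, only ids present in both (201 and 202).
inner <- merge(left, right, by = "id")

# all.x = TRUE keeps every row of `left`; id 203 gets test = NA.
keep_left <- merge(left, right, by = "id", all.x = TRUE)

inner
keep_left
```

Comparing nrow() before and after a merge is a quick way to spot subjects that were silently dropped.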

[18]: # check weight


lab2merged[lab2merged$id==211, "weight"]

99
[19]: # correct and check
lab2merged[lab2merged$id==211, "weight"] <- 77
lab2merged[lab2merged$id==211, "weight"]

77
[20]: lab2merged[lab2merged$height>182,]

    id gender height weight siblings test
67 267      M    183     85        0   55
71 271      M    188     60        2   76
85 285      M    184     67        3   54

[21]: max(lab2merged[lab2merged$gender =='F','height'])

178
[22]: lab2merged[(lab2merged$height==178)&(lab2merged$gender =='F'),]

    id gender height weight siblings test
99 299      F    178     55        1   47
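The solution above finds the maximum first and then types the value 178 into a second query. One possible alternative that avoids the hard-coded number is which.max(), sketched here on a small hand-built data frame (four rows taken from the merged head() output; the name people is illustrative only):

```r
people <- data.frame(
  id     = c(202, 203, 204, 205),
  gender = c("F", "F", "F", "M"),
  height = c(158, 158, 162, 169),
  weight = c(47, 43, 45, 58),
  test   = c(64, 67, 81, 86)
)

# Restrict to females, then pick the row with the largest height.
females <- people[people$gender == "F", ]
tallest <- females[which.max(females$height), c("height", "weight", "test")]
tallest
```

Note that which.max() returns only the first maximum, so ties in height would need the two-step approach or a comparison against max().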
Exercise 2.3.

Write a script in R which, given a positive integer less than 1000, prints the number of digits
of this integer.

[23]: n <- 15
if (n>0 && n<10) {
  print(paste(n, "has 1 digit"))
} else if (n<100) {
  print(paste(n, "has 2 digits"))
} else {
  print(paste(n, "has 3 digits"))
}

[1] "15 has 2 digits"

[24]: n <- 88
if (n>0 && n<10) {
  print(paste(n, "has 1 digit"))
} else if (n<100) {
  print(paste(n, "has 2 digits"))
} else {
  print(paste(n, "has 3 digits"))
}

[1] "88 has 2 digits"

[25]: n <- 284
if (n>0 && n<10) {
  print(paste(n, "has 1 digit"))
} else if (n<100) {
  print(paste(n, "has 2 digits"))
} else {
  print(paste(n, "has 3 digits"))
}

[1] "284 has 3 digits"
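The if/else chain above can also be replaced by counting characters. The sketch below is one possible variant, not the intended lab solution: count_digits is a hypothetical helper name, and the input check assumes the exercise's range of positive integers below 1000.

```r
# Hypothetical helper: number of digits of a positive integer below 1000.
count_digits <- function(n) {
  # Guard the range stated in the exercise.
  stopifnot(n == round(n), n > 0, n < 1000)
  # Convert to text and count the characters.
  nchar(as.character(n))
}

count_digits(7)
count_digits(15)
count_digits(284)
```

The range check matters: for very large numbers as.character() may switch to scientific notation, in which case counting characters would no longer count digits.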


Exercise 2.4.

Use strsplit() to split the sentence 'There are three types of lies – lies, damn lies, and
statistics' into a list of characters. Then count the number of occurrences of 'e'.

[26]: statquote <- "There are three types of lies -- lies, damn lies, and statistics"
chars <- strsplit(statquote, split = "")
chars

1. ’T’ ’h’ ’e’ ’r’ ’e’ ’ ’ ’a’ ’r’ ’e’ ’ ’ ’t’ ’h’ ’r’ ’e’ ’e’ ’ ’ ’t’ ’y’ ’p’ ’e’ ’s’ ’ ’ ’o’ ’f’ ’ ’ ’l’ ’i’
’e’ ’s’ ’ ’ ’-’ ’-’ ’ ’ ’l’ ’i’ ’e’ ’s’ ’,’ ’ ’ ’d’ ’a’ ’m’ ’n’ ’ ’ ’l’ ’i’ ’e’ ’s’ ’,’ ’ ’ ’a’ ’n’ ’d’ ’ ’ ’s’
’t’ ’a’ ’t’ ’i’ ’s’ ’t’ ’i’ ’c’ ’s’
[27]: class(chars); length(chars)

’list’
1
[28]: class(chars[1]);length(chars[1])

’list’
1
[29]: chars[[1]]

’T’ ’h’ ’e’ ’r’ ’e’ ’ ’ ’a’ ’r’ ’e’ ’ ’ ’t’ ’h’ ’r’ ’e’ ’e’ ’ ’ ’t’ ’y’ ’p’ ’e’ ’s’ ’ ’ ’o’ ’f’ ’ ’ ’l’ ’i’ ’e’ ’s’
’ ’ ’-’ ’-’ ’ ’ ’l’ ’i’ ’e’ ’s’ ’,’ ’ ’ ’d’ ’a’ ’m’ ’n’ ’ ’ ’l’ ’i’ ’e’ ’s’ ’,’ ’ ’ ’a’ ’n’ ’d’ ’ ’ ’s’ ’t’ ’a’ ’t’ ’i’
’s’ ’t’ ’i’ ’c’ ’s’
[30]: cnt_e <- 0
for (c in chars[[1]]){
  if (c=='e'){
    cnt_e <- cnt_e+1
  }
}
cnt_e

9
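The loop above can be condensed using R's vectorised comparison: comparing the whole character vector with 'e' gives a logical vector, and sum() counts its TRUE entries. A short sketch along those lines (the variable freq is an illustrative addition):

```r
statquote <- "There are three types of lies -- lies, damn lies, and statistics"

# strsplit() returns a list; [[1]] extracts the character vector itself.
chars <- strsplit(statquote, split = "")[[1]]

# TRUE counts as 1 when summed.
cnt_e <- sum(chars == "e")
cnt_e

# table() gives the frequency of every character at once.
freq <- table(chars)
freq["e"]
```

The comparison is case-sensitive, so an uppercase 'E' would not be counted unless the text is first passed through tolower().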
Exercise 2.5.

Write a function which, given as input two positive integers a, b, outputs the quotient q and
the remainder r of the Euclidean division (a = bq + r).

[31]: ED <- function(a,b){
  r <- a%%b          # remainder
  q <- (a-r)/b       # quotient
  return(list(q,r))
}

[32]: ED(27,2)

1. 13
2. 1
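The returned list above is unnamed, so the caller has to remember that the quotient comes first. A possible variant, sketched under the assumption that both inputs are positive integers, uses R's integer-division operator %/% and names the list elements; ED_named is a hypothetical name chosen to avoid clashing with ED.

```r
# Hypothetical variant of ED() returning a named list.
ED_named <- function(a, b) {
  stopifnot(a > 0, b > 0)
  q <- a %/% b   # integer quotient
  r <- a %% b    # remainder
  list(q = q, r = r)
}

res <- ED_named(27, 2)
res$q
res$r

# Check the defining identity a = b*q + r.
27 == 2 * res$q + res$r
```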
Exercise 2.6.

For a vector $x = (x_1, \dots, x_n)$ with mean $\bar{x}$, its fourth central moment is by definition

$$m_4 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^4 .$$

Write a function to compute the fourth central moment of a vector.

[33]: fourthcm <- function(v){
  m4 <- sum((v-mean(v))^4)/length(v)
  return(m4)
}

[34]: v <- c(2,3,4,1,7,10)
fourthcm(v)

191.395833333333
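The same idea extends to a central moment of any order k by replacing the exponent 4 with k. The sketch below is an illustration under that assumption; central_moment is a hypothetical helper, not part of the lab.

```r
# Hypothetical generalisation: k-th central moment with divisor n.
central_moment <- function(v, k) {
  mean((v - mean(v))^k)
}

v <- c(2, 3, 4, 1, 7, 10)

central_moment(v, 4)   # agrees with fourthcm(v) above
central_moment(v, 2)   # second central moment (divisor n)
```

The second central moment computed this way divides by n, so it differs from var(v), which divides by n - 1.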
