Professional Documents
Culture Documents
Data Exploration With R III PDF
Data Exploration With R III PDF
with R III
Noratiqah Mohd Ariff
Contents
• Head and Tail
• Which
• Subset
• Sort and Order
• Merge
• Aggregate
• Factor
• Table
• Apply
• Missing Values
• Paste and Split
Head and Tail
✖Obtain the first several rows of a matrix or
data frame using head, and use tail to
obtain the last several rows.
✖Example:
head(trees,3)
Girth Height Volume
1 8.3 70 10.3
2 8.6 65 10.3
3 8.8 63 10.2
tail(trees,2)
Girth Height Volume
30 18.0 80 51
31 20.6 87 77
Which
✖Return the position of the in a logical vector
which are TRUE.
✖Example:
which(mtcars$cyl==6)
[1] 1 2 4 6 10 11 30
mtcars[which(mtcars$cyl==4&mtcars$gear==5),]
mpg cyl disp hp drat wt qsec vs am gear carb
Porsche 914-2 26.0 4 120.3 91 4.43 2.140 16.7 0 1 5 2
Lotus Europa 30.4 4 95.1 113 3.77 1.513 16.9 1 1 5 2
Subset
✖The easiest way to select variables and
observations.
✖Example:
subset(mtcars,subset=hp<100,select=mpg)
mpg
Datsun 710 22.8
Merc 240D 24.4
Merc 230 22.8
Fiat 128 32.4
Honda Civic 30.4
Toyota Corolla 33.9
Toyota Corona 21.5
Fiat X1-9 27.3
Porsche 914-2 26.0
Sort & Order
✖ The sort() function sorts a vector or factor
into ascending or descending order.
✖ Example:
sort(mtcars$mpg,decreasing=T)
dataC<-merge(dataA,dataB,by="Name")
dataC
Name Age Weight Height Address
1 Abu 23 78 80 Urban
2 Ahmad 24 80 87 Rural
3 Ali 25 75 80 Urban
4 Amir 22 74 79 Rural
Aggregate
✖ Splits the data into subsets, computes
summary statistics for each, and returns the
result in a convenient form.
✖ The by variables must be in a list (even if
there is only one).
✖ Example:
agg <-aggregate(mtcars, y=list(mtcars$cyl),
FUN=median)
agg
Group.1 mpg cyl disp hp drat wt qsec vs am gear carb
1 4 26.0 4 108.0 91.0 4.080 2.200 18.900 1 1 4 2.0
2 6 19.7 6 167.6 110.0 3.900 3.215 18.300 1 0 4 4.0
3 8 15.2 8 350.5 192.5 3.115 3.755 17.175 0 0 3 3.5
Factor
✖The output levels will tell us how many and
what factors are in the data set.
✖Example:
num<-c(1,2,2,3,1,2,3,3,1,2,3,3)
fnum<-factor(num)
> fnum
[1] 1 2 2 3 1 2 3 3 1 2 3 3
Levels: 1 2 3
✖We can also relabel the factors
fnum<-factor(num,labels=c("I","II","III"))
Table
✖Tabulate data and provide the frequency of each
factor.
✖Example:
mons <-
c("March","April","January","November","January","September","October",
"September", "November","August","January","November", "November",
"February","May","August","July","December","August","August",
"September","November","February","April")
mons <- factor(mons)
table(mons)
✖We can also reorder the table
mons <-
factor(mons,levels=c("January","February","March","April","May","June",
"July",“August","September", "October", "November", "December"),
ordered=TRUE)
Table
✖ Tabulate continuous data by first creating
intervals using the cut() function.
✖ Convert a numeric variable into a factor.
✖ The argument breaks= describe how ranges
of numbers will be converted to factor values.
✖ If breaks=(a number), the resulting factor
will be created by dividing the range of the
variable into that number of equal-length
intervals.
✖ If breaks=(a vector of values), the values in
the vector are used to determine the
breakpoints. The number of levels will be one
less than the number of values in the vector.
Table
✖ Example:
attach(women)
wfact <- cut(weight,breaks=3)
wtab <- table(wfact)