Professional Documents
Culture Documents
a table:
> Ls() # shipunov
Name Mode Type Obs Vars Size
1 aa numeric vector 5 1 88 bytes
2 bb numeric matrix 3 3 248 bytes
3 cards character vector 36 1 2 Kb
4 coordinates list data.frame 10000 2 92.2 Kb
5 dice character vector 36 1 2 Kb
...
(To use Ls(), install shipunov package first, see the preface for explanation.)
Ls() is also handy when you start to work with large objects: it helps to clean R
memory3 .
76
[1] FALSE
> is.logical(likes.pizza)
[1] TRUE
> likes.pizza * 1
[1] 1 1 0 0 1 1 0
> as.logical(c(1, 1, 0))
[1] TRUE TRUE FALSE
> as.numeric(likes.pizza)
[1] 1 1 0 0 1 1 0
This is the most useful feature of binary data. All other types of data, from measure-
ment to nominal (the last is most useful), could be converted into logical, and logical
is easy to convert into 0/1 numbers:
> Tobin(sex, convert.names=FALSE)
female male
[1,] 0 1
[2,] 1 0
[3,] 0 1
[4,] 0 1
[5,] 1 0
[6,] 0 1
[7,] 0 1
Afterwards, many specialized methods, such as logistic regression or binary similar-
ity metrics, will become available even to that initially nominal data.
As an example, this is how to convert the character sex vector into logical:
> (is.male <- sex == "male")
[1] TRUE FALSE TRUE TRUE FALSE TRUE TRUE
> (is.female <- sex == "female")
[1] FALSE TRUE FALSE FALSE TRUE FALSE FALSE
(We applied logical expression on the right side of assignment using “is equal?” dou-
ble equation symbol operator. This is the second numerical way to work with nomi-
nal data. Note that one character vector with two types of values became two logical
vectors.)
Logical vectors are useful also for indexing:
> x > 170
[1] TRUE FALSE TRUE TRUE FALSE FALSE TRUE
> x[x > 170]
[1] 174.0 188.0 192.0 172.5
77
(First, we applied logical expression with greater sign to create the logical vector.
Second, we used square brackets to index heights vector; in other words, we selected
those heights which are greater than 170 cm.)
Apart from greater and equal signs, there are many other logical operators which
allow to create logical expressions in R (see Table 3.1):
== EQUAL
<= EQUAL OR LESS
>= EQUAL OR MORE
& AND
| OR
! NOT
!= NOT EQUAL
%in% MATCH
AND and OR operators (& and |) help to build truly advanced and highly useful logical
expressions:
> ((x < 180) | (w <= 70)) & (sex=="female" | m=="S")
[1] FALSE TRUE FALSE FALSE TRUE FALSE FALSE
(Here we selected only those people which height is less than 170 cm or weight is
70 kg or less, these people must also be either females or bear small size T-shirts.
Note that use of parentheses allows to control the order of calculations and also
makes expression more understandable.)
***
Logical expressions are even more powerful if you learn how to use them together
with command ifelse() and operator if (the last is frequently supplied with else):
> ifelse(sex=="female", "pink", "blue")
[1] "blue" "pink" "blue" "blue" "pink" "blue" "blue"
> if(sex[1]=="female") {
+ "pink"
78
+ } else {
+ "blue"
+ }
[1] "blue"
(Command ifelse() is vectorized so it goes through multiple conditions at once.
Operator if takes only one condition.)
Note the use of curly braces it the last rows. Curly braces turn a number of expressions
into a single (combined) expression. When there is only a single command, the curly
braces are optional. Curly braces may contain two commands on one row if they are
separated with semicolon.
79
More popular are pie-charts and barplots. However, they represent data badly. There
were multiple experiments when people were asked to look on different kinds of
plots, and then to report numbers they actually remember. You can run this experi-
ment yourself. Figure 3.7 is a barplot of top twelve R commands:
> load("data/com12.rd")
> exists("com12") # check if our object is here
[1] TRUE
> com12.plot <- barplot(com12, names.arg="")
> text(com12.plot, par("usr")[3]*2, srt=45, pos=2,
+ xpd=TRUE, labels=names(com12))
(We load()’ed binary file to avoid using commands which we did not yet learn; to
load binary file from Internet, use load(url(...)). To make bar labels look bet-
ter, we applied here the “trick” with rotation. Much more simple but less aesthetic
solution is barplot(com12, las=2).)
Try looking at this barplot for 3–5 minutes, then withdraw from this book and
report numbers seen there, from largest to smallest. Compare with the answer
from the end of the chapter.
In many experiments like this, researchers found that the most accurately under-
stood graphical feature is the position along the axis, whereas length, angle, area,
density and color are each less and less appropriate. This is why from the beginning
of R history, pie-charts and barplots were recommended to replace with dotcharts
(Fig. 3.8):
> dotchart(com12)
We hope you would agree that the dotchart is easier both to understand and to re-
member. (Of course, it is possible to make this plot even more understandable with
sorting like dotchart(rev(sort(com12)))—try it yourself. It is also possible to sort
bars, but even sorted barplot is worse then dotchart.)
Another useful plot for counts is the word cloud, the image where every item is mag-
nified in accordance with its frequency. This idea came out of text mining tools. To
make word clouds in R, one might use the wordcloud package (Fig. 3.9):
> com80 <- read.table("data/com80.txt")
> library(wordcloud)
> set.seed(5) # freeze random number generator
> wordcloud(words=com80[, 1], freq=com80[, 2],
+ colors=brewer.pal(8, "Dark2"))
80