
Introduction to Computing for Economics and Management
Lecture 6: Loops

30th March 2016 Assistant Professor Dr. Bert ARNRICH

Previous lectures

 Vectors
 Matrices
 Lists
 Data Frames

Vectors

 Integer mode
> person.weight <- c(65L, 66L, 61L)

 Numeric (floating-point number)


> person.height <- c(1.70, 1.75, 1.62)

 Character (string)
> person.name <- c("Can", "Cem", "Hande")

 Logical (Boolean)
> person.female <- c(FALSE, FALSE, TRUE)

 Complex
> complex.numbers <- c(1+2i, -1+0i)

Vectors

 Assign names to the elements of a data vector


> person.height <- c(Can=1.70, Cem=1.75,
Hande=1.62)

 Indexing
> person.height[c(T,F,T)]
Can Hande
1.70 1.62

> person.height[c(1,3)]
Can Hande
1.70 1.62

> person.height[-1]
Cem Hande
1.75 1.62

Vectors

 Filtering
> person.height[person.height > 1.65]
Can Cem
1.70 1.75

 Recycling
> c(1, 2, 3) + c(1, 2, 3, 4)
[1] 2 4 6 5

 Vector operations
> person.weight / person.height^2
Can Cem Hande
22.49135 21.55102 23.24341


Matrices

 Creation
> y <- matrix(c(1,2,3,4),nrow=2,ncol=2)

> cbind(c(1,2), c(3,4))

 Matrix operations
 Transposition t(y)
 Element by element product y * y
 Matrix multiplication y %*% y
 Matrix scalar multiplication 3 * y
 Matrix addition y + y

 Indexing, e.g. select first and second row


> z[c(1,2),]

Matrices

 Assign new values to submatrices


> z[c(1:2), c(2:3)] <- matrix(c(20,21,22,23), nrow=2)

 Filtering, e.g. obtain those rows of matrix z having elements


in the second column which are at least equal to 5
> z[z[,2] >= 5,]


Lists

 Creation
> joe <- list(name="Joe", salary=55000,
staff=T)

 Indexing
> joe$salary
> joe[["salary"]]
> joe[[2]]

 Vectors as list components


> my.list <- list(vec1 = c(1,2), vec2 =
c(3,4), vec3 = 5:7)

Data frames

 Creation
> person <- data.frame(height=person.height,
weight=person.weight)

 Indexing
> person[[1]]
> person[["height"]]
> person$height
> person[c(1,2),]
> person[-3,]

 Filtering
> person[person$height >= 1.7,]


Data frames

 Data import
> person.data <- read.table("height_weight_data.txt",
header=TRUE, sep=",")

 Data modifications
> person.data$BMI <- person.data$Weight /
person.data$Height^2

 Summary
> summary(person.data)

 Merging
> merge(person.data, person.data2)

Program today

 In a previous lecture, the word list example showed us that


iterating through a set of elements is an important operation

 Today we will learn how to program such iterations with so-called “for loops”

 We will start with a recap of the word list example and


continue with programming for loops


Recap: word list

 Web search and other types of textual data mining are of


great interest

 Let’s assume we have a collection of text documents

 Whenever we search for some term, we would like to


retrieve those documents in which our search term appears
most often

 Our first goal is to determine which words are in a text and at


which location in the text each word occurs

Recap: word list

 Let’s consider this sentence as our text example:


a text consists of a word and another word
and so on and so forth

 For each word we need to obtain the location in the text:


 a 1 5
 text 2
 consists 3
 of 4
 word 6 9
 and 7 10 13
 another 8
 so 11 14
 on 12
 forth 15


Recap: word list

 Let’s assume that we iterate through our text in a word by


word manner: a, text, consists, of, a, ...

 Let’s further assume that the current word in our iteration is


always stored in the variable word

 Let’s further assume that we have a counter i which is


increased by 1 for every word: the counter tells the current
position in the text

Recap: word list

 Let’s start with initializing our word list


> word.list <- list()

 Our first word a is stored in the variable word


> word <- "a"

 Since it is our first word, our counter i has the value 1


> i <- 1

 Now we add our current word a to our word list


> word.list[[word]] <- c(word.list[[word]], i)


Example: word list

In the last step, when adding the current word to our word list,
we used two characteristics of lists
> word.list[[word]] <- c(word.list[[word]], i)

1. We can access list components with named tags


> joe[["salary"]]
[1] 55000

2. We can have vectors as list components

> my.list <- list(vec = c(1,2))

> my.list
$vec
[1] 1 2
Example: word list

 Let’s check our word list after the first iteration


> word.list
$a
[1] 1

 The value for list component “a” is 1 since it is the first time that
“a” was added to the list

 We interpret this intermediate result as follows: the word a occurs at position 1


Example: word list


 We go on with the second word text
> word <- "text"
> i <- 2
> word.list[[word]] <- c(word.list[[word]], i)

 Let’s check our word list again after the second iteration
> word.list
$a
[1] 1

$text
[1] 2

 We observe that we now have a second list entry text with position
value 2

Example: word list

 We proceed with the next 3 iterations


> word <- "consists"
> i <- 3
> word.list[[word]] <- c(word.list[[word]], i)

> word <- "of"


> i <- 4
> word.list[[word]] <- c(word.list[[word]], i)

> word <- "a"


> i <- 5
> word.list[[word]] <- c(word.list[[word]], i)


Example: word list

 We check our word list again


> word.list
$a
[1] 1 5

$text
[1] 2

$consists
[1] 3

$of
[1] 4

Example: word list

 So far we have learned how to represent which words are in


a text and at which location in the text each word occurs

 We have iterated through our text in a word by word manner

 In practice, such iterations are automated with so-called loops


Looping

 The most frequently used looping construct is


for(x in vec) {expression}

 The for loop iterates through all elements of the vector vec

 For each element of the vector vec there will be one


iteration of the loop and expression is executed

 At each iteration, the variable x takes the value of the current


element of vec
 First iteration: x = vec[1]
 Second iteration: x = vec[2]
 …

Print variable when iterating

 Let’s print out the value of variable x when iterating through


the vector vec

> vec <- c(1:5)

> vec
[1] 1 2 3 4 5

> for(x in vec) {print (x)}


[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
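
To make the iteration rule concrete, the following sketch compares a for loop with the same steps written out by hand. The vector values and the names seen and unrolled are made up for illustration and do not come from the lecture.

```r
# Hypothetical example vector (not from the lecture)
vec <- c(10, 20, 30)

# Loop form: x takes the values vec[1], vec[2], vec[3] in turn
seen <- c()
for (x in vec) {
  seen <- c(seen, x)
}

# Unrolled form: the same three iterations written out by hand
unrolled <- c()
x <- vec[1]; unrolled <- c(unrolled, x)
x <- vec[2]; unrolled <- c(unrolled, x)
x <- vec[3]; unrolled <- c(unrolled, x)

identical(seen, unrolled)  # TRUE
```

Both forms visit the elements in the same order, which is why the two result vectors are identical.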


Print variable when iterating

 The for loop works with other modes besides numeric as well

 Example: like before we print the value of the variable when


iterating through a vector of strings

> word.vector <- c("a", "text", "consists",


"of")

> for(word in word.vector) {print (word)}


[1] "a"
[1] "text"
[1] "consists"
[1] "of"

Print variable when iterating

 As an alternative we can create a new vector which ranges from 1 to the length of the vector, iterate through this vector and access the original vector via indexing

> vector.indices <- 1:length(word.vector)

> vector.indices
[1] 1 2 3 4

> for(i in vector.indices) {print(word.vector[i])}


[1] "a"
[1] "text"
[1] "consists"
[1] "of"


Print variable when iterating

 We can write the alternative way in one line

> for(i in 1:length(word.vector)) {print


(word.vector[i])}
[1] "a"
[1] "text"
[1] "consists"
[1] "of"

Print variable when iterating

 The advantage of the alternative way is that we have access


to the iteration number and the vector elements

 Example: print index i together with the vector element by


using the paste() function for concatenating strings

> for(i in 1:length(word.vector)) {print


(paste("Element", i, "is", word.vector[i]))}
[1] "Element 1 is a"
[1] "Element 2 is text"
[1] "Element 3 is consists"
[1] "Element 4 is of"
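
A related sketch, offered as a suggestion rather than as part of the lecture: base R also provides seq_along(), which builds the index vector directly. It behaves like 1:length(...) for non-empty vectors but differs for empty ones. The variable name empty is made up for illustration.

```r
word.vector <- c("a", "text", "consists", "of")

# seq_along(word.vector) gives 1 2 3 4, the same as 1:length(word.vector) here
for (i in seq_along(word.vector)) {
  print(paste("Element", i, "is", word.vector[i]))
}

# For an empty vector the two forms differ
empty <- character(0)
1:length(empty)    # 1 0, so a loop over it would run twice
seq_along(empty)   # integer(0), so a loop over it would not run at all
```

When a vector may be empty, seq_along() is therefore the safer way to generate the indices.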


Compute length of a vector

 We can change the value of a variable during looping

 For example, let’s write our own function for computing the
length of a vector

> vec <- c(1:10)

> counter <- 0

> for(x in vec) {counter <- counter + 1}

> counter
[1] 10

Recap: writing our own function

 A simple function that adds 1 to its input and returns the


result

> AddOne <- function(x) {x+1}

Function name: AddOne
Function inputs: x
Function body: instructions that take the inputs and use them to compute other values.
The last computed value is returned by default.


Recap: writing our own function

 After defining our function, we can work with it

> AddOne <- function(x) {x+1}

> AddOne(1)
[1] 2

> AddOne(-5)
[1] -4

> AddOne(c(1,2,3))
[1] 2 3 4

Recap: writing our own function

 A more sophisticated function that adds a user-specified value


to its first input

> AddValue <- function(x, Addend=1) {x+Addend}

In addition to the first


input x we specify a
second input Addend
with default value 1.


Recap: writing our own function

 After defining our new function, we can work with it

> AddValue <- function(x, Addend=1) {x+Addend}

> AddValue(1)
[1] 2

> AddValue(1,2)
[1] 3

> AddValue(c(1:3),2)
[1] 3 4 5

Compute length of a vector

 Write our own function for computing the length of a vector

## function to compute length of vector vec


vec.length <- function(vec)
{
# initialize counter
counter <- 0

# iterate through vec and increase counter


for(x in vec) {counter <- counter + 1}

# return counter
return(counter)
}


Compute length of a vector

 In the previous function definition we’ve used # to comment


our code

 It is good practice to comment code, in particular when sharing it

 We test our function


> vec.length(c(1:10))
[1] 10

> vec.length(c("Hande", "Cem", "Can"))


[1] 3

Compute norm of a vector

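 For a 3-dimensional vector x = (x_1, x_2, x_3), the Euclidean norm is defined as

$\|x\|_2 = \sqrt{x_1^2 + x_2^2 + x_3^2}$

 We will write two functions

 (1) Compute the Euclidean norm of an n-dimensional vector

$\|x\|_2 = \sqrt{x_1^2 + x_2^2 + \cdots + x_n^2}$

 (2) Compute the p-norm of an n-dimensional vector

$\|x\|_p = \left(|x_1|^p + |x_2|^p + \cdots + |x_n|^p\right)^{1/p}$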


Compute Euclidean norm of a vector


## compute Euclidean norm of a vector vec
Euclid.norm <- function(vec)
{
# initialize norm
norm <- 0

# compute sum of squared vector elements


for(x in vec) {norm <- norm + x^2}

# sqrt of sum
norm <- sqrt(norm)

return(norm)
}

Compute Euclidean norm of a vector

 We test our function

> Euclid.norm(c(1, 2, 3))


[1] 3.741657

> sqrt(1^2+2^2+3^2)
[1] 3.741657

> Euclid.norm(c(sqrt(1), sqrt(3)))


[1] 2
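
As a cross-check, the loop can be compared with a vectorised formulation. This is a minimal sketch: the name Euclid.norm.vectorised and the test vector are assumptions made for illustration, and the loop version is repeated only so that the block runs on its own.

```r
# Loop version, as in the lecture
Euclid.norm <- function(vec) {
  norm <- 0
  for (x in vec) {norm <- norm + x^2}
  sqrt(norm)
}

# Vectorised version (hypothetical name): square all elements at once, then sum
Euclid.norm.vectorised <- function(vec) {
  sqrt(sum(vec^2))
}

v <- c(3, 4)
Euclid.norm(v)             # 5
Euclid.norm.vectorised(v)  # 5
```

Both versions compute the same value; the loop version shows the iteration explicitly, which is the point of this lecture.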


p-norm of a vector
## compute p-norm of a vector vec
p.norm <- function(vec, p=2)
{
# initialize norm
norm <- 0

# compute sum of exponentiated absolute values of the vector elements
for(x in vec) {norm <- norm + abs(x)^p}

# p radical of sum
norm <- (norm)^(1/p)

return(norm)
}

p-norm of a vector

 We test our function

> p.norm(c(1, 2, 3), p=1)


[1] 6

> p.norm(c(1, 2, 3))


[1] 3.741657

> p.norm(c(1, 2, 3), p=3)


[1] 3.301927
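
The tests above use only positive elements. The standard p-norm is defined with absolute values, and the following sketch illustrates why this matters once a vector contains negative elements. Both helper names, p.norm.signed and p.norm.abs, are hypothetical and exist only for this comparison.

```r
# Hypothetical helper: raises the elements to the power p directly
p.norm.signed <- function(vec, p = 2) {
  norm <- 0
  for (x in vec) {norm <- norm + x^p}
  norm^(1/p)
}

# Hypothetical helper: takes absolute values before exponentiating
p.norm.abs <- function(vec, p = 2) {
  norm <- 0
  for (x in vec) {norm <- norm + abs(x)^p}
  norm^(1/p)
}

v <- c(-1, 1)
p.norm.signed(v, p = 3)  # 0, because the cubes -1 and 1 cancel
p.norm.abs(v, p = 3)     # 2^(1/3), about 1.26

# For vectors without negative elements both versions agree
p.norm.signed(c(1, 2, 3), p = 3) == p.norm.abs(c(1, 2, 3), p = 3)  # TRUE
```

For even p the sign disappears anyway, so the difference shows up only for odd or non-integer p.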


Square elements of a vector

 We can change the elements of the input vector and return a


new vector, e.g. square the elements of a vector

## square elements of vector vec


square.vec <- function(vec)
{
# initialize output vector vec.res
vec.res <- vector()

# fill vec.res with squared elements of vec


for(x in vec) {vec.res <- c(vec.res, x^2)}

return(vec.res)
}

Square elements of a vector

 We test our function

> square.vec(c(1,2,3))
[1] 1 4 9

> square.vec(7:10)
[1] 49 64 81 100

> (7:10)^2
[1] 49 64 81 100
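
Growing a vector with c() inside a loop copies it on every iteration. A common alternative, sketched here under the assumption that the input is numeric, is to create the result vector at its final length first and fill it by index. The function name square.vec.prealloc is made up for illustration.

```r
## square elements of vector vec, filling a preallocated result vector
square.vec.prealloc <- function(vec) {
  # create the output vector at its final length (filled with zeros)
  vec.res <- numeric(length(vec))

  # fill position i of vec.res with the square of element i of vec
  for (i in seq_along(vec)) {
    vec.res[i] <- vec[i]^2
  }

  return(vec.res)
}

square.vec.prealloc(7:10)  # 49 64 81 100
```

For the short vectors in this lecture the difference is not noticeable; it becomes relevant for long vectors.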


Function findwords

 Now we are ready to program our word list function


## finds locations of each word in word.vec
findwords <- function(word.vec)
{
# initialize word list
word.list <- list()

# iterate through word vector


for(i in 1:length(word.vec))
{
# store current word in variable word
word <- word.vec[i]
# add current word to word.list
word.list[[word]] <- c(word.list[[word]], i)
}
return(word.list)
}

Function findwords

 We test our function

> findwords(c("a", "text", "consists", "of", "a"))


$a
[1] 1 5

$text
[1] 2

$consists
[1] 3

$of
[1] 4
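
One way to double-check a single entry of the result is the base R function which(), which returns the positions at which a logical condition is TRUE. The sketch below repeats the word-vector version of findwords so that it runs on its own; using which() as a check is a suggestion, not part of the lecture.

```r
## finds locations of each word in word.vec (word-vector version from above)
findwords <- function(word.vec) {
  word.list <- list()
  for (i in 1:length(word.vec)) {
    word <- word.vec[i]
    word.list[[word]] <- c(word.list[[word]], i)
  }
  return(word.list)
}

word.vec <- c("a", "text", "consists", "of", "a")

# positions of one word, computed without a loop
which(word.vec == "a")     # 1 5
which(word.vec == "text")  # 2

# the corresponding list entries hold the same positions
findwords(word.vec)[["a"]]     # 1 5
findwords(word.vec)[["text"]]  # 2
```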


Read data from file

 The only drawback for now is that we still have to provide a


vector of words

 It would be much more convenient if the function could read the text from a file

 We can use the scan() function, which reads data from a file
into a vector

Read data from file

 As a first step we create a new text file using RStudio or an


alternative text editor like Notepad++


Read data from file

 Next, we write our text into the new file

Read data from file

 Finally we save the file


Working directory

 Before the actual import, we check the current working directory so that we know which path to use when importing the data file

> getwd()
[1] "C:/Users/Bert/Documents"

 We change working directory to the path where our data file


is located in order to simplify data import
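
The working directory can also be changed from code with setwd(). In the sketch below tempdir() is only a stand-in for the folder that holds the data file, and the variable name old.dir is made up; replace the argument of setwd() with your own path.

```r
old.dir <- getwd()        # remember the current working directory
setwd(tempdir())          # tempdir() stands in for the data folder here
getwd()                   # now shows the new working directory
file.exists("text.txt")   # TRUE only if text.txt is in the working directory
setwd(old.dir)            # switch back to the original directory
```

Checking with file.exists() before importing helps to distinguish a wrong working directory from a misspelled file name.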

Change working directory in R


Change working directory in RStudio

 In the files tab we select the “…” item and browse to the
folder in which we have stored the text file
Change working directory in RStudio

 In the menu “More”, we select “Set As Working Directory”


Read data from file

 We read text data from a file into a vector


> word.vec <- scan("text.txt", "")
Read 15 items

> word.vec
[1] "a" "text" "consists" "of"
[5] "a" "word" "and" "another"
[9] "word" "and" "so" "on"
[13] "and" "so" "forth"

 The second argument is a short form of what="" to indicate


that we intend to import text data

 As in read.table(), the separator between items is


‘white space’ by default: one or more spaces, tabs, newlines or
carriage returns
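
To try out the splitting behaviour without creating a file, scan() also accepts the text directly through its text argument. This is a minimal sketch: the example strings are made up, and quiet = TRUE only suppresses the "Read ... items" message.

```r
# split on white space (the default separator)
word.vec <- scan(text = "a text consists of a word", what = "", quiet = TRUE)
word.vec          # "a" "text" "consists" "of" "a" "word"
length(word.vec)  # 6

# split on commas instead by setting sep
comma.vec <- scan(text = "a,text,consists", what = "", sep = ",", quiet = TRUE)
comma.vec         # "a" "text" "consists"
```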
Function findwords

 Now, we can use the imported word vector


> findwords(word.vec)
$a
[1] 1 5

$text
[1] 2

$consists
[1] 3

$of
[1] 4

$word
[1] 6 9

...

Function findwords

 As a next step, we can enhance our function findwords by


adding the text file import functionality

 In this way we don’t need to import the text into a vector


beforehand

 Instead of a word vector, the enhanced function needs a file


name as input

Function findwords
## finds locations of each word in file
findwords <- function(file)
{
# fill word.vec from data in file
word.vec <- scan(file, "")

# initialize word list


word.list <- list()

# iterate through word vector


for(i in 1:length(word.vec))
{
# store current word in variable word
word <- word.vec[i]
# add current word to word.list
word.list[[word]] <- c(word.list[[word]], i)
}
return(word.list)
}

Function findwords

 We test our enhanced function


> findwords("text.txt")
$a
[1] 1 5

$text
[1] 2

$consists
[1] 3

$of
[1] 4

$word
[1] 6 9

...
Sort word list by word frequency

 Now we are ready to obtain the word frequencies from any


text

 The output of our function can be improved

 So far, we report words in the order in which they appear in the text

 It would be more convenient to sort the result alphabetically


or by word frequency


Sort word list by word frequency

 In a previous lecture we have already learned how to sort


our word list by word frequency

 First, we determine the word frequency


> word.freq <- sapply(word.list, length)

 Next, we compute the order of the word frequency


> word.freq.order <- order(word.freq)

 Finally, we use the obtained order of the word frequency for


sorting our word list
> word.list[word.freq.order]

Sort word list by word frequency

 We can write all three steps in one line


> word.list[order(sapply(word.list, length))]

 We can reverse the sorting order by specifying the argument decreasing = T in the order function
> word.list[order(sapply(word.list, length),
decreasing = T)]
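
The three steps can be traced on a small word list. The list below is a made-up excerpt with three entries, chosen so that every word has a different frequency.

```r
# hypothetical excerpt of a word list: word positions per word
word.list <- list(a = c(1, 5), text = 2, and = c(7, 10, 13))

# step 1: word frequency = number of positions per word
word.freq <- sapply(word.list, length)
word.freq                 # a = 2, text = 1, and = 3

# step 2: order() returns the indices that sort the frequencies increasingly
word.freq.order <- order(word.freq)
word.freq.order           # 2 1 3

# step 3: index the list with that order
names(word.list[word.freq.order])                            # "text" "a" "and"

# reversed order: most frequent word first
names(word.list[order(word.freq, decreasing = TRUE)])        # "and" "a" "text"
```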


Sort word list alphabetically

 In a previous lecture we have already learned how to sort


our word list alphabetically

 First, we obtain the words from word.list using names()


> words <- names(word.list)

 Second, we sort the words alphabetically


> words.sorted <- sort(words)

 Finally, we sort word.list using words.sorted


> word.list[words.sorted]

 We can write all three steps in one line


> word.list[sort(names(word.list))]
Function findwords

 We enhance our function findwords by sorting the word list alphabetically by default

 We further enhance our function findwords by providing an additional sort option sort.by.freq

 The default value is sort.by.freq = FALSE, so the word list is sorted alphabetically by default

 If sort.by.freq = TRUE, the word list is sorted by word frequency


if-else

 In order to implement the sorting feature, we need a control


flow construct with the following functionality:

 Check the value of the variable sort.by.freq

 If the condition sort.by.freq == TRUE is satisfied, sort by word frequency, else sort alphabetically

 A control flow construct that provides this functionality is the so-called if-else statement
if (condition) {expression1} else {expression2}

 Depending on whether condition is true, the result is


expression1 or else expression2

if-else

> x <- 2
> y <- if(x == 2) x else x+1
> y
[1] 2

> x <- 3
> y <- if(x == 2) x else x+1
> y
[1] 4
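
The same construct can be written with braces inside a function. In this sketch the function name describe and the even/odd test are made up for illustration. Note that the closing brace and else are kept on the same line: at the top level of the console, an else that starts a new line is not attached to the preceding if.

```r
# hypothetical example: classify a number as even or odd
describe <- function(x) {
  if (x %% 2 == 0) {
    "even"
  } else {
    "odd"
  }
}

describe(2)  # "even"
describe(3)  # "odd"
```

As in the AddOne example, the last computed value is returned, so no explicit return() is needed here.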


if-else

 In our findwords function we need to check the value of the


variable sort.by.freq

 In case the condition sort.by.freq = TRUE is satisfied, we


sort by word frequency else we sort alphabetically

if(sort.by.freq)
{
# sort by word frequency
return(word.list[order(sapply(word.list, length),
decreasing = T)])
} else {
# sort alphabetically
return(word.list[sort(names(word.list))])
}

Function findwords
## finds locations of each word in file
findwords <- function(file, sort.by.freq = F)
{
# fill word.vec from data in file
word.vec <- scan(file, "")

# initialize word list


word.list <- list()

# iterate through word vector


for(i in 1:length(word.vec))
{
# store current word in variable word
word <- word.vec[i]
# add current word to word.list
word.list[[word]] <- c(word.list[[word]], i)
}
# sort by word frequency or else sort alphabetically
if(sort.by.freq) {return(word.list[order(sapply(word.list, length),
decreasing = T)])} else {return(word.list[sort(names(word.list))])}
}
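
The complete function can be tried out without preparing a file by hand. The sketch below writes the example sentence to a temporary file first; tempfile(), writeLines() and unlink() are base R functions, while the variable names tmp, res.freq and res.alpha are made up. The function is repeated in a slightly restyled form (seq_along() and quiet = TRUE are additions) so that the block runs on its own.

```r
## finds locations of each word in file (restyled copy for a self-contained test)
findwords <- function(file, sort.by.freq = FALSE) {
  word.vec <- scan(file, "", quiet = TRUE)
  word.list <- list()
  for (i in seq_along(word.vec)) {
    word <- word.vec[i]
    word.list[[word]] <- c(word.list[[word]], i)
  }
  if (sort.by.freq) {
    word.list[order(sapply(word.list, length), decreasing = TRUE)]
  } else {
    word.list[sort(names(word.list))]
  }
}

# write the example sentence to a temporary file
tmp <- tempfile(fileext = ".txt")
writeLines("a text consists of a word and another word and so on and so forth", tmp)

res.freq  <- findwords(tmp, sort.by.freq = TRUE)
res.alpha <- findwords(tmp)

names(res.freq)[1]   # "and", the most frequent word
res.freq[["and"]]    # 7 10 13
names(res.alpha)[1]  # "a", the alphabetically first word

unlink(tmp)          # remove the temporary file
```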


Function findwords
> findwords("text.txt")
Read 15 items
$a
[1] 1 5

$and
[1] 7 10 13

$another
[1] 8

$consists
[1] 3

$forth
[1] 15
...
Function findwords
> findwords("text.txt", sort.by.freq=T)
Read 15 items
$and
[1] 7 10 13

$a
[1] 1 5

$word
[1] 6 9

$so
[1] 11 14

$text
[1] 2
...

Plotting word frequencies

 From the resulting list of word positions returned by


findwords we can easily calculate the word frequencies
using sapply

> word.list <- findwords("text.txt",


sort.by.freq=T)
Read 15 items

> word.freq <- sapply(word.list, length)

> word.freq
     and        a     word       so     text
       3        2        2        2        1
consists       of  another       on    forth
       1        1        1        1        1
Plotting word frequencies

 We create a barplot of the word frequencies


> barplot(word.freq, las=2)

[Figure: bar plot of word.freq; vertical axis from 0.0 to 3.0, one bar per word (and, a, word, so, text, consists, of, another, on, forth)]


Word frequency in Wikipedia

 Since the for loop allows us to iterate through large texts,


let’s apply our function to Wikipedia

 We select the Wikipedia article about the R programming


language

 We copy the article into a text editor

 We apply findwords and we create a barplot of the 10 most


frequent words

Word frequency in Wikipedia

 We copy the Wikipedia article about the R programming language and save it as R_wikipedia.txt


Word frequency in Wikipedia

> word.list <- findwords("R_wikipedia.txt",


sort.by.freq=T)
Read 3395 items

> word.freq <- sapply(word.list, length)

> barplot(word.freq[1:10], las=2)
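
Indexing with 1:10 assumes that the text contains at least ten different words. For shorter texts the missing positions are filled with NA, whereas the base R function head() simply returns what is available. The frequency vector below is a made-up three-word example, and the names top.index and top.head are hypothetical.

```r
# hypothetical frequency vector with only three different words
word.freq <- c(and = 3, a = 2, word = 2)

top.index <- word.freq[1:5]    # asks for five entries by position
length(top.index)              # 5, the last two entries are NA

top.head <- head(word.freq, 5) # asks for at most five entries
length(top.head)               # 3
```

With head(word.freq, 10) in place of word.freq[1:10], the barplot call works for texts of any size.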

Word frequency in Wikipedia


Homework

1. Write a function that iterates through a vector and computes the sum of the vector’s elements

2. Write a function that iterates through all columns and rows of a matrix and computes the means of the columns and the rows

3. Create a barplot of the 10 most frequently used words in your favorite Wikipedia article
