
UNIT 2 REVIEW

 Iteration
 Strings

 Files

UPDATING VARIABLES
 A common pattern in assignment statements is an assignment statement
that updates a variable - where the new value of the variable depends on
the old.

 This means “get the current value of x , add one, and then update x with
the new value.”
 If you try to update a variable that doesn’t exist, you get an error, because
Python evaluates the right side before it assigns a value to x :

 Before you can update a variable, you have to initialize it, usually with a
simple assignment:

 Updating a variable by adding 1 is called an increment; subtracting 1 is
called a decrement.
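
The points above can be sketched in a few lines. This is a minimal illustration, not taken from the original slide; the variable name x follows the text, and the try/except is only there so the failed update does not stop the script.

```python
# Updating a variable that does not exist yet fails, because the
# right side (x + 1) is evaluated before anything is assigned to x.
try:
    x = x + 1
except NameError as err:
    print('Error:', err)

x = 0        # initialize with a simple assignment
x = x + 1    # increment: x is now 1
x = x + 1    # increment again: x is now 2
x = x - 1    # decrement: x is back to 1
print(x)
```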
THE WHILE STATEMENT
 Computers are often used to automate repetitive tasks.
 Repeating identical or similar tasks without making
errors is something that computers do well and people do
poorly.
 Because iteration is so common, Python provides several
language features to make it easier.
 One form of iteration in Python is the while statement.

 Here is a simple program that counts down from five and
then says “Blastoff!”.

THE WHILE STATEMENT
Program:

n = 5
while n > 0 :
    print(n)
    n = n - 1
print('Blastoff!')

Output:

5
4
3
2
1
Blastoff!

(Flowchart: n = 5; test n > 0?; if Yes, print(n), then n = n - 1 and return to the test; if No, print('Blastoff!').)

Loops (repeated steps) have iteration variables that
change each time through a loop. Often these iteration
variables go through a sequence of numbers.
THE WHILE STATEMENT

 You can almost read the while statement as if it were English.


 It means, “While n is greater than 0, display the value of n and then
reduce the value of n by 1.
 When you get to 0, exit the while statement and display the word
Blastoff! ”
 More formally, here is the flow of execution for a while statement:

1. Evaluate the condition, yielding True or False.
2. If the condition is false, exit the while statement and continue execution at the
next statement.
3. If the condition is true, execute the body and then go back to step 1.
THE WHILE STATEMENT
 This type of flow is called a loop because the third step loops
back around to the top.
 Each time we execute the body of the loop, we call it an
iteration.
 For the above loop, we would say, “It had five iterations” which
means that the body of the loop was executed five times.
 The body of the loop should change the value of one or more
variables so that eventually the condition becomes false and the
loop terminates.
 We call the variable that changes each time the loop executes
and controls when the loop finishes the iteration variable.
 If there is no iteration variable, the loop will repeat forever,
resulting in an infinite loop.
INFINITE LOOPS
 A loop is infinite when there is no iteration variable
telling it how many times to execute.
 In the case of the previous countdown example, we can prove
that the loop terminates because we know that the value of n is
finite.
 We can see that the value of n gets smaller each time
through the loop, so eventually we have to get to 0.
 Other times a loop is obviously infinite because it has no
iteration variable at all.

“INFINITE LOOPS” AND BREAK
 Sometimes you don’t know it’s time to end a loop until
you get half way through the body.
 In that case you can write an infinite loop on purpose and
then use the break statement to jump out of the loop.
 This loop is obviously an infinite loop because the
logical expression on the while statement is simply the
logical constant True :
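
A minimal sketch of such a loop follows. The starting value 10 and the max_steps limit are assumptions added for illustration: without some break the loop would never finish, so the limit is what lets this example terminate.

```python
n = 10
steps = 0
max_steps = 5           # hypothetical safety limit, added so the example ends
seen = []
while True:             # the condition is the constant True, so it never fails
    seen.append(n)
    n = n - 1
    steps = steps + 1
    if steps == max_steps:
        break           # the only way out of this loop
print(seen)
print('Done!')
```

Removing the if and break leaves a loop that keeps counting down forever.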

INFINITE LOOP
 If you make the mistake of running this code, you will quickly learn
that the program runs forever, or until your battery runs out,
because the logical expression at the top of the loop is always true:
it is the constant value True.

AN INFINITE LOOP

n = 5
while n > 0 :
    print('Lather')
    print('Rinse')
print('Dry off!')

(Flowchart: n = 5; test n > 0?; if Yes, print('Lather') and print('Rinse'), then return to the test; if No, print('Dry off!').)

What is wrong with this loop?


ANOTHER LOOP

n = 0
while n > 0 :
    print('Lather')
    print('Rinse')
print('Dry off!')

(Flowchart: n = 0; test n > 0?; if Yes, print('Lather') and print('Rinse'), then return to the test; if No, print('Dry off!').)

What does this loop do?
BREAKING OUT OF A LOOP
 While this is a dysfunctional infinite loop, we can still
use this pattern to build useful loops as long as we
carefully add code to the body of the loop to explicitly
exit the loop using break when we have reached the exit
condition.
 For example, suppose you want to take input from the
user until they type done .
 You could write:

BREAKING OUT OF A LOOP

 The break statement ends the current loop and jumps to the
statement immediately following the loop
 It is like a loop test that can happen anywhere in the body of
the loop

Program:

while True:
    line = input('> ')
    if line == 'done' :
        break
    print(line)
print('Done!')

Sample run:

> hello there
hello there
> finished
finished
> done
Done!
BREAKING OUT OF A LOOP
 The loop condition is True , which is always true, so the
loop runs repeatedly until it hits the break statement.
 Each time through, it prompts the user with an angle
bracket.
 If the user types done , the break statement exits the
loop.
 Otherwise the program echoes whatever the user types
and goes back to the top of the loop.

FINISHING AN ITERATION WITH
CONTINUE
 Sometimes you are in an iteration of a loop and want to
finish the current iteration and immediately jump to the
next iteration.
 In that case you can use the continue statement to skip to
the next iteration without finishing the body of the loop
for the current iteration.
 Here is an example of a loop that copies its input until
the user types “done”, but treats lines that start with the
hash character as lines not to be printed (kind of like
Python comments).

FINISHING AN ITERATION WITH
CONTINUE

 The continue statement ends the current iteration, jumps
to the top of the loop, and starts the next iteration

Program:

while True:
    line = input('> ')
    if line[0] == '#' :
        continue
    if line == 'done':
        break
    print(line)
print('Done!')

Sample run:

> hello there
hello there
> # don't print this
> print this!
print this!
> done
Done!
FINISHING AN ITERATION WITH
CONTINUE
 All the lines are printed except the one that starts with
the hash sign because when the continue is executed, it
ends the current iteration and jumps back to the while
statement to start the next iteration, thus skipping the
print statement.

INDEFINITE LOOPS
 While loops are called "indefinite loops" because they
keep going until a logical condition becomes False
 The loops we have seen so far are pretty easy to
examine to see if they will terminate or if they will be
"infinite loops"
 Sometimes it is a little harder to be sure if a loop will
terminate
DEFINITE LOOPS
 Sometimes we want to loop through a set of things such
as a list of words, the lines in a file or a list of numbers.
 When we have a list of things to loop through, we can
construct a definite loop using a for statement.
 We call the while statement an indefinite loop because
it simply loops until some condition becomes False
 The for loop is looping through a known set of items so
it runs through as many iterations as there are items in
the set.
 We say that "definite loops iterate through the members
of a set"
A DEFINITE LOOP WITH STRINGS

Program:

friends = ['Joseph', 'Glenn', 'Sally']
for friend in friends :
    print('Happy New Year:', friend)
print('Done!')

Output:

Happy New Year: Joseph
Happy New Year: Glenn
Happy New Year: Sally
Done!
A DEFINITE LOOP WITH STRINGS
 In Python terms, the variable friends is a list of three strings
(we'll examine lists in more detail in a later chapter)
 The for loop goes through the list and executes the body
once for each of the three strings in the list, resulting in this
output:

    Happy New Year: Joseph
    Happy New Year: Glenn
    Happy New Year: Sally
    Done!

 Looking at the for loop, for and in are reserved Python
keywords, and friend is the iteration variable for the for loop.
 The variable friend changes for each iteration of the loop
and controls when the for loop completes.
 The iteration variable steps successively through the three
strings stored in the friends variable.
A SIMPLE DEFINITE LOOP

Program:

for i in [5, 4, 3, 2, 1] :
    print(i)
print('Blastoff!')

Output:

5
4
3
2
1
Blastoff!

(Flowchart: test Done?; if No, move i ahead and run print(i), then return to the test; if Yes, run print('Blastoff!').)

Definite loops (for loops) have explicit iteration
variables that change each time through a loop. These
iteration variables move through the sequence or set.
LOOKING AT IN...

 The iteration variable “iterates” through the
sequence (ordered set)
 The block (body) of code is executed once for each
value in the sequence
 The iteration variable moves through all of the
values in the sequence

for i in [5, 4, 3, 2, 1] :
    print(i)

(Here i is the iteration variable and [5, 4, 3, 2, 1] is the five-element sequence.)
DEFINITE LOOPS
 Quite often we have a list of items or the lines in a file -
effectively a finite set of things
 We can write a loop to run the loop once for each of the
items in a set using the Python for construct
 These loops are called "definite loops" because they
execute an exact number of times
 We say that "definite loops iterate through the members
of a set"
COUNTING LOOPS
 Counts the number of items in a list

 We set the variable count to zero before the loop starts, then we
write a for loop to run through the list of numbers.
 Our iteration variable is named itervar and while we do not use
itervar in the loop, it does control the loop and cause the loop
body to be executed once for each of the values in the list.
 In the body of the loop, we add one to the current value of count
for each of the values in the list
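
A minimal sketch of the counting loop described above. The names count and itervar come from the text; the particular numbers in the list are an assumption chosen for illustration.

```python
count = 0                                # nothing counted yet
for itervar in [3, 41, 12, 9, 74, 15]:   # itervar controls the loop but is not used
    count = count + 1                    # add one for each value in the list
print('Count:', count)
```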
SUMMING LOOPS

 In this loop we do use the iteration variable.


 Instead of simply adding one to the count as in the previous loop, we
add the actual number (3, 41, 12, etc.) to the running total during each
loop iteration.
 So before the loop starts total is zero because we have not yet seen
any values.
 As the loop executes, total accumulates the sum of the elements; a
variable used this way is sometimes called an accumulator
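
A minimal sketch of the summing loop described above. The name total comes from the text; the list beyond its first three values (3, 41, 12) is an assumption chosen for illustration.

```python
total = 0                                # no values seen yet
for itervar in [3, 41, 12, 9, 74, 15]:
    total = total + itervar              # accumulate the running total
print('Total:', total)
```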
LOOPING THROUGH A SET

Program:

print('Before')
for thing in [9, 41, 12, 3, 74, 15] :
    print(thing)
print('After')

Output:

$ python basicloop.py
Before
9
41
12
3
74
15
After
COUNTING IN A LOOP

Program:

zork = 0
print('Before', zork)
for thing in [9, 41, 12, 3, 74, 15] :
    zork = zork + 1
    print(zork, thing)
print('After', zork)

Output:

$ python countloop.py
Before 0
1 9
2 41
3 12
4 3
5 74
6 15
After 6

To count how many times we execute a loop we introduce a counter variable
that starts at 0 and we add one to it each time through the loop.
SUMMING IN A LOOP

Program:

zork = 0
print('Before', zork)
for thing in [9, 41, 12, 3, 74, 15] :
    zork = zork + thing
    print(zork, thing)
print('After', zork)

Output:

$ python countloop.py
Before 0
9 9
50 41
62 12
65 3
139 74
154 15
After 154

To add up a value we encounter in a loop, we introduce a sum variable that
starts at 0 and we add the value to the sum each time through the loop.
FINDING THE AVERAGE IN A LOOP

Program:

count = 0
sum = 0
print('Before', count, sum)
for value in [9, 41, 12, 3, 74, 15] :
    count = count + 1
    sum = sum + value
    print(count, sum, value)
print('After', count, sum, sum / count)

Output:

$ python averageloop.py
Before 0 0
1 9 9
2 50 41
3 62 12
4 65 3
5 139 74
6 154 15
After 6 154 25.666

An average just combines the counting and sum patterns and divides when the
loop is done.
FILTERING IN A LOOP

Program:

print('Before')
for value in [9, 41, 12, 3, 74, 15] :
    if value > 20:
        print('Large number', value)
print('After')

Output:

$ python search1.py
Before
Large number 41
Large number 74
After

We use an if statement in the loop to catch / filter the values
we are looking for.
SEARCH USING A BOOLEAN VARIABLE

Program:

found = False
print('Before', found)
for value in [9, 41, 12, 3, 74, 15] :
    if value == 3 :
        found = True
    print(found, value)
print('After', found)

Output:

$ python search1.py
Before False
False 9
False 41
False 12
True 3
True 74
True 15
After True

If we just want to search and know if a value was found, we use a variable that starts
at False and is set to True as soon as we find what we are looking for.
FIND THE LARGEST NUMBER

 Python has an "is" operator that can
be used in logical expressions
 Implies 'is the same as'
 Similar to, but stronger than ==
 'is not' also is a logical operator

 Before the loop, we set largest to the constant None.
 None is a special constant value which we can store in a
variable to mark the variable as “empty”.
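
One way the largest-number loop could look, as a sketch. The name largest comes from the text; the list of numbers and the name itervar are assumptions for illustration. The "is None" test marks the first pass, when largest is still empty.

```python
largest = None                           # "empty": no value seen so far
for itervar in [3, 41, 12, 9, 74, 15]:
    # Take the first value unconditionally, then only larger ones.
    if largest is None or itervar > largest:
        largest = itervar
print('Largest:', largest)
```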
FIND THE SMALLEST NUMBER
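
The slide body did not survive extraction. A plausible sketch, assuming the same None-initialization pattern as the largest-number loop with the comparison reversed; the name smallest and the list of numbers are assumptions.

```python
smallest = None                          # "empty": no value seen so far
for itervar in [3, 41, 12, 9, 74, 15]:
    # Take the first value unconditionally, then only smaller ones.
    if smallest is None or itervar < smallest:
        smallest = itervar
print('Smallest:', smallest)
```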

SUMMARY
 While loops (indefinite)
 Infinite loops

 Using break

 Using continue

 For loops (definite)

 Iteration variables

 Largest or smallest
STRING DATA TYPE
 A string is a sequence of characters
 A string literal uses quotes: 'Hello' or "Hello"
 For strings, + means “concatenate”
 When a string contains numbers, it is still a string
 We can convert numbers in a string into a number using int()

>>> str1 = "Hello"
>>> str2 = 'there'
>>> bob = str1 + str2
>>> print(bob)
Hellothere
>>> str3 = '123'
>>> str3 = str3 + 1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: cannot concatenate 'str' and 'int' objects
>>> x = int(str3) + 1
>>> print(x)
124
>>>
READING AND CONVERTING
• We prefer to read data in using strings and then
parse and convert the data as we need
• This gives us more control over error situations
and/or bad user input
• Raw input numbers must be converted from strings

>>> name = input('Enter:')
Enter:Chuck
>>> print(name)
Chuck
>>> apple = input('Enter:')
Enter:100
>>> x = apple - 10
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for -: 'str' and 'int'
>>> x = int(apple) - 10
>>> print(x)
90
LOOKING INSIDE STRINGS

b a n a n a
0 1 2 3 4 5

• We can get at any single character in a string using an
index specified in square brackets
• The index value must be an integer and starts at zero
• The index value can be an expression that is computed

>>> fruit = 'banana'
>>> letter = fruit[1]
>>> print(letter)
a
>>> n = 3
>>> w = fruit[n - 1]
>>> print(w)
n
A CHARACTER TOO FAR

• You will get a Python error if you attempt to index beyond the end of
a string.
• So be careful when constructing index values and slices
• You can use any expression, including variables and operators,
as an index, but the value of the index has to be an integer.
Otherwise you get an error.

>>> zot = 'abc'
>>> print(zot[5])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: string index out of range
>>>
STRINGS HAVE LENGTH

b a n a n a
0 1 2 3 4 5

• There is a built-in function len that gives us the length of
a string

>>> fruit = 'banana'
>>> print(len(fruit))
6
STRING LENGTH
 To get the last letter of a string, you might be tempted to try
something like this:

 The reason for the IndexError is that there is no letter in 'banana'
with the index 6.
 Since we started counting at zero, the six letters are numbered 0 to
5.
 To get the last character, you have to subtract 1 from length:
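
A minimal sketch of both attempts, assuming the string 'banana' from the earlier slides and a variable named length. The try/except is only there so the failing lookup does not stop the script.

```python
fruit = 'banana'
length = len(fruit)          # 6

try:
    last = fruit[length]     # tempting, but index 6 does not exist
except IndexError as err:
    print('IndexError:', err)

last = fruit[length - 1]     # index 5 is the final character
print(last)
```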
STRING LENGTH
 Alternatively, you can use negative indices, which count
backward from the end of the string.
 The expression fruit[-1] yields the last letter, fruit[-2]
yields the second to last, and so on.
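
A short illustration of negative indices, again assuming fruit holds 'banana'; the variable names on the left are made up for the example.

```python
fruit = 'banana'
last = fruit[-1]             # counts back from the end: the final 'a'
second_to_last = fruit[-2]   # the 'n' just before it
print(last, second_to_last)
```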

LOOPING THROUGH STRINGS

• Using a while statement and an iteration variable,
and the len function, we can construct a loop to
look at each of the letters in a string individually

Program:

fruit = 'banana'
index = 0
while index < len(fruit) :
    letter = fruit[index]
    print(index, letter)
    index = index + 1

Output:

0 b
1 a
2 n
3 a
4 n
5 a
LOOPING THROUGH STRINGS

• A definite loop using a for statement is much
more elegant
• The iteration variable is completely taken care of
by the for loop

Program (for loop):

fruit = 'banana'
for letter in fruit :
    print(letter)

Equivalent program (while loop):

index = 0
while index < len(fruit) :
    letter = fruit[index]
    print(letter)
    index = index + 1

Output (the same for both):

b
a
n
a
n
a
LOOPING AND COUNTING

• This is a simple loop that loops through each letter
in a string and counts the number of times the loop
encounters the 'a' character.

word = 'banana'
count = 0
for letter in word :
    if letter == 'a' :
        count = count + 1
print(count)
LOOKING DEEPER INTO IN

 The iteration variable “iterates” through the
sequence (ordered set)
 The block (body) of code is executed once for each
value in the sequence
 The iteration variable moves through all of the
values in the sequence

for letter in 'banana' :
    print(letter)

(Here letter is the iteration variable and 'banana' is the six-character string.)
SLICING STRINGS

M  o  n  t  y     P  y  t  h  o  n
0  1  2  3  4  5  6  7  8  9  10 11

 We can also look at any continuous section of a string
using a colon operator
 The second number is one beyond the end of the slice - “up to but not
including”
 If the second number is beyond the end of the string, it stops at the
end
 The operator [n:m] returns the part of the string from the “n-eth”
character to the “m-eth” character, including the first but excluding
the last.

>>> s = 'Monty Python'
>>> print(s[0:4])
Mont
>>> print(s[6:7])
P
>>> print(s[6:20])
Python
SLICING STRINGS

M  o  n  t  y     P  y  t  h  o  n
0  1  2  3  4  5  6  7  8  9  10 11

>>> s = 'Monty Python'
>>> print(s[:2])
Mo
>>> print(s[8:])
thon
>>> print(s[:])
Monty Python

• If we leave off the first number or the last number of
the slice, it is assumed to be the beginning or end of the
string respectively
• If the first index is greater than or equal to the second
the result is an empty string, represented by two quotation
marks:
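
A small sketch of the empty-slice case. The string 'banana' and the variable names are assumptions for illustration; repr() is used so the two quotation marks are visible when printed.

```python
fruit = 'banana'
empty = fruit[3:3]           # first index equal to the second
also_empty = fruit[4:2]      # first index greater than the second
print(repr(empty), len(empty))
print(repr(also_empty), len(also_empty))
```

An empty string contains no characters and has length 0, but it is still a string.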
STRING CONCATENATION

• When the + operator is applied to strings,
it means "concatenation"

>>> a = 'Hello'
>>> b = a + 'There'
>>> print(b)
HelloThere
>>> c = a + ' ' + 'There'
>>> print(c)
Hello There
>>>
STRINGS ARE IMMUTABLE
 It is tempting to use the [] operator on the left side of an assignment, with the
intention of changing a character in a string.
 For example:

 The “object” in this case is the string and the “item” is the character you
tried to assign.
 The reason for the error is that strings are immutable, which means you
can’t change an existing string.
 The best you can do is create a new string that is a variation on the original:

 This example concatenates a new first letter onto a slice of greeting . It has
no effect on the original string.
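
A sketch of both steps. The variable name greeting comes from the text; its value 'Hello, world!' and the name new_greeting are assumptions for illustration. The try/except only keeps the failed assignment from stopping the script.

```python
greeting = 'Hello, world!'

try:
    greeting[0] = 'J'                   # item assignment is not allowed on a str
except TypeError as err:
    print('TypeError:', err)

new_greeting = 'J' + greeting[1:]       # build a new string from a slice instead
print(new_greeting)
print(greeting)                         # the original is unchanged
```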
USING IN AS AN OPERATOR

• The in keyword can also be used to check to see if
one string is "in" another string
• The in expression is a logical expression and
returns True or False and can be used in an if
statement

>>> fruit = 'banana'
>>> 'n' in fruit
True
>>> 'm' in fruit
False
>>> 'nan' in fruit
True
>>> if 'a' in fruit :
...     print('Found it!')
...
Found it!
>>>
STRING COMPARISON

word = 'Pineapple'
if word == 'banana':
    print('All right, bananas.')

if word < 'banana':
    print('Your word, ' + word + ', comes before banana.')
elif word > 'banana':
    print('Your word, ' + word + ', comes after banana.')
else:
    print('All right, bananas.')

 Python does not handle uppercase and lowercase letters the same way that people do.
 All the uppercase letters come before all the lowercase letters, so:

 A common way to address this problem is to convert strings to a standard format, such as
all lowercase, before performing the comparison.
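
A short sketch of the uppercase/lowercase effect and the lowercase fix. The word 'Pineapple' matches the comparison code above; the variable names before and before_lower are made up for the example.

```python
word = 'Pineapple'

# Uppercase 'P' sorts before lowercase 'b', so this is True.
before = word < 'banana'

# After converting to lowercase, 'pineapple' comes after 'banana'.
before_lower = word.lower() < 'banana'

print(before, before_lower)
```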
STRING LIBRARY
• Python has a number of string functions
which are in the string library
• These functions are already built into every
string - we invoke them by appending the
function to the string variable
• These functions do not modify the original
string, instead they return a new string that
has been altered
• Strings are an example of Python objects.
• An object contains both data (the actual
string itself) as well as methods
• Methods are effectively functions
that are built into the object and are
available to any instance of the object.

>>> greet = 'Hello Bob'
>>> zap = greet.lower()
>>> print(zap)
hello bob
>>> print(greet)
Hello Bob
>>> print('Hi There'.lower())
hi there
>>>
Python has a function called dir() that lists the methods
available for an object.

>>> stuff = 'Hello world'
>>> type(stuff)
<class 'str'>
>>> dir(stuff)
['capitalize', 'center', 'count', 'decode', 'encode', 'endswith', 'expandtabs',
'find', 'format', 'index', 'isalnum', 'isalpha', 'isdigit', 'islower', 'isspace',
'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'partition', 'replace', 'rfind',
'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith',
'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']

http://docs.python.org/lib/string-methods.html
CALLING A STRING OBJECT METHOD
 Calling a method is similar to calling a function—it
takes arguments and returns a value—but the syntax is
different.
 We call a method by appending the method name to the
variable name using the period as a delimiter.
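
A minimal sketch of method-call syntax, using the upper and find methods that appear elsewhere in these slides. The variable names are assumptions for illustration.

```python
word = 'banana'
new_word = word.upper()      # method call: variable, period, method name, arguments
index = word.find('a')       # methods can take arguments and return a value
print(new_word)
print(index)
```

The empty parentheses in word.upper() show that this method takes no arguments.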

STRING LIBRARY

str.capitalize()
str.center(width[, fillchar])
str.endswith(suffix[, start[, end]])
str.find(sub[, start[, end]])
str.lstrip([chars])
str.replace(old, new[, count])
str.lower()
str.rstrip([chars])
str.strip([chars])
str.upper()

http://docs.python.org/lib/string-methods.html
SEARCHING A STRING

b a n a n a
0 1 2 3 4 5

• We use the find() function to search for a substring
within another string
• find() finds the first occurrence of the substring
• If the substring is not found, find() returns -1
• Remember that string position starts at zero

>>> fruit = 'banana'
>>> pos = fruit.find('na')
>>> print(pos)
2
>>> aa = fruit.find('z')
>>> print(aa)
-1
MAKING EVERYTHING UPPER CASE

• You can make a copy of a string in lower case or upper
case
• Often when we are searching for a string using find() - we
first convert the string to lower case so we can search a string
regardless of case

>>> greet = 'Hello Bob'
>>> nnn = greet.upper()
>>> print(nnn)
HELLO BOB
>>> www = greet.lower()
>>> print(www)
hello bob
>>>
SEARCH AND REPLACE

 The replace() function is like a “search and replace”
operation in a word processor
 It replaces all occurrences of the search string with the
replacement string

>>> greet = 'Hello Bob'
>>> nstr = greet.replace('Bob','Jane')
>>> print(nstr)
Hello Jane
>>> nstr = greet.replace('o','X')
>>> print(nstr)
HellX BXb
>>>
STRIPPING WHITESPACE

• Sometimes we want to take a string and remove
whitespace at the beginning and/or end
• lstrip() and rstrip() remove whitespace at the
left or right only
• strip() removes both beginning and ending whitespace

>>> greet = ' Hello Bob '
>>> greet.lstrip()
'Hello Bob '
>>> greet.rstrip()
' Hello Bob'
>>> greet.strip()
'Hello Bob'
>>>
PREFIXES
 Some methods such as startswith return boolean values.

 You will note that startswith requires case to match so sometimes we take a line and map it all
to lowercase before we do any checking using the lower method.

 In the last example, the method lower is called and then we use startswith to check to see if
the resulting lowercase string starts with the letter “p”.
 As long as we are careful with the order, we can make multiple method calls in a single
expression.
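
A sketch of the three cases discussed above. The sample string and the variable names are assumptions for illustration; startswith and lower are the methods named in the text.

```python
line = 'Please have a nice day'

exact = line.startswith('Please')          # True: the case matches
wrong_case = line.startswith('p')          # False: 'P' is not 'p'

# Chain two method calls: lower() runs first, then startswith()
# checks the resulting lowercase string.
chained = line.lower().startswith('p')     # True

print(exact, wrong_case, chained)
```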
Parsing and Extracting

From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008

(The @ is at position 21; the next space after it is at position 31.)

>>> data = 'From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008'
>>> atpos = data.find('@')
>>> print(atpos)
21
>>> sppos = data.find(' ',atpos)
>>> print(sppos)
31
>>> host = data[atpos+1 : sppos]
>>> print(host)
uct.ac.za
FORMAT OPERATOR
 The format operator, %, allows us to construct strings,
replacing parts of the strings with the data stored
in variables.
 When applied to integers, % is the modulus operator.
 But when the first operand is a string, % is the format
operator.
 The first operand is the format string, which contains one
or more format sequences that specify how the second
operand is formatted.
 The result is a string.
FORMAT OPERATOR
 For example, the format sequence '%d' means that the
second operand should be formatted as an integer (d
stands for “decimal”):

 The result is the string '42', which is not to be confused
with the integer value 42.
 A format sequence can appear anywhere in the string, so
you can embed a value in a sentence:
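
A minimal sketch of both uses. The variable name camels and the sentence text are assumptions for illustration; the value 42 matches the result mentioned above.

```python
camels = 42

text = '%d' % camels                              # the string '42', not the integer 42
sentence = 'I have spotted %d camels.' % camels   # format sequence inside a sentence

print(repr(text))
print(sentence)
```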

FORMAT OPERATOR
 The following example uses '%d' to format an integer, '%g'
to format a floating point number, and '%s' to format a
string:

 In the first example, there aren’t enough elements; in the
second, the element is the wrong type.
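
The examples these points refer to did not survive extraction. A sketch of what they could look like follows; the specific strings and values are assumptions. When a format string has more than one format sequence, the second operand has to be a tuple with one element per sequence.

```python
# '%d' formats an integer, '%g' a floating point number, '%s' a string.
line = 'In %d years I have spotted %g %s.' % (3, 0.1, 'camels')
print(line)

errors = []

try:
    '%d %d %d' % (1, 2)          # three format sequences, only two elements
except TypeError as err:
    errors.append(str(err))

try:
    '%d' % 'dollars'             # a string where an integer is expected
except TypeError as err:
    errors.append(str(err))

for message in errors:
    print('TypeError:', message)
```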
SUMMARY
• String type
• Read/Convert
• Indexing strings []
• Slicing strings [2:4]
• Looping through strings with for and while
• Concatenating strings with +
• String operations
READING FILES

It is time to go find some Data to mess with!

(Figure: computer diagram with the labels Software, Input and Output Devices, Central Processing Unit, Main Memory, Secondary Memory, "What Next?", "if x< 3: print", and "Files R Us".)

From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008
Return-Path: <postmaster@collab.sakaiproject.org>
Date: Sat, 5 Jan 2008 09:12:18 -0500
To: source@collab.sakaiproject.org
From: stephen.marquard@uct.ac.za
Subject: [sakai] svn commit: r39772 - content/branches/
Details: http://source.sakaiproject.org/viewsvn/?view=rev&rev=39772
...
FILE PROCESSING

• A text file can be thought of as a sequence of lines

From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008
Return-Path: <postmaster@collab.sakaiproject.org>
Date: Sat, 5 Jan 2008 09:12:18 -0500
To: source@collab.sakaiproject.org
From: stephen.marquard@uct.ac.za
Subject: [sakai] svn commit: r39772 - content/branches/
Details: http://source.sakaiproject.org/viewsvn/?view=rev&rev=39772

http://www.py4inf.com/code/mbox-short.txt
OPENING A FILE
 Before we can read the contents of the file we must tell
Python which file we are going to work with and what
we will be doing with the file
 This is done with the open() function
 open() returns a “file handle” - a variable used to
perform operations on the file
 Kind of like “File -> Open” in a Word Processor
USING OPEN()

• handle = open(filename, mode)
• returns a handle used to manipulate the file
• filename is a string
• mode is optional and should be 'r' if we are planning to read the
file and 'w' if we are going to write to the file.

fhand = open('mbox.txt', 'r')

http://docs.python.org/lib/built-in-funcs.html
WHAT IS A HANDLE?

>>> fhand = open('mbox.txt')


>>> print(fhand)
<open file 'mbox.txt', mode 'r' at 0x1005088b0>
WHEN FILES ARE MISSING

>>> fhand = open('stuff.txt')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IOError: [Errno 2] No such file or directory: 'stuff.txt'
THE NEWLINE CHARACTER
• We use a special character, called the "newline", to indicate
when a line ends
• We represent it as \n in strings
• Newline is still one character - not two

>>> stuff = 'Hello\nWorld!'
>>> print(stuff)
Hello
World!
>>> stuff = 'X\nY'
>>> print(stuff)
X
Y
>>> len(stuff)
3
FILE PROCESSING

• A text file has newlines at the end of each line

From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008\n
Return-Path: <postmaster@collab.sakaiproject.org>\n
Date: Sat, 5 Jan 2008 09:12:18 -0500\n
To: source@collab.sakaiproject.org\n
From: stephen.marquard@uct.ac.za\n
Subject: [sakai] svn commit: r39772 - content/branches/\n
Details: http://source.sakaiproject.org/viewsvn/?view=rev&rev=39772\n
FILE HANDLE AS A SEQUENCE

• A file handle open for read can be treated as a sequence of
strings where each line in the file is a string in the sequence
• We can use the for statement to iterate through a sequence
• Remember - a sequence is an ordered set

xfile = open('mbox.txt')
for cheese in xfile:
    print(cheese)
COUNTING LINES IN A FILE

• Open a file read-only
• Use a for loop to read each line
• Count the lines and print out the number of lines

fhand = open('mbox.txt')
count = 0
for line in fhand:
    count = count + 1
print('Line Count:', count)

$ python open.py
Line Count: 132045
SEARCHING THROUGH A FILE
• We can put an if statement in our for loop to only print lines
that meet some criteria

fhand = open('mbox-short.txt')
for line in fhand:
    if line.startswith('From:') :
        print(line)
OOPS!

What are all these blank lines doing here?

From: stephen.marquard@uct.ac.za

From: louis@media.berkeley.edu

From: zqian@umich.edu

From: rjlowe@iupui.edu

...

OOPS!

Each line from the file has a newline at the end.
The print statement adds a newline to each line.

From: stephen.marquard@uct.ac.za\n
\n
From: louis@media.berkeley.edu\n
\n
From: zqian@umich.edu\n
\n
From: rjlowe@iupui.edu\n
\n
...
SEARCHING THROUGH A FILE (FIXED)

• We can strip the whitespace from the right hand side of the
string using rstrip() from the string library
• The newline is considered "white space" and is stripped

fhand = open('mbox-short.txt')
for line in fhand:
    line = line.rstrip()
    if line.startswith('From:') :
        print(line)

From: stephen.marquard@uct.ac.za
From: louis@media.berkeley.edu
From: zqian@umich.edu
From: rjlowe@iupui.edu
....
SKIPPING WITH CONTINUE
• We can conveniently skip a line by using the continue
statement

fhand = open('mbox-short.txt')
for line in fhand:
    line = line.rstrip()
    if not line.startswith('From:') :
        continue
    print(line)
USING IN TO SELECT LINES
• We can look for a string anywhere in a line as our selection
criteria

fhand = open('mbox-short.txt')
for line in fhand:
    line = line.rstrip()
    if '@uct.ac.za' not in line :
        continue
    print(line)

From stephen.marquard@uct.ac.za Sat Jan 5 09:14:16 2008
X-Authentication-Warning: set sender to stephen.marquard@uct.ac.za using -f
From: stephen.marquard@uct.ac.za
Author: stephen.marquard@uct.ac.za
From david.horwitz@uct.ac.za Fri Jan 4 07:02:32 2008
X-Authentication-Warning: set sender to david.horwitz@uct.ac.za using -f
...
PROMPT FOR FILE NAME

fname = input('Enter the file name: ')
fhand = open(fname)
count = 0
for line in fhand:
    if line.startswith('Subject:') :
        count = count + 1
print('There were', count, 'subject lines in', fname)

Enter the file name: mbox.txt
There were 1797 subject lines in mbox.txt

Enter the file name: mbox-short.txt
There were 27 subject lines in mbox-short.txt
BAD FILE NAMES

fname = input('Enter the file name: ')
try:
    fhand = open(fname)
except:
    print('File cannot be opened:', fname)
    exit()
count = 0
for line in fhand:
    if line.startswith('Subject:') :
        count = count + 1
print('There were', count, 'subject lines in', fname)

Enter the file name: mbox.txt
There were 1797 subject lines in mbox.txt

Enter the file name: na na boo boo
File cannot be opened: na na boo boo
WRITING FILES
 To write a file, you have to open it with mode 'w' as a second
parameter:

 The write method of the file handle object puts data into the file.

 We must make sure to manage the ends of lines as we write to
the file by explicitly inserting the newline character when we
want to end a line

 When you are done writing, you have to close the file to make
sure that the last bit of data is physically written to the disk so it
will not be lost if the power goes off.
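
A sketch of the full write-then-read cycle. The file name output.txt, the temporary directory, and the two sample lines are assumptions for illustration; a temporary directory is used so the example does not overwrite anything in the current folder.

```python
import os
import tempfile

# Hypothetical location: a scratch directory created just for this example.
path = os.path.join(tempfile.mkdtemp(), 'output.txt')

fout = open(path, 'w')           # mode 'w' opens the file for writing
fout.write('first line\n')       # write() does not add a newline for us
fout.write('second line\n')
fout.close()                     # make sure the data reaches the disk

# Read the file back to confirm what was written.
lines = []
fhand = open(path)
for line in fhand:
    lines.append(line.rstrip())
fhand.close()
print(lines)
```

Opening an existing file with mode 'w' clears out its old contents, so choose the file name with care.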
SUMMARY
• Secondary storage
• Opening a file - file handle
• File structure - newline character
• Reading a file line-by-line with a for loop
• Searching for lines
• Reading file names
• Dealing with bad files
