
UNIT-III

PYTHON STRINGS REVISITED

INTRODUCTION:

A Python string is a sequence of characters. In even simpler terms, a string is a piece of text. It can include letters, numbers, and even special characters. Python has a built-in class, str, for handling strings. A string is an ordered collection of characters used to represent and store text-based information. In CPython, the characters of a string are stored in a contiguous block of memory. In Python, there are 3 ways of enclosing a string:

• Enclosing string in single quotes

Example:

String = 'python'

• Enclosing string in double quotes

Example:

String = "python programming"

• Enclosing string in triple quotes

Example:

String = '''welcome'''

String = """welcome"""
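The short sketch below checks that the three quoting styles all build ordinary str objects. It also shows that triple quotes can span several lines. The variable names are only illustrative.

```python
# Three ways of writing the same text
s1 = 'python'
s2 = "python"
s3 = '''python'''

print(s1 == s2 == s3)      # True: the quote style does not change the value
print(type(s1).__name__)   # str

# Triple quotes may span several lines; each line break becomes a '\n' character
s4 = """welcome
to python"""
print(s4.count("\n"))      # 1

# Choosing the other quote style avoids escaping quotes inside the text
s5 = "It's easy"
s6 = 'He said "hi"'
print(s5, s6)              # It's easy He said "hi"
```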

INDEXING:

In Python, strings are ordered sequences of character data. Individual characters in a string can therefore be accessed by writing the string name followed by a number in square brackets ([]). The number in the brackets is called an index. String indexing in Python is zero-based: the first character has index 0, the next has index 1, and so on. The index of the last character is the length of the string minus one.

String indices can also be specified with negative numbers, in which case indexing
occurs from the end of the string backward: -1 refers to the last character, -2 the second-to-
last character, and so on.

Example:
sampleStr = "Hello, this is a sample string"

print( "Character at index 4 is : " , sampleStr[4] )

print( "Last character in string : " , sampleStr[-1] )

print( "Second Last character in string : " , sampleStr[-2] )

print( "First character in string : " , sampleStr[ -len(sampleStr) ] )

output:

Character at index 4 is : o

Last character in string : g

Second Last character in string : n

First character in string : H
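As a small sketch with the same sample string, the following shows the valid index range and what happens when an index falls outside it:

```python
sampleStr = "Hello, this is a sample string"

# Valid indices run from 0 to len(sampleStr) - 1 (or from -len(sampleStr) to -1)
last_index = len(sampleStr) - 1
print(last_index)                                # 29
print(sampleStr[last_index] == sampleStr[-1])    # True

# An index outside that range raises IndexError
try:
    sampleStr[len(sampleStr)]
except IndexError as err:
    print("IndexError:", err)   # IndexError: string index out of range
```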

Traversing a string:

Traversing a string means accessing all the characters of the string one after the other by their subscript (index). A string can be traversed using a for loop or a while loop. A simple way to traverse a string is to use Python's range() function, which lets us access the characters by index.

Python Program:

# Using range() to iterate over a string in Python
string_to_iterate = "PYTHON"

for char_index in range(len(string_to_iterate)):
    print(string_to_iterate[char_index])

Output:

P
Y
T
H
O
N
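A common alternative is the built-in enumerate(). It yields each index together with its character, so range(len(...)) is not needed. Here is a minimal sketch using the same string:

```python
string_to_iterate = "PYTHON"

# enumerate() yields (index, character) pairs
for char_index, char in enumerate(string_to_iterate):
    print(char_index, char)   # 0 P, then 1 Y, ..., 5 N
```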

Concatenating, appending and multiplying strings in Python:

Concatenation means joining strings together end-to-end to create a new string. To concatenate strings, we use the + operator. Keep in mind that + is the addition operator for numbers, but for strings it is a joining operator.

Appending strings means adding (+=) one or more strings to the end of another string. Because strings are immutable, += does not modify the original string. It creates a new string and rebinds the variable to it.

In some cases these terms are interchangeable: appending a string is the same operation as concatenating it at the end of another.

Use the multiplication operator * to repeat a string multiple times.

Multiplying a string by an integer n with the * operator concatenates the string with itself n times. Call print(value) with the resulting string as value to print it.

Printing such a repeated string outputs all the copies on the same line, with no newlines between them.
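The three operators described above can be sketched as follows. The variable names are illustrative.

```python
first = "Python"
second = "Programming"

# Concatenation with + builds a new string
joined = first + " " + second
print(joined)              # Python Programming

# Appending with += rebinds the name to a new, longer string
greeting = "Hello"
greeting += ", world"
print(greeting)            # Hello, world

# Repetition with * and an integer n
line = "-" * 10
print(line)                # ----------
print("ab" * 3)            # ababab

# + only joins str with str: "Age: " + age would raise TypeError,
# so the number is converted with str() first
age = 21
print("Age: " + str(age))  # Age: 21
```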

Strings are immutable

Strings are not mutable in Python. They are an immutable data type, which means their value cannot be updated in place. An immutable object is one that, once created, does not change during its lifetime. Example: try executing the following code:
name_1 = "Varun"

name_1[0] = 'T'

Output

Traceback (most recent call last):
  File "main.py", line 2, in <module>
    name_1[0] = 'T'
TypeError: 'str' object does not support item assignment

You will get an error message whenever you try to change the content of a string.

One possible solution is to create a new string object with necessary modifications:

name_1 = "Varun"

name_2 = "T"+ name_1[1:]

print("name_1 = ", name_1, "and name_2 = ", name_2)

Output

name_1 = Varun and name_2 = Tarun

To identify that they are different strings, check with the id() function:

name_1 = "Varun"

name_2 = "T" + name_1[1:]

print("id of name_1 = ", id(name_1))

print("id of name_2 = ", id(name_2))

Output

id of name_1 = 140225710197088

id of name_2 = 140225710197704

To understand more about the concept of string immutability, consider the following code:

name_1 = "Varun"

name_2 = "Varun"
print("id of name_1 = ", id(name_1))

print("id of name_2 = ", id(name_2))

Output

id of name_1 = 140685486749936

id of name_2 = 140685486749936

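When the above lines of code are executed, you will find that name_1 and name_2, which both refer to the string “Varun”, have the same id. This happens because CPython reuses (interns) identical short string literals such as “Varun”. Sharing one object is safe because strings cannot be changed. This is an implementation detail, so do not rely on two equal strings always having the same id.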

To understand more execute the following statements:

name_1 = "Varun"

print("id of name_1 = ", id(name_1))

name_1 = "Tarun"

print("id of name_1 after initializing with new value = ", id(name_1))

Output

id of name_1 = 140449996065064

id of name_1 after initializing with new value = 140449996065680

As the above example shows, assigning a new value to a string variable does not overwrite the previous value. Python creates a new string object and rebinds the name to it.

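In Python, strings are made immutable so that programmers cannot alter the contents of a string object, even by mistake. This avoids bugs where one part of a program changes text that another part is still using. Immutability also makes strings hashable, which is why they can be used as dictionary keys.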
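One practical consequence of immutability is that every += produces a new string. When text is built from many pieces, a common pattern is to collect the pieces in a list and join them once. This is a minimal sketch with illustrative names, not a benchmark.

```python
words = ["strings", "are", "immutable"]

# Repeated += creates a new string object on every step
sentence = ""
for word in words:
    sentence += word + " "
sentence = sentence.rstrip()   # drop the trailing space

# Collecting the pieces and joining once gives the same text
sentence2 = " ".join(words)

print(sentence)                # strings are immutable
print(sentence == sentence2)   # True
```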

String formatting operator:

Python supports C-style string formatting to create new, formatted strings. The "%" operator formats a set of variables enclosed in a tuple (an immutable, fixed-size sequence) using a format string. The format string contains normal text together with "argument specifiers", which are special symbols such as "%s" and "%d".

my_string = "InterviewBit"

print("Hello %s" % my_string)

# prints "Hello InterviewBit"

To use two or more argument specifiers, use a tuple (parentheses):

qty = 10

item_name = "chocolate"

rs = 100

print("You can buy %d %s in %d rupees" % (qty, item_name, rs))

# prints "You can buy 10 chocolate in 100 rupees"

Any object that is not a string can also be formatted with the %s specifier. The object is converted with str(), so the string returned by its __str__ method is inserted. Use %r to insert the object's repr() instead. For example:

my_list = [1, 2, 3]

print("Given list: %s" % my_list)

# prints "Given list: [1, 2, 3]"

Here are some basic argument specifiers you should know:

%s - String (or any object with a string representation, like numbers)

%d - Integers

%f - Floating point numbers

%.<number of digits>f - Floating point numbers with a fixed number of digits to the right of the dot (for example, %.2f).

%x/%X - Integers in hex representation (lowercase/uppercase)
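The specifiers listed above can be tried out with a few illustrative values. The %r line is an extra specifier that calls repr() on its argument.

```python
price = 3.14159
count = 255

print("%s" % "text")              # text
print("%d" % 42)                  # 42
print("%f" % price)               # 3.141590  (6 digits after the dot by default)
print("%.2f" % price)             # 3.14
print("%x %X" % (count, count))   # ff FF
print("%r" % "text")              # 'text'  (uses repr())
```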

String format() Method

The format() method formats the specified value(s) and inserts them inside the string's placeholders.
The placeholder is defined using curly brackets: {}.
The format() method returns the formatted string.

The placeholders can be identified using named indexes {price}, numbered indexes {0}, or
even empty placeholders {}.

qty = 10

item_name = "chocolate"

rs = 100

# named indexes

print("You can buy {quantity} {item} in {amt} rupees".format(quantity=qty, item=item_name, amt=rs))

# numbered indexes:

print("You can buy {0} {1} in {2} rupees".format(qty, item_name, rs))

# empty placeholders:

print("You can buy {} {} in {} rupees".format(qty, item_name, rs))

# each of the three calls prints "You can buy 10 chocolate in 100 rupees"

String Methods in Python

Everything in Python is an object, and a string is an object too. Python provides various methods that can be called on a string object. None of these methods alters the original string. Methods that produce text return a new string, which is the manipulated version of the original. Other methods return a value such as a number, a Boolean or a list.
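This sketch uses upper() as one example method to show that the original string is left unchanged:

```python
mystring = "hello python"

upper_copy = mystring.upper()   # returns a NEW string

print(upper_copy)   # HELLO PYTHON
print(mystring)     # hello python  (the original is unchanged)

# To "change" the variable, rebind the name to the returned string
mystring = mystring.upper()
print(mystring)     # HELLO PYTHON
```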

Method Description Examples

capitalize(): Returns a copy of the string with its first character capitalized and the rest lowercased.

>>> mystring = "hello python"
>>> print(mystring.capitalize())
Hello python

casefold(): Returns a casefolded copy of the string. Casefolded strings may be used for caseless matching.

>>> mystring = "hello PYTHON"
>>> print(mystring.casefold())
hello python

center(width[, fillchar]): Returns the string centered in a string of length width. Padding is done with the specified fillchar (the default padding uses an ASCII space). The original string is returned if width is less than or equal to len(s).

>>> mystring = "Hello"
>>> x = mystring.center(12, "-")
>>> print(x)
---Hello----

count(sub[, start[, end]]): Returns the number of non-overlapping occurrences of substring sub in the range [start, end]. The optional arguments start and end are interpreted as in slice notation.

>>> mystr = "Hello Python"
>>> print(mystr.count("o"))
2
>>> print(mystr.count("th"))
1
>>> print(mystr.count("l"))
2
>>> print(mystr.count("h"))
1
>>> print(mystr.count("H"))
1
>>> print(mystr.count("hH"))
0

encode(encoding="utf-8", errors="strict"): Returns an encoded version of the string as a bytes object. The default encoding is utf-8. The errors argument sets a different error handling scheme. The possible values for errors are:

• strict (encoding errors raise a UnicodeError)
• ignore
• replace
• xmlcharrefreplace
• backslashreplace
• any other name registered via codecs.register_error()

>>> mystr = 'python!'
>>> print('The string is:', mystr)
The string is: python!
>>> print('The encoded version is:', mystr.encode("ascii", "ignore"))
The encoded version is: b'python!'
>>> print('The encoded version (with replace) is:', mystr.encode("ascii", "replace"))
The encoded version (with replace) is: b'python!'

endswith(suffix[, start[, end]]): Returns True if the string ends with the specified suffix, otherwise it returns False.

>>> mystr = "Python"
>>> print(mystr.endswith("y"))
False
>>> print(mystr.endswith("hon"))
True

expandtabs(tabsize=8): Returns a copy of the string where all tab characters are replaced by one or more spaces, depending on the current column and the given tab size.

>>> mystr = "1\t2\t3"
>>> print(mystr)
1       2       3
>>> print(mystr.expandtabs())
1       2       3
>>> print(mystr.expandtabs(tabsize=15))
1              2              3
>>> print(mystr.expandtabs(tabsize=2))
1 2 3

find(sub[, start[, end]]): Returns the lowest index in the string where substring sub is found within the slice s[start:end]. Returns -1 if sub is not found.

>>> mystring = "Python"
>>> print(mystring.find("P"))
0
>>> print(mystring.find("on"))
4

format(*args, **kwargs): Performs a string formatting operation. The string on which this method is called can contain literal text or replacement fields delimited by braces {}.

>>> print("{} and {}".format("Apple", "Banana"))
Apple and Banana
>>> print("{1} and {0}".format("Apple", "Banana"))
Banana and Apple
>>> print("{lunch} and {dinner}".format(lunch="Peas", dinner="Beans"))
Peas and Beans

format_map(mapping): Similar to format(**mapping), except that mapping is used directly and not copied to a dictionary.

>>> lunch = {"Food": "Pizza", "Drink": "Wine"}
>>> print("Lunch: {Food}, {Drink}".format_map(lunch))
Lunch: Pizza, Wine
>>> class Default(dict):
...     def __missing__(self, key):
...         return key
...
>>> lunch = {"Drink": "Wine"}
>>> print("Lunch: {Food}, {Drink}".format_map(Default(lunch)))
Lunch: Food, Wine


index(sub[, start[, end]]): Searches the string for a specified value and returns the position where it was found. It works like find(), but raises ValueError if the value is not found.

>>> mystr = "HelloPython"
>>> print(mystr.index("P"))
5
>>> print(mystr.index("hon"))
8
>>> print(mystr.index("o"))
4

isalnum(): Returns True if all characters in the string are alphanumeric (letters or digits).

>>> mystr = "HelloPython"
>>> print(mystr.isalnum())
True
>>> a = "123"
>>> print(a.isalnum())
True
>>> a = "$*%!!!"
>>> print(a.isalnum())
False

isalpha(): Returns True if all characters in the string are letters of the alphabet.

>>> mystr = "HelloPython"
>>> print(mystr.isalpha())
True
>>> a = "123"
>>> print(a.isalpha())
False
>>> a = "$*%!!!"
>>> print(a.isalpha())
False

isdecimal(): Returns True if all characters in the string are decimal characters. A superscript digit such as "²" does not count as decimal.

>>> mystr = "HelloPython"
>>> print(mystr.isdecimal())
False
>>> a = "1.23"
>>> print(a.isdecimal())
False
>>> c = u"\u00B2"
>>> print(c.isdecimal())
False
>>> c = "133"
>>> print(c.isdecimal())
True

isdigit(): Returns True if all characters in the string are digits. Unlike isdecimal(), this also accepts characters such as the superscript digit "²".

>>> c = "133"
>>> print(c.isdigit())
True
>>> c = u"\u00B2"
>>> print(c.isdigit())
True
>>> a = "1.23"
>>> print(a.isdigit())
False

isidentifier(): Returns True if the string is a valid Python identifier.

>>> c = "133"
>>> print(c.isidentifier())
False
>>> c = "_user_123"
>>> print(c.isidentifier())
True
>>> c = "Python"
>>> print(c.isidentifier())
True

islower(): Returns True if all cased characters in the string are lowercase and there is at least one cased character.

>>> c = "Python"
>>> print(c.islower())
False
>>> c = "_user_123"
>>> print(c.islower())
True
>>> c = "\t"
>>> print(c.islower())
False

isnumeric(): Returns True if all characters in the string are numeric.

>>> c = "133"
>>> print(c.isnumeric())
True
>>> c = "_user_123"
>>> print(c.isnumeric())
False
>>> c = "Python"
>>> print(c.isnumeric())
False

isprintable(): Returns True if all characters in the string are printable.

>>> c = "133"
>>> print(c.isprintable())
True
>>> c = "_user_123"
>>> print(c.isprintable())
True
>>> c = "\t"
>>> print(c.isprintable())
False

isspace(): Returns True if all characters in the string are whitespace.

>>> c = "133"
>>> print(c.isspace())
False
>>> c = "Hello Python"
>>> print(c.isspace())
False
>>> c = "Hello"
>>> print(c.isspace())
False
>>> c = "\t"
>>> print(c.isspace())
True

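istitle(): Returns True if the string follows the rules of a title: each word starts with an uppercase character and the remaining cased characters are lowercase.

>>> c = "133"
>>> print(c.istitle())
False
>>> c = "Python"
>>> print(c.istitle())
True
>>> c = "\t"
>>> print(c.istitle())
False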

isupper(): Returns True if all cased characters in the string are uppercase and there is at least one cased character.

>>> c = "Python"
>>> print(c.isupper())
False
>>> c = "PYTHON"
>>> print(c.isupper())
True
>>> c = "\t"
>>> print(c.isupper())
False

join(iterable): Returns a string that concatenates the elements of the iterable, with the string on which the method is called placed between them as a separator.

>>> a = "-"
>>> print(a.join("123"))
1-2-3
>>> a = "**"
>>> print(a.join("Hello Python"))
H**e**l**l**o** **P**y**t**h**o**n

ljust(width[, fillchar]): Returns the string left justified in a string of length width, padded with fillchar (a space by default).

>>> a = "Hello"
>>> b = a.ljust(12, "_")
>>> print(b)
Hello_______

lower(): Returns a copy of the string converted to lowercase.

>>> a = "Python"
>>> print(a.lower())
python

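lstrip([chars]): Returns a left-trimmed copy of the string. chars is the set of characters to remove from the start of the string. If it is omitted, leading whitespace is removed.

>>> a = " Hello "
>>> print(a.lstrip(), "!")
Hello  !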
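
str.maketrans(x[, y[, z]]): Returns a translation table to be used with translate(). With two strings of equal length, each character of x is mapped to the character at the same position in y. If a character appears more than once in x, its last mapping is used.

>>> frm = "SecretCode"
>>> to = "4203040540"
>>> trans_table = str.maketrans(frm, to)
>>> sec_code = "Secret Code".translate(trans_table)
>>> print(sec_code)
400304 0540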

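partition(sep): Returns a tuple where the string is split into three parts at the first occurrence of sep: the part before it, the separator itself, and the part after it. If sep is not found, the tuple holds the whole string followed by two empty strings.

>>> mystr = "Hello-Python"
>>> print(mystr.partition("-"))
('Hello', '-', 'Python')
>>> print(mystr.partition("."))
('Hello-Python', '', '')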

replace(old, new[, count]): Returns a string where a specified value is replaced with another specified value. If count is given, only the first count occurrences are replaced.

>>> mystr = "Hello Python. Hello Java. Hello C++."
>>> print(mystr.replace("Hello", "Bye"))
Bye Python. Bye Java. Bye C++.
>>> print(mystr.replace("Hello", "Hell", 2))
Hell Python. Hell Java. Hello C++.

rfind(sub[, start[, end]]): Searches the string for a specified value and returns the last position where it was found, or -1 if it is not found.

>>> mystr = "Hello-Python"
>>> print(mystr.rfind("P"))
6
>>> print(mystr.rfind("-"))
5
>>> print(mystr.rfind("z"))
-1

rindex(sub[, start[, end]]): Searches the string for a specified value and returns the last position where it was found. Unlike rfind(), it raises ValueError if the value is not found.

>>> mystr = "Hello-Python"
>>> print(mystr.rindex("P"))
6
>>> print(mystr.rindex("-"))
5
>>> print(mystr.rindex("z"))
Traceback (most recent call last):
  File "<pyshell#253>", line 1, in <module>
    print(mystr.rindex("z"))
ValueError: substring not found

rjust(width[, fillchar]): Returns the string right justified in a string of length width, padded on the left with fillchar (a space by default).

>>> mystr = "Hello Python"
>>> mystr1 = mystr.rjust(20, "-")
>>> print(mystr1)
--------Hello Python

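rpartition(sep): Returns a tuple where the string is split into three parts at the last occurrence of sep. If sep is not found, the tuple holds two empty strings followed by the whole string.

>>> mystr = "Hello Python"
>>> print(mystr.rpartition("."))
('', '', 'Hello Python')
>>> print(mystr.rpartition(" "))
('Hello', ' ', 'Python')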

rsplit(sep=None, maxsplit=-1): Splits the string at the specified separator, starting from the right, and returns a list. At most maxsplit splits are done.

>>> mystr = "Hello Python"
>>> print(mystr.rsplit())
['Hello', 'Python']
>>> mystr = "Hello-Python-Hello"
>>> print(mystr.rsplit(sep="-", maxsplit=1))
['Hello-Python', 'Hello']

rstrip([chars]): Returns a right-trimmed copy of the string. chars is the set of characters to remove from the end of the string. If it is omitted, trailing whitespace is removed.

>>> mystr = "Hello Python   "
>>> print(mystr.rstrip(), "!")
Hello Python !
>>> mystr = "------------Hello Python-----------"
>>> print(mystr.rstrip("-"))
------------Hello Python
>>> print(mystr.rstrip())
------------Hello Python-----------

split(sep=None, maxsplit=-1): Splits the string at the specified separator and returns a list. If sep is None, runs of whitespace act as the separator.

>>> mystr = "Hello Python"
>>> print(mystr.split())
['Hello', 'Python']
>>> mystr1 = "Hello,,Python"
>>> print(mystr1.split(","))
['Hello', '', 'Python']

splitlines([keepends]): Splits the string at line breaks and returns a list. The line breaks are not included unless keepends is true.

>>> mystr = "Hello:\n\n Python\r\nJava\nC++\n"
>>> print(mystr.splitlines())
['Hello:', '', ' Python', 'Java', 'C++']
>>> print(mystr.splitlines(keepends=True))
['Hello:\n', '\n', ' Python\r\n', 'Java\n', 'C++\n']

startswith(prefix[, start[, end]]): Returns True if the string starts with the specified value, otherwise False.

>>> mystr = "Hello Python"
>>> print(mystr.startswith("P"))
False
>>> print(mystr.startswith("H"))
True
>>> print(mystr.startswith("Hell"))
True

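strip([chars]): Returns a trimmed copy of the string, with the given characters removed from both ends. If chars is omitted, leading and trailing whitespace is removed.

>>> mystr = "   Hello Python   "
>>> print(mystr.strip(), "!")
Hello Python !
>>> print(mystr.strip(" "), "!")
Hello Python !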

swapcase(): Swaps cases: lowercase becomes uppercase and vice versa.

>>> mystr = "Hello PYthon"
>>> print(mystr.swapcase())
hELLO pyTHON

title(): Converts the first character of each word to uppercase and the remaining characters to lowercase.

>>> mystr = "Hello PYthon"
>>> print(mystr.title())
Hello Python
>>> mystr = "HELLO JAVA"
>>> print(mystr.title())
Hello Java

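translate(table): Returns a copy of the string in which each character is mapped through the given translation table, usually created with str.maketrans(). Characters that are not in the table are left unchanged.

>>> frm = "helloPython"
>>> to = "40250666333"
>>> trans_table = str.maketrans(frm, to)
>>> secret_code = "Secret Code".translate(trans_table)
>>> print(secret_code)
S0cr06 C3d0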

upper(): Returns a copy of the string converted to uppercase.

>>> mystr = "hello Python"
>>> print(mystr.upper())
HELLO PYTHON

zfill(width): Returns a copy of the string padded on the left with 0 digits up to length width. A leading sign (+ or -) stays in front of the zeros.

>>> mystr = "999"
>>> print(mystr.zfill(9))
000000999
>>> mystr = "-40"
>>> print(mystr.zfill(5))
-0040

Python String Functions

Python provides us with a number of functions that we can apply on strings or to create
strings.

1. len()

The len() function returns the length of a string.

>>> a='book'

>>> len(a)

Output:
4

2. str()

This function converts a value of any data type into a string.

>>> str(2+3j)

Output

'(2+3j)'

count( ) function

The count() function finds the number of times a specified value (given by the user) appears in the given string. The start and end arguments are optional and limit the search to string[start:end].

Syntax: string.count(value, start, end)
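The sketch below uses an illustrative word to show how the optional start and end arguments narrow the search. It also shows that occurrences are counted without overlapping.

```python
text = "banana"

print(text.count("a"))        # 3
print(text.count("a", 2))     # 2  -> searches text[2:] == "nana"
print(text.count("a", 2, 4))  # 1  -> searches text[2:4] == "na"
print(text.count("ana"))      # 1  -> occurrences do not overlap
```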

find( ) function

The find() function finds the first occurrence of the specified value. It returns -1 if the value
is not found in that string.

The find() function is almost the same as the index() function. The only difference is that index() raises a ValueError exception if the value is not found.

Syntax: string.find(value, start, end)
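Here is a minimal sketch of the find()/index() difference on an illustrative string, including a search that starts at a given position:

```python
text = "Hello Python"

print(text.find("Python"))   # 6
print(text.find("Java"))     # -1  (not found)
print(text.find("o", 5))     # 10  (search starts at index 5)

# index() gives the same result when the value exists...
print(text.index("Python"))  # 6

# ...but raises ValueError instead of returning -1
try:
    text.index("Java")
except ValueError as err:
    print("ValueError:", err)   # ValueError: substring not found
```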

replace( ) function

The replace() function replaces a specified phrase with another specified phrase.

Note: All occurrences of the specified phrase will be replaced if nothing else is specified.

Syntax: string.replace(oldvalue, newvalue, count)

join( ) function

The join() function takes all items in an iterable and joins them into one string. The string on which join() is called is used as the separator.

Syntax: string.join(iterable)
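A short sketch with an illustrative list of words shows how the separator string changes the result. It also shows that non-string items must be converted first.

```python
words = ["Python", "is", "fun"]

print(" ".join(words))   # Python is fun
print("-".join(words))   # Python-is-fun
print("".join(words))    # Pythonisfun

# Every item must already be a string, so numbers are converted first
numbers = [1, 2, 3]
print(", ".join(str(n) for n in numbers))   # 1, 2, 3
```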

swapcase( ) function
The swapcase() function returns a string where all the upper case letters are lower case and
vice versa.

Syntax: string.swapcase()

strip(): returns a new string after removing any leading and trailing whitespaces including
tabs (\t).

rstrip(): returns a new string with trailing whitespace removed. It’s easier to remember as
removing white spaces from “right” side of the string.

lstrip(): returns a new string with leading whitespace removed, or removing whitespaces
from the “left” side of the string.

Let’s look at a simple example of trimming whitespaces from the string in Python.

s1 = ' abc '

print(f'String =\'{s1}\'')

print(f'After Removing Leading Whitespaces String =\'{s1.lstrip()}\'')

print(f'After Removing Trailing Whitespaces String =\'{s1.rstrip()}\'')

print(f'After Trimming Whitespaces String =\'{s1.strip()}\'')

Output:

String =' abc '

After Removing Leading Whitespaces String ='abc '

After Removing Trailing Whitespaces String =' abc'

After Trimming Whitespaces String ='abc'

In Python, the string format() method is useful for formatting complex strings and numbers. The format string contains curly braces { }, and the format() method uses those curly braces { } as placeholders that it replaces with the content of the parameters.

Following is the pictorial representation of string format() method functionality in Python.

If you observe the above diagram, the format string (s) contains placeholders ({ }). The Python format() method inserts the required text in place of the placeholders and returns the formatted string.

String Format Method Syntax

Following is the syntax of the string format() method in Python.

string.format(arg1, arg2, etc.)

The Python format() method accepts any number of arguments and replaces the curly brace { } placeholders with the content of the respective arguments.

String Format Method Example

The following example formats strings using the format() method in Python.

str1 = "{} to {}".format("Welcome", "Python")


print(str1)
name = "Suresh"
age = 33
str2 = "My name is {}, and I am {} years old"
print(str2.format(name, age))

In the above example, we created strings with curly brace { } placeholders and replaced the placeholders with the required argument values using the format() method.

When you execute the above Python program, you will get the result shown below.

Welcome to Python
My name is Suresh, and I am 33 years old

The placeholders in this example do not specify any order. The arguments are therefore inserted strictly in the order they are passed, and if that order is wrong, the output will be wrong too.

String Format with Positional/Keyword Arguments

To control which argument goes into which placeholder, you can put positional indexes or keyword names in the placeholders, as shown below.

# Positional arguments
str1 = "{1} to {0}".format("Python", "Welcome")
print(str1)
# Keyword arguments
name = "Suresh"
age = 33
str2 = "My name is {x}, and I am {y} years old"
print(str2.format(y = age, x = name))

In the above example, the placeholders use positional indexes and keyword names to specify which argument goes where.

When you execute the above Python program, you will get the result shown below.

Welcome to Python
My name is Suresh, and I am 33 years old

The Python splitlines() method breaks the given string at line boundaries, for example at \n (newline) or \r (carriage return) characters.

• This is a built-in method of the str type in Python.

• When the string is broken up, the different lines are collected into a list, which the method returns. In other words, it returns a list of the split lines.

• Some of the different types of line breaks are \n (newline character), \r (carriage return) and \r\n (carriage return + newline). Other characters, such as \v (vertical tab) and \f (form feed), also count as line boundaries. The Python documentation has the complete table of these characters.

Python String splitlines(): Syntax

Below is the basic syntax of the string splitlines() method in Python:

string.splitlines([keepends])
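The sketch below uses an illustrative string to compare splitlines() with split("\n"). splitlines() also handles \r\n and does not leave an empty string after a final line break.

```python
text = "first\nsecond\r\nthird\n"

print(text.splitlines())                # ['first', 'second', 'third']
print(text.split("\n"))                 # ['first', 'second\r', 'third', '']
print(text.splitlines(keepends=True))   # ['first\n', 'second\r\n', 'third\n']
```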
Python String Slicing

To access a range of characters in a string, you slice the string. The simplest way is to use the slicing operator, a colon (:), inside the square brackets.

With this operator you can specify where the slice starts, where it ends, and the step.

Slicing a String

If S is a string, the expression S [ start : stop : step ] returns the portion of the string from
index start to index stop, at a step size step.

Syntax

S[start:stop:step]

Basic Example

Here is a basic example of string slicing.

S = 'ABCDEFGHI'

print(S[2:7]) # CDEFG
Note that the item at index 7 'H' is not included.

Slice with Negative Indices

You can also specify negative indices while slicing a string.

S = 'ABCDEFGHI'

print(S[-7:-2]) # CDEFG

Slice with Positive & Negative Indices

You can specify both positive and negative indices at the same time.

S = 'ABCDEFGHI'

print(S[2:-5]) # CD

Specify Step of the Slicing


You can specify the step of the slicing using step parameter. The step parameter is optional
and by default 1.

# Return every 2nd item between position 2 to 7

S = 'ABCDEFGHI'

print(S[2:7:2]) # CEG

Negative Step Size

You can even specify a negative step size.

# Returns every 2nd item between position 6 to 1 in reverse order

S = 'ABCDEFGHI'

print(S[6:1:-2]) # GEC

Slice at Beginning & End

Omitting the start index starts the slice from the index 0. Meaning, S[:stop] is equivalent
to S[0:stop]

# Slice first three characters from the string

S = 'ABCDEFGHI'

print(S[:3]) # ABC

Whereas, omitting the stop index extends the slice to the end of the string.
Meaning, S[start:] is equivalent to S[start:len(S)]

# Slice last three characters from the string

S = 'ABCDEFGHI'
print(S[6:]) # GHI

String slicing can also accept a third parameter, the stride, which refers to how many
characters you want to move forward after the first character is retrieved from the string. The
value of stride is set to 1 by default.

Let's see stride in action to understand it better:

number_string = "1020304050"

print(number_string[0:-1:2])

OUTPUT:

12345
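Two related slicing behaviours are sketched below on the same 'ABCDEFGHI' string. A step of -1 reverses a string, and slice indices past the end are clipped instead of raising an error.

```python
S = 'ABCDEFGHI'

# A step of -1 with start and stop omitted walks the whole string backwards
print(S[::-1])     # IHGFEDCBA

# Slice indices outside the string are clipped instead of raising IndexError
print(S[5:100])    # FGHI
print(S[20:])      # prints an empty line: the slice is ''

# Every slice is a new string; S itself is unchanged
print(S)           # ABCDEFGHI
```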

Python ord(), chr() functions

Python ord() and chr() are built-in functions. They are used to convert a character to an int
and vice versa. Python ord() and chr() functions are exactly opposite of each other.

Python ord()

The Python ord() function takes a string argument consisting of a single Unicode character and returns its integer Unicode code point value. Let's look at some examples of using the ord() function.

x = ord('A')

print(x)

print(ord('ć'))

print(ord('ç'))

print(ord('$'))

Output:

65

263

231

36
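Because the letters a to z have consecutive code points, ord() can be used for simple arithmetic on characters. This is a small illustrative sketch:

```python
# Letters of the English alphabet have consecutive code points
print(ord('a'), ord('z'))          # 97 122
print(ord('c') - ord('a') + 1)     # 3  -> 'c' is the 3rd letter

# In ASCII, upper- and lowercase versions of a letter differ by 32
print(ord('a') - ord('A'))         # 32

# ord() accepts exactly one character; a longer string raises TypeError
try:
    ord('ab')
except TypeError as err:
    print("TypeError:", err)
```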
Python chr()

The Python chr() function takes an integer argument and returns the string representing the character at that code point.

y = chr(65)

print(y)

print(chr(123))

print(chr(36))

Output:

A

{

$

String membership operators

The operators in and not in test for membership. X in S evaluates to True if X is a member of S, and False otherwise. X not in S returns the negation of X in S. All built-in sequences and set types support this, and so do dictionaries, for which in tests whether the dictionary has a given key. For strings, X in S is True when X is a substring of S.

strA = "Game of Thrones was the best show on the planet"

print("Thrones" in strA)

output:

True

As you can see, the in operator returns True when the substring exists in the string. Otherwise, it returns False.

strA = "Game of Thrones was the best show on the planet"

if "Thrones" in strA:
    print('It exists')
else:
    print('Does not exist')

output:

It exists

strA = "Game of Thrones was the best show on the planet"

if "Breaking Bad" not in strA:
    print('Breaking Bad does not exist in the string.')
else:
    print('It exists.')

output:

Breaking Bad does not exist in the string.

Python String Comparison operators

In Python, we can compare two strings to find out whether they are equal, or which of them is greater or smaller. Strings are compared lexicographically, character by character, by the Unicode code point of each character. As a result, uppercase ASCII letters (A-Z) sort before lowercase ones (a-z). The string comparison operators used for this purpose are:

• ==: This operator checks whether two strings are equal.

• !=: This operator checks whether two strings are not equal.

• <: This operator checks whether the string on the left side is smaller than the string on the right side.

• <=: This operator checks whether the string on the left side is smaller than or equal to the string on the right side.

• >: This operator checks whether the string on the left side is greater than the string on the right side.

• >=: This operator checks whether the string on the left side is greater than or equal to the string on the right side.

name = 'John'

name2 = 'john'
name3 = 'doe'

name4 = 'Doe'

print("Are name and name2 equal?")

print(name == name2)

print("Are name and name3 different?")

print(name != name3)

print("Is name less than or equal to name2?")

print(name <= name2)

print("Is name3 greater than or equal to name2?")

print(name3 >= name2)

print("Is name4 less than name?")

print(name4 < name)

Output

Are name and name2 equal?

False

Are name and name3 different?

True

Is name less than or equal to name2?

True

Is name3 greater than or equal to name2?

False

Is name4 less than name?

True
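The results above follow from code-point order. A minimal sketch with the same two names shows why 'John' < 'john' and how to compare strings without regard to case:

```python
name = 'John'
name2 = 'john'

# Characters are compared by code point, and 'J' (74) comes before 'j' (106)
print(ord('J'), ord('j'))                   # 74 106
print(name < name2)                         # True

# Normalise the case first for a case-insensitive comparison
print(name.lower() == name2.lower())        # True
print(name.casefold() == name2.casefold())  # True
```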

Iterating string:

You can loop through a string variable in Python with a for loop or a while loop. A for loop can iterate over the string directly, without using indexes. Each pass of the loop handles one character, so calling print() inside the loop prints each letter on its own line.

Iterate Through Python String Characters with For Loop

You can use Python's for loop to iterate through each character of the string. The number of iterations equals the number of characters in the string, and spaces count as characters too.

See the example below to perform a loop iteration using the for loop of Python.

myString = "Hello"

for ch in myString:
    print(ch)

Output:

H
e
l
l
o

The above example shows the output of a for loop over a string. The string has 5 letters and no spaces, so the loop runs 5 times and prints each letter on its own line.

Using While Loop To Get Each Character

In addition to the for loop above, you can iterate through a Python string with a while loop, as in the example below.

The example first initializes an index variable to zero (0). The while loop then runs as long as the index is less than the length of the string, given by the len() function. On each pass it prints one character and increments the index.

myString = "Save me"

i = 0

while i < len(myString):
    print(myString[i])
    i += 1

Output:

S
a
v
e
 
m
e

The above example shows the iteration through the string and its output. Each character, including the space (the fifth line), is printed on its own line.

The string Module

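The string module provides a number of constants (listed below) and helpers for working with standard Python strings, such as the capwords() function and the Formatter and Template classes. In Python 2 it also offered functions such as string.upper(); in Python 3 those operations exist only as str methods.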

string.ascii_letters
The concatenation of the ascii_lowercase and ascii_uppercase constants described below.
This value is not locale-dependent.

string.ascii_lowercase

The lowercase letters 'abcdefghijklmnopqrstuvwxyz'. This value is not locale-dependent and will not change.

string.ascii_uppercase

The uppercase letters 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'. This value is not locale-dependent and will not change.

string.digits

The string '0123456789'.

string.hexdigits

The string '0123456789abcdefABCDEF'.

string.octdigits

The string '01234567'.

string.punctuation

String of ASCII characters which are considered punctuation characters in the C locale: !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~.

string.printable

String of ASCII characters which are considered printable. This is a combination of digits, ascii_letters, punctuation, and whitespace.

string.whitespace

A string containing all ASCII characters that are considered whitespace. This includes the
characters space, tab, linefeed, return, formfeed, and vertical tab.

In Python 2 the string module also offered functions such as string.upper(text) and string.split(text). These functions were removed in Python 3, where the same operations are methods of str objects, so the example is written as follows:

text = "Monty Python's Flying Circus"

print("upper", "=>", text.upper())
print("lower", "=>", text.lower())
print("split", "=>", text.split())
print("join", "=>", "+".join(text.split()))
print("replace", "=>", text.replace("Python", "Java"))
print("find", "=>", text.find("Python"), text.find("Java"))
print("count", "=>", text.count("n"))

Output:

upper => MONTY PYTHON'S FLYING CIRCUS

lower => monty python's flying circus

split => ['Monty', "Python's", 'Flying', 'Circus']

join => Monty+Python's+Flying+Circus

replace => Monty Java's Flying Circus

find => 6 -1

count => 3

The built-in dir() function returns the names of all the attributes (properties and methods) of the specified object, without their values.

The list includes the built-in attributes that objects have by default, such as __doc__ and __name__.

>>> import string
>>> dir(string)
['Formatter', 'Template', '_ChainMap', '_TemplateMetaclass', '__all__', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__', '__name__', '__package__', '__spec__', '_re', '_sentinel_dict', '_string', 'ascii_letters', 'ascii_lowercase', 'ascii_uppercase', 'capwords', 'digits', 'hexdigits', 'octdigits', 'printable', 'punctuation', 'whitespace']

The exact list varies slightly between Python versions.

Regular Expressions in Python


A regular expression is a special sequence of characters, written in a particular syntax, that describes a pattern; it helps filter sequences of strings, characters and symbols. Like many other programming languages, Python supports regular expressions, through its built-in module ‘re’, which lets you process any amount of text that satisfies the pattern. If you want to process a large amount of text data according to some conditions, regular expressions are often your best bet. There are many other reasons to use regular expressions in your programs.

Why Regular Expressions?

• Search and Replace: It’s easy to extract specific strings from large documents. You can use regular expressions as a search-and-replace feature to correct grammatical errors or to add/remove strings in a document. Regular expressions can also be used purely to search for occurrences of a string.

• Splitting Strings: You can split a string wherever a particular character or symbol occurs. Likewise, there are many other ways to split a document according to regex matches.

• Validation: You can validate a document against certain requirements. This is a useful feature if you are checking web-based standards or company-specific standards in your code.

Metacharacters

Each character in a Python regex is either a metacharacter or a regular character. A metacharacter has a special meaning, while a regular character matches itself.

Python has the following metacharacters:

Metacharacter Description

^ Matches the start of the string

. Matches any single character except a newline. Inside square brackets, a dot matches a literal dot

[] A bracket expression matches a single character from the ones inside it:
   [abc] matches ‘a’, ‘b’, or ‘c’
   [a-z] matches any character from ‘a’ to ‘z’
   [a-cx-z] matches ‘a’, ‘b’, ‘c’, ‘x’, ‘y’, or ‘z’

[^ ] Matches a single character except the ones mentioned in the brackets:
   [^abc] matches any character except ‘a’, ‘b’ and ‘c’

() Parentheses define a marked subexpression, also called a block or a capturing group

\t, \n, \r, \f Tab, newline, carriage return, form feed

* Matches the preceding element zero or more times:
   ab*c matches ‘ac’, ‘abc’, ‘abbc’, and so on
   [ab]* matches ‘’, ‘a’, ‘b’, ‘ab’, ‘ba’, ‘aba’, and so on
   (ab)* matches ‘’, ‘ab’, ‘abab’, ‘ababab’, and so on

{m,n} Matches the preceding element at least m times and at most n times:
   a{2,4} matches ‘aa’, ‘aaa’, and ‘aaaa’

{m} Matches the preceding element exactly m times

? Matches the preceding element zero or one time:
   ab?c matches ‘ac’ or ‘abc’

+ Matches the preceding element one or more times:
   ab+c matches ‘abc’, ‘abbc’, ‘abbbc’, and so on, but not ‘ac’

| The choice operator matches either the expression before it or the one after it:
   abc|def matches ‘abc’ or ‘def’

\w Matches a word character (a-z, A-Z, 0-9 and the underscore _; by default, letters and digits of other alphabets as well)
\W Matches a single non-word character
\b Matches the boundary between a word character and a non-word character

\s Matches a single whitespace character
\S Matches a single non-whitespace character

\d Matches a single decimal digit character (0-9; by default, other Unicode decimal digits as well)

\ A single backslash removes a character’s special meaning, for example \. \\ \*
   When unsure whether a character has a special meaning, you can put a \ before it, for example \@

$ A dollar matches the end of the string
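The following sketch checks several rows of the table using re.findall() and re.search(), which are described in the next section. The sample strings are made up for illustration; the comments show the values each call returns.

```python
import re

# Quantifiers: * (zero or more), + (one or more), ? (zero or one), {m,n}
star = re.findall(r"ab*c", "ac abc abbc adc")       # ['ac', 'abc', 'abbc']
plus = re.findall(r"ab+c", "ac abc abbc")           # ['abc', 'abbc']
optional = re.findall(r"ab?c", "ac abc abbc")       # ['ac', 'abc']
repeat = re.findall(r"a{2,4}", "a aa aaaaa")        # ['aa', 'aaaa']

# Bracket expressions: ranges and negation
ranges = re.findall(r"[a-cx-z]", "abdxw")           # ['a', 'b', 'x']
negated = re.findall(r"[^abc]", "abcd")             # ['d']

# Choice operator, special sequences and escaping with a backslash
choice = re.findall(r"abc|def", "abcdef xyz def")   # ['abc', 'def', 'def']
digits = re.findall(r"\d", "a1b22")                 # ['1', '2', '2']
words = re.findall(r"\w+", "hi, there_1!")          # ['hi', 'there_1']
dots = re.findall(r"\.", "a.b.c")                   # ['.', '.']

# Anchors: ^ is the start and $ is the end of the string
starts = re.search(r"^Hello", "Hello world") is not None   # True
ends = re.search(r"Hello$", "Hello world") is not None     # False

print(star, plus, optional, repeat)
```

The r prefix (a raw string) keeps Python from interpreting backslash sequences such as \d or \. itself, so the regex engine receives them unchanged.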

RegEx functions

The re module provides users a variety of functions to search for a pattern in a particular
string. Below are some of the most frequently used functions in detail:

1. re.findall()

The re.findall() function returns a list of strings containing all matches of the specified
pattern.

The function takes as input the following:

• a character pattern

• the string from which to search

Example

The following example returns a list of all the instances of the substring 'at' in the given string:

import re

string = "at what time?"

match = re.findall('at',string)
print (match)

Output

['at', 'at']

2. re.search()

The re.search() function returns a match object in case a match is found.

Note:

• In case of more than one match, the first occurrence of the match is returned.

• If no occurrence is found, None is returned.

Example

Suppose you wish to look for the occurrence of a particular sub-string in a string:

import re

string = "at what time?"
match = re.search('at', string)
if match:
    print("String found at:", match.start())
else:
    print("String not found!")

Output

String found at: 0

import re

string = "at what time?"
match = re.search('when', string)
if match:
    print("String found at:", match.start())
else:
    print("String not found!")

Output

String not found!

Note: The start() function returns the start index of the matched string.
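The re module also provides re.match(), which is used in the groups() examples later in this unit. Unlike re.search(), which scans the whole string, re.match() only tries to match at the beginning of the string. A short comparison using the same sample string:

```python
import re

text = "at what time?"

# re.search() scans the whole string and finds "what" at index 3
found = re.search("what", text)
print(found.start())                 # 3

# re.match() only checks the beginning of the string, so it finds nothing here
print(re.match("what", text))        # None

# A pattern at the very start of the string is found by both functions
at_start = re.match("at", text).start()
print(at_start)                      # 0
```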

3. re.split()

The re.split() function splits the string at every match of the pattern and returns a list of the resulting pieces.

Consider the following example to get a better idea of what this function does:

Example

Suppose we wish to split a string wherever the letter a occurs:

import re

string = "at what time?"

match = re.split('a',string)

print (match)

Output

['', 't wh', 't time?']

Note: In case there is no match, the string will be returned as it is, in a list.
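Because re.split() takes a pattern rather than a fixed sub-string, it can split on several different separators at once, and its optional maxsplit argument limits the number of splits. A sketch with an assumed comma/semicolon-separated input:

```python
import re

data = "apple, banana;cherry ,  date"

# Split on a comma or a semicolon, together with any whitespace around it
parts = re.split(r"\s*[,;]\s*", data)
print(parts)      # ['apple', 'banana', 'cherry', 'date']

# maxsplit=1 performs at most one split; the rest of the string stays intact
first, rest = re.split(r"\s*[,;]\s*", data, maxsplit=1)
print(first)      # apple
print(rest)       # banana;cherry ,  date
```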

4. re.sub()

The re.sub() function is used to replace occurrences of a particular sub-string with another
sub-string.

This function takes the following inputs:

1. The pattern to replace

2. The replacement string

3. The string to search


Example

Suppose you wish to insert !!! instead of a white-space character in a string. This can be
done via the re.sub() function as follows:

import re

string = "at what time?"

match = re.sub(r"\s", "!!!", string)

print (match)

Output

at!!!what!!!time?
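re.sub() also accepts an optional count argument that limits how many matches are replaced, and the replacement string can refer to capturing groups (covered later in this unit) with \1, \2 and so on. A sketch; the date string is an assumed sample:

```python
import re

text = "at what time?"

# count=1 replaces only the first whitespace character
once = re.sub(r"\s", "!!!", text, count=1)
print(once)       # at!!!what time?

# With the default count=0, every match is replaced
every = re.sub(r"\s", "!!!", text)
print(every)      # at!!!what!!!time?

# \1, \2 and \3 in the replacement refer to the text captured by each ( )
date = re.sub(r"(\d+)-(\d+)-(\d+)", r"\3/\2/\1", "2023-01-15")
print(date)       # 15/01/2023
```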

Python Regex Find All Matches using findall() and finditer()

The re module’s re.findall() method scans the entire target string for matches of the regex pattern and returns all the matches found in the form of a list.

Syntax:

re.findall(pattern, string, flags=0)

1. pattern: regular expression pattern we want to find in the string or text

2. string: It is the variable pointing to the target string (In which we want to look for
occurrences of the pattern).

3. flags: Optional regex flags. By default, no flags are applied. For example, the re.I flag is used for case-insensitive matching.

The regular expression pattern and target string are the mandatory arguments, and flags are
optional.

Return Value

re.findall() scans the target string from left to right as per the regular expression pattern and returns all matches in the order they were found.
If the pattern does not occur in the target string, it returns an empty list ([]).

import re

target_string = ("Emma is a basketball player who was born on June 17, 1993. "
                 "She played 112 matches with scoring average 26.12 points per game. "
                 "Her weight is 51 kg.")

result = re.findall(r"\d+", target_string)

# print all matches

print("Found following matches")

print(result)

Output

Found following matches

['17', '1993', '112', '26', '12', '51']
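One detail worth knowing: when the pattern contains parentheses (capturing groups), re.findall() returns the text captured by the group instead of the whole match, and a list of tuples when there is more than one group. A small sketch with an assumed "name: age" text:

```python
import re

text = "Emma: 29, Liam: 31, Olivia: 27"

# No groups: each element is the whole match
whole = re.findall(r"\w+: \d+", text)
print(whole)   # ['Emma: 29', 'Liam: 31', 'Olivia: 27']

# One group: each element is the text captured by that group
ages = re.findall(r"\w+: (\d+)", text)
print(ages)    # ['29', '31', '27']

# Two groups: each element is a tuple of the captured texts
pairs = re.findall(r"(\w+): (\d+)", text)
print(pairs)   # [('Emma', '29'), ('Liam', '31'), ('Olivia', '27')]
```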

Finditer method

The re.finditer() method works like re.findall(), except that instead of a list it returns an iterator that yields a match object for each match of the regex pattern in the string.

It scans the string from left to right and returns the matches in iterator form. We can then loop over this iterator to extract all the matches.

In simple words, finditer() returns an iterator over re.Match objects.

import re

target_string = ("Emma is a basketball player who was born on June 17, 1993. "
                 "She played 112 matches with a scoring average of 26.12 points per game. "
                 "Her weight is 51 kg.")

# finditer() with regex pattern and target string
# \d{2} matches two consecutive digits
result = re.finditer(r"\d{2}", target_string)

# print all match objects
for match_obj in result:
    # print each re.Match object
    print(match_obj)
    # extract each matching number
    print(match_obj.group())

Output

<re.Match object; span=(49, 51), match='17'>

17

<re.Match object; span=(53, 55), match='19'>

19

<re.Match object; span=(55, 57), match='93'>

93

<re.Match object; span=(70, 72), match='11'>

11

<re.Match object; span=(108, 110), match='26'>

26

<re.Match object; span=(111, 113), match='12'>

12

<re.Match object; span=(145, 147), match='51'>

51
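Each re.Match object also records where its match was found: start() and end() give the positions, and span() returns both as a pair, with end pointing one past the last matched character. A sketch on an assumed sample sentence:

```python
import re

text = "Order 66 shipped 3 items on day 128"

# Collect (matched text, start, end) for every run of digits
found = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\d+", text)]
print(found)   # [('66', 6, 8), ('3', 17, 18), ('128', 32, 35)]

# Slicing the string with the span gives back the matched text
for match_obj in re.finditer(r"\d+", text):
    start, end = match_obj.span()
    print(text[start:end] == match_obj.group())   # True for every match
```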

Character class

A "character class", or a "character set", is a set of characters put in square brackets.


The regex engine matches only one out of the several characters in the character class. We place the characters we want to match between square brackets; for example, to match any vowel, we use the character set [aeiou]. A character class matches only a single character, and the order of the characters inside it does not matter: [aeiou] and [uoiea] give identical results.
We use a hyphen inside a character class to specify a range of characters. [0-9] matches a single digit between 0 and 9. Similarly, the character class [A-Za-z] matches any single uppercase or lowercase letter.

Example

The following code finds and prints all the vowels in the given string:

import re

s = 'mother of all battles'
result = re.findall(r'[aeiou]', s)
print(result)

Output

['o', 'e', 'o', 'a', 'a', 'e']
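Ranges, negated classes and quantifiers can be combined in the same way. A sketch using the same sample string:

```python
import re

s = 'mother of all battles'

# A range: [a-f] matches any one lowercase letter from 'a' to 'f'
a_to_f = re.findall(r'[a-f]', s)
print(a_to_f)       # ['e', 'f', 'a', 'b', 'a', 'e']

# A negated class: [^aeiou ] matches any character that is not a vowel or a space
consonants = re.findall(r'[^aeiou ]', s)
print(consonants)   # ['m', 't', 'h', 'r', 'f', 'l', 'l', 'b', 't', 't', 'l', 's']

# A class followed by + matches a run of such characters, here whole words
words = re.findall(r'[A-Za-z]+', s)
print(words)        # ['mother', 'of', 'all', 'battles']
```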

Python Regex Flags

Python’s regex functions accept optional flags that change how a pattern is applied with match(), search(), split(), and others.

Most re module functions that take a pattern accept an optional flags argument that enables various special features and syntax variations.

For example, to search for a word inside a string, you can pass the re.I flag as an argument to the search function to make the search case-insensitive.

You will learn how to use all regex flags available in Python with short and clear examples.

First, refer to the below table for available regex flags.

Flag Long syntax Meaning

re.A re.ASCII Perform ASCII-only matching instead of full Unicode matching

re.I re.IGNORECASE Perform case-insensitive matching

re.M re.MULTILINE Used with the metacharacters ^ (caret) and $ (dollar). With this flag, ^ matches at the beginning of the string and at the beginning of each line (right after each newline \n), and $ matches at the end of the string and at the end of each line (right before each newline \n)

re.S re.DOTALL Make the DOT (.) special character match any character at all, including a newline. Without this flag, DOT (.) matches anything except a newline

re.X re.VERBOSE Allow comments in the regex. Whitespace in the pattern is ignored (unless escaped or inside a character class) and text after # is treated as a comment, which makes long regexes more readable

re.L re.LOCALE Make case-insensitive matching (and \w, \W, \b, \B) dependent on the current locale. Use only with bytes patterns

To specify more than one flag, connect them with the | operator. For example, a case-insensitive search in a multiline string, with verbose syntax enabled:

re.findall(pattern, string, flags=re.I|re.M|re.X)
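The following sketch shows the effect of the most common flags on an assumed three-line sample text. Each comment gives the list returned by re.findall().

```python
import re

text = "Python is fun\npython is popular\nI like PYTHON"

# Without flags, matching is case-sensitive
plain = re.findall(r"python", text)                          # ['python']
# re.I: case-insensitive matching
ignore_case = re.findall(r"python", text, flags=re.I)        # ['Python', 'python', 'PYTHON']

# Without re.M, ^ only matches at the very start of the string
starts = re.findall(r"^\w+", text)                           # ['Python']
# With re.M, ^ also matches right after every newline
line_starts = re.findall(r"^\w+", text, flags=re.M)          # ['Python', 'python', 'I']
# Flags combine with |: case-insensitive AND multiline
combined = re.findall(r"^python", text, flags=re.I | re.M)   # ['Python', 'python']

# re.S lets the dot match the newline character as well
dot_default = re.findall(r"fun.python", text)                # []
dot_all = re.findall(r"fun.python", text, flags=re.S)        # ['fun\npython']

# re.X: whitespace and # comments inside the pattern are ignored
verbose = re.findall(r"""
    \d+    # one or more digits
    \s     # a single whitespace character
    kg     # the literal text kg
""", "51 kg and 60 kg", flags=re.X)                          # ['51 kg', '60 kg']

print(ignore_case, line_starts, verbose)
```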

groups() method

The group feature of regular expressions allows you to pick out parts of the matching text. Parts of a regular expression pattern enclosed in parentheses () are called groups. The parentheses do not change what the expression matches; rather, they form groups within the matched sequence.

The group() method of a match object returns one or more subgroups of the match.


Code
>>> import re

>>> m = re.match(r'(\w+)@(\w+)\.(\w+)','username@hackerrank.com')

>>> m.group(0) # The entire match

'username@hackerrank.com'

>>> m.group(1) # The first parenthesized subgroup.

'username'

>>> m.group(2) # The second parenthesized subgroup.

'hackerrank'

>>> m.group(3) # The third parenthesized subgroup.

'com'

>>> m.group(1,2,3) # Multiple arguments give us a tuple.

('username', 'hackerrank', 'com')

groups()

The groups() method returns a tuple containing all the subgroups of the match.
Code

>>> import re

>>> m = re.match(r'(\w+)@(\w+)\.(\w+)','username@hackerrank.com')

>>> m.groups()

('username', 'hackerrank', 'com')
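The same pattern can be applied to every match in a longer text by combining finditer() with groups(). Since re.match() and re.search() return None when nothing matches, it is worth checking the result before calling group() or groups(). A sketch with assumed sample addresses:

```python
import re

pattern = r'(\w+)@(\w+)\.(\w+)'
text = "Contact alice@example.com or bob@test.org for details."

# groups() on each match object gives a (user, domain, suffix) tuple
parts = [m.groups() for m in re.finditer(pattern, text)]
print(parts)   # [('alice', 'example', 'com'), ('bob', 'test', 'org')]

# re.match() returns None for a string that does not fit the pattern
result = re.match(pattern, "not an email")
if result is None:
    print("No match")
else:
    print(result.groups())
```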

File Handling

Python provides important features for reading data from a file and writing data into a file. In most programs, values are stored in variables, which are volatile: the data exists only while the program runs and is lost once its execution is completed. Hence it is better to save such data permanently in files.
If you are working on a large software application that processes a large amount of data, you cannot expect that data to be kept only in variables, because variables are volatile. In such situations files come into the picture: files are non-volatile, so the data is stored permanently on a secondary storage device such as a hard disk, and Python lets us handle these files in our applications.

Consider how we handle files manually. If we want to read data from a file or write data into a file, we first open the file (or create a new file if it does not exist), then perform the read/write operations, save the file, and close it. We perform the same operations in Python using built-in functions and methods.

File Paths

A file has two key properties: a filename (usually written as one word) and a path.
The path specifies the location of a file on the computer. For example, there is a file on my
Windows 7 laptop with the filename project.docx in the path C:\Users\asweigart\Documents.
The part of the filename after the last period is called the file’s extension and tells you a file’s
type. project.docx is a Word document, and Users, asweigart, and Documents all refer
to folders (also called directories). Folders can contain files and other folders. For
example, project.docx is in the Documents folder, which is inside the asweigart folder, which
is inside the Users folder. Figure shows this folder organization.

Figure . A file in a hierarchy of folders

NOTE: Windows uses the backslash (\) as the folder separator, while OS X and Linux use the forward slash (/).


Absolute vs. Relative Paths

There are two ways to specify a file path.

• An absolute path, which always begins with the root folder

• A relative path, which is relative to the program’s current working directory

There are also the dot (.) and dot-dot (..) folders. These are not real folders but special names
that can be used in a path. A single period (“dot”) for a folder name is shorthand for “this
directory.” Two periods (“dot-dot”) means “the parent folder.”

Figure is an example of some folders and files. When the current working directory is set
to C:\bacon, the relative paths for the other folders and files are set as they are in the figure.

Figure . The relative paths for folders and files in the working directory C:\bacon

The .\ at the start of a relative path is optional. For example, .\spam.txt and spam.txt refer to
the same file.
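Python’s standard os.path module can build paths with the separator of the current operating system and convert relative paths into absolute ones. In the sketch below the folder and file names (bacon, fizz, spam.txt) are only illustrative; the absolute path printed depends on the current working directory, and no files are actually created.

```python
import os

# join() inserts the separator used by the current operating system
relative = os.path.join("bacon", "fizz", "spam.txt")
print(relative)        # bacon/fizz/spam.txt on Linux and OS X, bacon\fizz\spam.txt on Windows

# A relative path is interpreted starting from the current working directory
print(os.getcwd())
absolute = os.path.abspath(relative)
print(absolute)

print(os.path.isabs(relative))   # False
print(os.path.isabs(absolute))   # True

# "." (this folder) and ".." (the parent folder) are resolved by normpath()
tidy = os.path.normpath(os.path.join("bacon", ".", "fizz", "..", "spam.txt"))
print(tidy)            # bacon/spam.txt on Linux and OS X
```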

Types of Files

Computers store every file as a collection of 0s and 1s i.e., in binary form. Therefore,
every file is basically just a series of bytes stored one after the other. There are mainly two
types of data files — text file and binary file.

Text file
A text file can be understood as a sequence of characters consisting of alphabets,
numbers and other special symbols. Files with extensions like .txt, .py, .csv, etc. are some
examples of text files. When we open a text file using a text editor (e.g., Notepad), we see
several lines of text. However, the file contents are not stored in such a way internally.
Rather, they are stored in sequence of bytes consisting of 0s and 1s. In ASCII, UNICODE or
any other encoding scheme, the value of each character of the text file is stored as bytes. So,
while opening a text file, the text editor translates each ASCII value and shows us the
equivalent character that is readable by the human being. For example, the ASCII value 65
(binary equivalent 1000001) will be displayed by a text editor as the letter ‘A’ since the
number 65 in ASCII character set represents ‘A’. Each line of a text file is terminated by a
special character, called the End of Line (EOL). For example, the default EOL character in
Python is the newline (\n). However, other characters can be used to indicate EOL. When a
text editor or a program interpreter encounters the ASCII equivalent of the EOL character, it
displays the remaining file contents starting from a new line. Contents in a text file are
usually separated by whitespace, but comma (,) and tab (\t) are also commonly used to
separate values in a text file.

Binary Files

Binary files are also stored in terms of bytes (0s and 1s), but unlike text files, these
bytes do not represent the ASCII values of characters. Rather, they represent the actual
content such as image, audio, video, compressed versions of other files, executable files, etc.
These files are not human readable. Thus, trying to open a binary file using a text editor will
show some garbage values. We need specific software to read or write the contents of a
binary file. Binary files are stored in a computer in a sequence of bytes. Even a single bit
change can corrupt the file and make it unreadable to the supporting application. Also, it is
difficult to remove any error which may occur in the binary file as the stored contents are not
human readable. We can read and write both text and binary files through Python programs.
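The difference between the two views of the same file can be seen by writing a file in text mode and reading it back in binary mode. The sketch below uses a temporary folder and an assumed file name so that no real files are touched; it also uses the with clause, which closes the file automatically and is explained later in this unit.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "letters.txt")   # hypothetical file name

    # Text mode: we write a str and Python encodes it into bytes.
    # newline="" stops "\n" from being translated to "\r\n" on Windows.
    with open(path, "w", encoding="ascii", newline="") as f:
        f.write("AB\n")

    # Binary mode: we read back the raw bytes stored on disk
    with open(path, "rb") as f:
        raw = f.read()

print(raw)           # b'AB\n'
print(list(raw))     # [65, 66, 10]: the ASCII values of 'A', 'B' and '\n'
print(bin(raw[0]))   # 0b1000001, the binary form of 65
```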

Opening and Closing a Text File

In real world applications, computer programs deal with data coming from different
sources like databases, CSV files, HTML, XML, JSON, etc. We broadly access files either to
write or read data from it. But operations on files include creating and opening a file, writing
data in a file, traversing a file, reading data from a file and so on. Python has the io module
that contains different functions for handling files.
Opening a file

To open a file in Python, we use the open() function. The syntax of open() is:

file_object = open(file_name, access_mode)

This function returns a file object, also called a file handle, which is stored in the variable file_object. We can use this variable to transfer data to and from the file (read and write) by calling the methods defined in Python’s io module. If the file does not exist and it is opened in a writing or appending mode (such as "w" or "a"), a new empty file with the specified name is created; opening a missing file in a reading mode such as "r" or "r+" raises a FileNotFoundError.

The access_mode is an optional argument that represents the mode in which the file is to be accessed by the program. It is also referred to as the processing mode. Here, mode means the operation for which the file is opened: reading, writing, both reading and writing, or appending at the end of an existing file. The default is read mode. In addition, we can specify whether the file will be handled in binary (b) or text mode. By default, files are opened in text mode, which means strings are read and written. Files containing non-textual data are opened in binary mode, which means reading and writing are performed in terms of bytes.

The access modes, which determine how the file is opened (read, write, append, etc.), are given below.

Mode  Description

r     1. Opens a file for reading only.
      2. The file pointer is placed at the beginning of the file.
      3. This is the default mode.

rb    1. Opens a file for reading only in binary format.
      2. The file pointer is placed at the beginning of the file.

r+    1. Opens a file for both reading and writing.
      2. The file pointer is placed at the beginning of the file.

rb+   1. Opens a file for both reading and writing in binary format.
      2. The file pointer is placed at the beginning of the file.

w     1. Opens a file for writing only.
      2. Overwrites the file if the file exists.
      3. If the file does not exist, creates a new file for writing.

wb    1. Opens a file for writing only in binary format.
      2. Overwrites the file if the file exists.
      3. If the file does not exist, creates a new file for writing.

w+    1. Opens a file for both writing and reading.
      2. Overwrites the existing file if the file exists.
      3. If the file does not exist, creates a new file for reading and writing.

wb+   1. Opens a file for both writing and reading in binary format.
      2. Overwrites the existing file if the file exists.
      3. If the file does not exist, creates a new file for reading and writing.

a     1. Opens a file for appending.
      2. The file pointer is at the end of the file if the file exists. That is, the file is in append mode.
      3. If the file does not exist, creates a new file for writing.

ab    1. Opens a file for appending in binary format.
      2. The file pointer is at the end of the file if the file exists. That is, the file is in append mode.
      3. If the file does not exist, creates a new file for writing.

a+    1. Opens a file for both appending and reading.
      2. The file pointer is at the end of the file if the file exists. The file opens in append mode.
      3. If the file does not exist, creates a new file for reading and writing.

ab+   1. Opens a file for both appending and reading in binary format.
      2. The file pointer is at the end of the file if the file exists. The file opens in append mode.
      3. If the file does not exist, creates a new file for reading and writing.

The file object attributes:

The file_object has certain attributes that give basic information about the file, such as:

• file.closed returns True if the file is closed and False otherwise.

• file.mode returns the access mode in which the file was opened.

• file.name returns the name of the file.

The file_name should be the name of the file that has to be opened. If the file is not in
the current working directory, then we need to specify the complete path of the file along
with its name.

Consider the following example.

myObject=open("myfile.txt", "a+")

In the above statement, the file myfile.txt is opened in append and read modes. The
file object will be at the end of the file. That means we can write data at the end of the file
and at the same time we can also read data from the file using the file object named
myObject.
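The practical difference between "w" and "a" is whether existing content survives. The sketch below opens one file several times in different modes; it works in a temporary folder with an assumed file name, so no real file is touched.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "modes.txt")   # hypothetical file name

    f = open(path, "w")      # "w" creates the file (or empties an existing one)
    f.write("first\n")
    f.close()

    f = open(path, "a")      # "a" keeps the old content and writes at the end
    f.write("second\n")
    f.close()

    f = open(path, "r")      # "r" opens the file for reading only
    after_append = f.read()
    f.close()

    f = open(path, "w")      # opening with "w" again erases the previous content
    f.write("third\n")
    f.close()

    f = open(path)           # no mode given, so the default "r" is used
    after_overwrite = f.read()
    f.close()

print(repr(after_append))     # 'first\nsecond\n'
print(repr(after_overwrite))  # 'third\n'
```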

Closing a file

Once we are done with the read/write operations on a file, it is a good practice to close
the file. Python provides a close() method to do so. While closing a file, the system frees the
memory allocated to it. The syntax of close() is:

file_object.close()
Here, file_object is the object that was returned while opening the file. Python makes sure that any unwritten or unsaved data is flushed (written) to the file before it is closed.
Hence, it is always advised to close the file once our work is done. Also, if the file object is
re-assigned to some other file, the previous file is automatically closed.
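The attributes listed earlier can be used to check the state of a file object before and after close(). A sketch that opens an assumed file in a temporary folder in "a+" mode, as in the example above:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "myfile.txt")   # hypothetical file in a temporary folder

    myObject = open(path, "a+")
    print(myObject.name == path)   # True: name is the path passed to open()
    print(myObject.mode)           # a+
    print(myObject.closed)         # False: the file is still open

    myObject.write("Hello\n")      # buffered data is written out when the file is closed
    myObject.close()
    print(myObject.closed)         # True
```

Calling close() explicitly, as here, is clearer than relying on the file being closed automatically when its variable is re-assigned.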

Writing to a Text File

For writing to a file, we first need to open it in write or append mode. If we open an
existing file in write mode, the previous data will be erased, and the file object will be
positioned at the beginning of the file. On the other hand, in append mode, new data will be
added at the end of the previous data as the file object is at the end of the file. After opening
the file, we can use the following methods to write data in the file.

• write() - for writing a single string

• writelines() - for writing a sequence of strings

The write() method


The write() method takes a string as an argument and writes it to the text file. It returns the number of characters written by that call. We also need to add a newline character (\n) at the end of every line to mark the end of the line.

Consider the following piece of code:

>>> myobject=open("myfile.txt",'w')
>>> myobject.write("Hey I have started using files in Python\n")
41
>>> myobject.close()

On execution, write() returns the number of characters written on to the file. Hence, 41,
which is the length of the string passed as an argument, is displayed.

Note: ‘\n’ is treated as a single character

The write() actually writes data onto a buffer. When the close() method is executed, the
contents from this buffer are moved to the file located on the permanent storage.

The writelines() method

This method is used to write multiple strings to a file. We need to pass an iterable
object like lists, tuple, etc. containing strings to the writelines() method. Unlike write(), the
writelines() method does not return the number of characters written in the file. The
following code explains the use of writelines().

>>> myobject=open("myfile.txt",'w')
>>> lines = ["Hello everyone\n", "Writing multiline strings\n", "This is the third line"]
>>> myobject.writelines(lines)
>>> myobject.close()

On opening myfile.txt using Notepad, its content will appear as shown in Figure.
Reading from a Text File

We can write a program to read the contents of a file. Before reading a file, we
must make sure that the file is opened in “r”, “r+”, “w+” or “a+” mode. There are three
ways to read the contents of a file:

The read() method

This method is used to read a specified number of characters (bytes, in binary mode) from a file. The syntax of the read() method is:

file_object.read(n)

Consider the following set of statements to understand the usage of the read() method:

>>> myobject=open("myfile.txt",'r')

>>> myobject.read(10)

'Hello ever'

>>> myobject.close()

If no argument or a negative number is specified in read(), the entire file content is read.
For example,

>>> myobject=open("myfile.txt",'r')

>>> print(myobject.read())

Hello everyone

Writing multiline strings

This is the third line


>>> myobject.close()

The readline([n]) method


This method reads one complete line from a file, where each line terminates with a newline (\n) character. It can also be used to read a specified number (n) of characters from a file, but only up to the newline character (\n). In the following example, the second statement reads the first ten characters of the first line of the text file and displays them on the screen.

>>> myobject=open("myfile.txt",'r')

>>> myobject.readline(10)

'Hello ever'

>>> myobject.close()

If no argument or a negative number is specified, it reads a complete line and returns it as a string.

>>> myobject=open("myfile.txt",'r')
>>> myobject.readline()
'Hello everyone\n'
>>> myobject.close()
To read the entire file line by line using readline(), we can use a loop; this is known as looping or iterating over a file object. readline() returns an empty string when EOF is
reached.
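A minimal sketch of such a loop, assuming a temporary copy of myfile.txt with the three lines written earlier. The loop stops when readline() returns the empty string; iterating directly over the file object gives the same lines with less code.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "myfile.txt")   # temporary copy of myfile.txt
    myobject = open(path, "w")
    myobject.writelines(["Hello everyone\n", "Writing multiline strings\n", "This is the third line"])
    myobject.close()

    lines = []
    myobject = open(path, "r")
    line = myobject.readline()
    while line != "":              # readline() returns "" only at the end of the file
        lines.append(line)
        print(line, end="")        # the line already ends with "\n" (except the last one here)
        line = myobject.readline()
    myobject.close()
    print()

    # Iterating over the file object reads the same lines
    myobject = open(path, "r")
    same_lines = [row for row in myobject]
    myobject.close()
```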

The readlines() method

The method reads all the lines and returns the lines along with newline as a list of
strings. The following example uses readlines() to read data from the text file myfile.txt.

>>> myobject=open("myfile.txt", 'r')

>>> print(myobject.readlines())

['Hello everyone\n', 'Writing multiline strings\n', 'This is the third line']

>>> myobject.close()

As shown in the above output, when we read a file using the readlines() method, the lines of the file become elements of a list, where each element ends with a newline character (‘\n’), except possibly the last line.

To display each word of a line separately as an element of a list, we can use the split() method. The following code demonstrates its use:

>>> myobject=open("myfile.txt",'r')
>>> d=myobject.readlines()
>>> for line in d:
...     words=line.split()
...     print(words)
...
['Hello', 'everyone']
['Writing', 'multiline', 'strings']
['This', 'is', 'the', 'third', 'line']

In the output, each word is returned as an element of a list. However, if splitlines() is used instead of split(), each whole line (without its trailing newline) is returned as an element of a list, as shown in the output below:

>>> for line in d:
...     words=line.splitlines()
...     print(words)
...
['Hello everyone']
['Writing multiline strings']
['This is the third line']
>>> myobject.close()

Let us now write a program that accepts a string from the user and writes it to a text file.
Thereafter, the same program reads the text file and displays it on the screen.
Program 2-1 Writing and reading a text file

fobject=open("testfile.txt","w")      # creating a data file
sentence=input("Enter the contents to be written in the file: ")
fobject.write(sentence)               # writing data to the file
fobject.close()                       # closing the file
print("Now reading the contents of the file: ")
fobject=open("testfile.txt","r")
# looping over the file object to read the file
for line in fobject:
    print(line)
fobject.close()

In Program 2-1, the file named testfile.txt is opened in write mode and the file handle named fobject is returned. The string is accepted from the user and written to the file using write(). Then the file is closed and opened again in read mode. Data is read from the file and displayed until the end of the file is reached.

Output of Program 2-1:


>>>
RESTART: Path_to_file\Program2-1.py
Enter the contents to be written in the file: roll_numbers = [1, 2, 3, 4, 5, 6]
Now reading the contents of the file:
roll_numbers = [1, 2, 3, 4, 5, 6]
>>>

Opening a file using with clause

In Python, we can also open a file using with clause. The syntax of with clause is:

with open(file_name, access_mode) as file_object:

The advantage of using with clause is that any file that is opened using this clause is closed
automatically, once the control comes outside the with clause. In case the user forgets to close
the file explicitly or if an exception occurs, the file is closed automatically. Also, it provides a
simpler syntax.

with open("myfile.txt","r+") as myObject:
    content = myObject.read()
Here, we don’t have to close the file explicitly using close() statement. Python will
automatically close the file.
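The sketch below checks that claim using the closed attribute: the file is closed after a normal exit from the with block, and also when an exception is raised inside it. It uses a temporary folder and an assumed file name; the ValueError is only a simulated error.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "myfile.txt")   # hypothetical file in a temporary folder

    # Writing: the file is closed automatically at the end of the block
    with open(path, "w") as myObject:
        myObject.write("Hello everyone\n")
    print(myObject.closed)   # True

    # The file is also closed when an exception leaves the block
    try:
        with open(path, "r+") as myObject:
            content = myObject.read()
            raise ValueError("simulated error")
    except ValueError:
        pass
    print(myObject.closed)   # True
    print(content, end="")   # Hello everyone
```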

Other useful methods:

Sr.No. Methods with Description

1 file.close()

Close the file. A closed file cannot be read or written any more.

2 file.flush()

Flush the internal buffer, like stdio's fflush. This may be a no-op on some file-like objects.

3 file.fileno()

Returns the integer file descriptor that is used by the underlying implementation to request
I/O operations from the operating system.

4 file.isatty()
Returns True if the file is connected to a tty(-like) device, else False.

5 next(file)

Returns the next line from the file each time it is called. (In Python 2 this was the file.next() method.)

6 file.read([size])

Reads at most size characters (bytes in binary mode) from the file, fewer if the read hits EOF before obtaining size characters.

7 file.readline([size])

Reads one entire line from the file. A trailing newline character is kept in the string. If size is given, at most size characters are read.

8 file.readlines([sizehint])

Reads until EOF using readline() and returns a list containing the lines. If the optional sizehint argument is present, instead of reading up to EOF, whole lines totalling approximately sizehint bytes (possibly after rounding up to an internal buffer size) are read.

9 file.seek(offset[, whence])

Sets the file's current position

10 file.tell()

Returns the file's current position

11 file.truncate([size])

Truncates the file's size. If the optional size argument is present, the file is truncated to (at
most) that size; otherwise it is truncated at the current position.

12 file.write(str)

Writes a string to the file. In Python 3, it returns the number of characters written.

13 file.writelines(sequence)

Writes a sequence of strings to the file. The sequence can be any iterable object producing
strings, typically a list of strings.
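A short sketch exercising several methods from the table above on a hypothetical file demo.txt in a temporary directory. It shows that write() returns a count in Python 3, that writelines() adds no newlines of its own, and that readline(), next() and readlines() can be mixed on the same text file.

```python
import os
import tempfile

tmp_dir = tempfile.mkdtemp()
file_name = os.path.join(tmp_dir, "demo.txt")  # hypothetical file name

with open(file_name, "w") as f:
    count = f.write("first line\n")                   # returns characters written
    f.writelines(["second line\n", "third line\n"])   # newlines must be supplied

with open(file_name, "r") as f:
    first = f.readline()    # one line, with its trailing newline kept
    second = next(f)        # Python 3 form of file.next()
    rest = f.readlines()    # all remaining lines, as a list

print(count)
print(repr(first))
print(repr(second))
print(rest)

os.remove(file_name)
os.rmdir(tmp_dir)
```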

File Positions

A file handle, or file pointer, marks the position from which the file contents will be read or
written. It is also called the cursor.

For example, when you open a file in write mode, the file pointer is placed at position 0,
i.e., at the start of the file. As you write content into it, the pointer moves forward. Similarly,
when you read a file line by line, the file pointer moves one line at a time.

The file’s access mode determines not only the type of operation we intend to perform on the
file but also the initial position of the file handle. For example, if the file is opened in read
mode, the file handle starts at the beginning; after the entire file has been read, it is at the
end of the file (EOF). In append mode, the file handle starts at the end of the file.

We can get the file handle’s current position using the tell() method.

Syntax:

file_object.tell()

There are no arguments for this method. The return value is the integer representing the file
handle position.
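A minimal sketch of tell() in write, read and append modes, using a hypothetical file position.txt in a temporary directory. In text mode, tell() returns an opaque number; for plain ASCII text like this it matches the byte offset, which is what the comments assume.

```python
import os
import tempfile

tmp_dir = tempfile.mkdtemp()
file_name = os.path.join(tmp_dir, "position.txt")  # hypothetical file name

with open(file_name, "w", encoding="utf-8") as f:
    start = f.tell()          # 0: a file opened in "w" mode starts at the beginning
    f.write("Hello")
    after_write = f.tell()    # 5: moved past the 5 characters written

with open(file_name, "r", encoding="utf-8") as f:
    f.read()
    after_read = f.tell()     # 5: after reading everything, the pointer is at the end

with open(file_name, "a", encoding="utf-8") as f:
    append_start = f.tell()   # 5: "a" mode starts at the end of the file

print(start, after_write, after_read, append_start)

os.remove(file_name)
os.rmdir(tmp_dir)
```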

To change the file handle’s position, use the seek() method. It sets the file’s current position,
after which we can read or write from that position.

Syntax:

f.seek(offset, whence)

The new position is computed by adding offset to a reference point, and the reference point is
selected by the whence argument.
The allowed values for the whence argument are:

• A whence value of 0 uses the beginning of the file as the reference point.

• A whence value of 1 uses the current file position as the reference point.

• A whence value of 2 uses the end of the file as the reference point.

The default value of whence is 0, the beginning of the file.

Seek Operation    Meaning

f.seek(0)         Move the file pointer to the beginning of the file.

f.seek(5)         Move the file pointer five bytes ahead from the beginning of the file.

f.seek(0, 2)      Move the file pointer to the end of the file.

f.seek(5, 1)      Move the file pointer five bytes ahead from the current position.

f.seek(-5, 1)     Move the file pointer five bytes back from the current position.

f.seek(-5, 2)     Move the file pointer in the reverse direction, to the 5th byte from the end
                  of the file.

Note: in Python 3, the last three forms (a non-zero offset with whence 1 or 2) are allowed only
when the file is opened in binary mode, such as "rb". In text mode they raise
io.UnsupportedOperation.
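A sketch of the seek operations from the table on a small binary file (the name letters.bin is hypothetical). It opens the file in "rb" mode so every whence value is allowed, and then shows that a text-mode file rejects a non-zero end-relative seek.

```python
import io
import os
import tempfile

tmp_dir = tempfile.mkdtemp()
file_name = os.path.join(tmp_dir, "letters.bin")  # hypothetical file name

with open(file_name, "wb") as f:
    f.write(b"ABCDEFGHIJ")        # 10 bytes, positions 0 to 9

with open(file_name, "rb") as f:  # binary mode: every whence value is allowed
    f.seek(5)                     # 5 bytes from the beginning
    a = f.read(1)                 # b'F'; the pointer is now at 6
    f.seek(2, 1)                  # 2 bytes ahead of the current position: 8
    b = f.read(1)                 # b'I'; the pointer is now at 9
    f.seek(-3, 2)                 # 3 bytes before the end: 7
    c = f.read(1)                 # b'H'
    f.seek(0)                     # back to the beginning
    d = f.read(1)                 # b'A'
    end = f.seek(0, 2)            # seek() returns the new position: 10

# In text mode, a non-zero end-relative seek is rejected.
with open(file_name, "r") as f:
    try:
        f.seek(-3, 2)
        text_seek_failed = False
    except io.UnsupportedOperation:
        text_seek_failed = True

print(a, b, c, d, end, text_seek_failed)

os.remove(file_name)
os.rmdir(tmp_dir)
```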
Renaming and Deleting Files

Python's os module provides functions that help you perform file-processing operations, such
as renaming and deleting files.

To use this module, import it first; then you can call any of its functions.

The rename() Method

The rename() method takes two arguments, the current filename and the new filename.

Syntax

os.rename(current_file_name, new_file_name)

Example

The following example renames an existing file, test1.txt, to test2.txt:

#!/usr/bin/python

import os

# Rename a file from test1.txt to test2.txt
os.rename("test1.txt", "test2.txt")


The remove() Method

You can use the remove() method to delete files by supplying the name of the file
to be deleted as the argument.

Syntax

os.remove(file_name)

Example

The following example deletes an existing file, test2.txt:

#!/usr/bin/python

import os

# Delete file test2.txt
os.remove("test2.txt")
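A self-contained sketch combining rename() and remove(). It first creates test1.txt inside a temporary directory, so the hard-coded names above are not needed, and it shows that removing a file that no longer exists raises FileNotFoundError.

```python
import os
import tempfile

tmp_dir = tempfile.mkdtemp()
old_name = os.path.join(tmp_dir, "test1.txt")
new_name = os.path.join(tmp_dir, "test2.txt")

with open(old_name, "w") as f:   # create test1.txt so the example can run anywhere
    f.write("sample\n")

os.rename(old_name, new_name)    # test1.txt -> test2.txt
renamed = os.path.exists(new_name) and not os.path.exists(old_name)

os.remove(new_name)              # delete test2.txt
removed = not os.path.exists(new_name)

try:
    os.remove(new_name)          # removing it a second time fails
except FileNotFoundError:
    print("test2.txt is already gone")

os.rmdir(tmp_dir)
print(renamed, removed)
```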

Directory methods

All files are contained within various directories, and Python has no problem
handling these too. The os module has several methods that help you create,
remove, and change directories.

The mkdir() Method

You can use the mkdir() method of the os module to create directories in the
current directory.

You need to supply an argument to this method which contains the name of the directory to
be created.

Syntax

os.mkdir("newdir")

Example

The following example creates a directory named test in the current directory:

#!/usr/bin/python

import os

# Create a directory "test"
os.mkdir("test")

The chdir() Method

You can use the chdir() method to change the current directory. The chdir() method takes an
argument, which is the name of the directory that you want to make the current directory.

Syntax

os.chdir("newdir")

Example

The following example changes the current directory to "/home/newdir":

#!/usr/bin/python

import os

# Change the current directory to "/home/newdir"
os.chdir("/home/newdir")

The getcwd() Method

The getcwd() method returns the current working directory.

Syntax

os.getcwd()

Example

The following example prints the current directory:

#!/usr/bin/python

import os

# Print the location of the current directory
print(os.getcwd())
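A sketch that ties mkdir(), chdir() and getcwd() together inside a temporary scratch directory, so no real directories are changed. It records the starting directory and returns to it at the end, which is good practice because chdir() affects the whole process.

```python
import os
import tempfile

base = tempfile.mkdtemp()        # scratch area so no real directories are touched
original_cwd = os.getcwd()       # remember where we started

os.chdir(base)                   # make the scratch area the current directory
os.mkdir("test")                 # create ./test relative to the current directory
os.chdir("test")                 # move into it
inside = os.getcwd()             # absolute path ending in 'test'
print("Now in:", inside)

os.chdir(original_cwd)           # go back to the original directory
os.rmdir(os.path.join(base, "test"))
os.rmdir(base)
```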

The rmdir() Method


The rmdir() method deletes the directory passed to it as an argument.

The directory must be empty: remove all of its contents before removing the directory.

Syntax

os.rmdir('dirname')

Example

The following example removes the "/tmp/test" directory. Give the fully qualified path of the
directory; otherwise, the name is looked up relative to the current directory.

#!/usr/bin/python

import os

# Remove the "/tmp/test" directory
os.rmdir("/tmp/test")

It can also raise errors in the following scenarios (the messages shown are from Windows;
other systems report equivalent errors with different wording):

• If the directory is not empty, it raises OSError:

OSError: [WinError 145] The directory is not empty:

• If the given path does not point to a directory, this error is raised:

NotADirectoryError: [WinError 267] The directory name is invalid:

• If nothing exists at the given path, this error is raised:

FileNotFoundError: [WinError 2] The system cannot find the file specified

If you want to delete a directory even if it is not empty, use shutil.rmtree(), which
permanently deletes a folder and its entire contents:

import shutil

shutil.rmtree('F:\\Example2')

These functions delete files and directories permanently, without sending them to the recycle
bin or trash, so use them very carefully.
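A sketch of the difference between os.rmdir() and shutil.rmtree() on a non-empty folder. The folder Example2 and the file note.txt are hypothetical and are created under a temporary directory; the exact OSError message depends on the operating system.

```python
import os
import shutil
import tempfile

base = tempfile.mkdtemp()
folder = os.path.join(base, "Example2")      # hypothetical folder name
os.mkdir(folder)
with open(os.path.join(folder, "note.txt"), "w") as f:
    f.write("the folder is not empty\n")

rmdir_failed = False
try:
    os.rmdir(folder)          # fails: note.txt is still inside
except OSError as e:
    rmdir_failed = True
    print("os.rmdir failed:", e)

shutil.rmtree(folder)         # deletes the folder and everything in it
print(rmdir_failed, os.path.exists(folder))

os.rmdir(base)                # base is empty now, so rmdir succeeds
```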

os.makedirs() method
The os module has a built-in os.makedirs() method to create nested (recursive) directories.

That is, os.makedirs() creates the parent directory, the intermediate directories and the leaf
directory, whichever of them do not already exist.

Syntax:

os.makedirs(name, mode=0o777, exist_ok=False)

Example:

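import os

main_dir = "C:/Examples/Python_files/OS_module"

# 0o755 keeps the new directory usable on Unix; 0o666 would leave out the
# execute (search) bits. The permission bits are mostly ignored on Windows.
os.makedirs(main_dir, mode=0o755)

print("Directory '%s' is built!" % main_dir)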

In the above example, makedirs() creates the intermediate directories (such as 'Python_files')
as well as the leaf directory 'OS_module' in one call.

Output:

Directory 'C:/Examples/Python_files/OS_module' is built!
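A sketch of makedirs() and its exist_ok parameter, building a nested path similar to the example above but under a temporary directory. With the default exist_ok=False, calling it on an existing directory raises FileExistsError; exist_ok=True makes the call safe to repeat.

```python
import os
import shutil
import tempfile

base = tempfile.mkdtemp()
# Nested path similar to the example above, built under a temporary directory.
main_dir = os.path.join(base, "Examples", "Python_files", "OS_module")

os.makedirs(main_dir)                  # creates Examples, Python_files and OS_module
created = os.path.isdir(main_dir)

os.makedirs(main_dir, exist_ok=True)   # no error: the directory may already exist

try:
    os.makedirs(main_dir)              # exist_ok defaults to False
    raised = False
except FileExistsError:
    raised = True

print(created, raised)
shutil.rmtree(base)                    # clean up the whole temporary tree
```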

The os.path.join() function

We will often deal with files stored on our computers and, by necessity, with their paths, i.e.
the text strings that describe the nesting of subdirectories leading to a specific file. Paths are
complicated by the fact that different operating systems write them differently.

On OS X and all other Unix-based systems such as Linux, file paths are written as text strings
in this format, with forward slashes separating the subdirectories and the actual filename (in
this case, file.txt):

my/path/to/file.txt

In Windows, the backslash is used:

\my\path\to\file.txt
If you've been paying attention to what the backslash character means in Python strings, you
might remember that it starts an escape sequence, i.e. the backslash modifies the meaning of
the character that follows it. This means that to print a literal backslash in a Python string,
you have to use a double backslash:

>>> print("\\my\\path\\to\\file.txt")

\my\path\to\file.txt

As you can imagine, that could complicate the ability to write code that works on Windows
and everywhere else.

We get access to the os.path module if we have import os in our code.

The os.path.join() function takes as many arguments as needed to build a file path, with each
argument representing one component (i.e. a subdirectory or the file name) of the path. So
instead of doing this:

mypath = "my/path/to/file.txt"

We do this:

mypath = os.path.join('my', 'path', 'to', 'file.txt')

os.path.join() inserts the separator that is correct for the operating system, so the same code
produces a valid path whether it runs on Windows or on a Unix-based system.
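A short sketch showing that os.path.join() uses the current system's separator, available as os.sep. The printed path differs between Windows and Unix-based systems, but the comparison with os.sep holds on both.

```python
import os

mypath = os.path.join('my', 'path', 'to', 'file.txt')
print(mypath)   # my/path/to/file.txt on Unix, my\path\to\file.txt on Windows

# The separator inserted between components is os.sep for the current system.
print(mypath == os.sep.join(['my', 'path', 'to', 'file.txt']))

# Components can also be combined with an existing directory path.
print(os.path.join(os.getcwd(), 'file.txt'))
```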

Methods from the os module:

os.listdir() method in Python is used to get the list of all files and directories in the specified
directory. If we don’t specify any directory, then the list of files and directories in the current
working directory will be returned.

# Python program to explain os.listdir() method

# importing os module
import os

# Get the list of all files and directories
# in the root directory
path = "/"
dir_list = os.listdir(path)

print("Files and directories in '", path, "' :")

# print the list
print(dir_list)

Output:

Files and directories in ' / ' :

['sys', 'run', 'tmp', 'boot', 'mnt', 'dev', 'proc', 'var', 'bin', 'lib64', 'usr', 'lib', 'srv', 'home', 'etc', 'opt', 'sbin', 'media']
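os.listdir() returns bare names, not full paths, and in no particular order. A sketch, assuming a small hypothetical tree built in a temporary directory, that separates files from subdirectories by joining each name with its parent before testing it:

```python
import os
import shutil
import tempfile

# Build a small hypothetical directory tree to list.
base = tempfile.mkdtemp()
os.mkdir(os.path.join(base, "subdir"))
with open(os.path.join(base, "a.txt"), "w") as f:
    f.write("x")

# listdir() returns names in no particular order, so sort them.
entries = sorted(os.listdir(base))

# Join each name with its parent before testing it; a bare name would
# be looked up relative to the current directory instead.
files = [n for n in entries if os.path.isfile(os.path.join(base, n))]
dirs = [n for n in entries if os.path.isdir(os.path.join(base, n))]
print(entries, files, dirs)   # ['a.txt', 'subdir'] ['a.txt'] ['subdir']

shutil.rmtree(base)
```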

os.path.getsize(): This method returns the size of a file in bytes. Pass the name (or path) of
the file as the parameter.

import os #importing os module

size = os.path.getsize("filename")

print("Size of the file is", size," bytes.")

Output:

Size of the file is 192 bytes.

os.path.exists(): This method checks whether a file exists, given the name of the file as the
parameter (it also returns True for an existing directory). It belongs to os.path, a submodule of
the os module that provides many more path-related functions.

import os  # importing os module

result = os.path.exists("file_name")  # giving the name of the file as a parameter


print(result)

Output

False

In the code above, the file does not exist, so the output is False. If the file exists, the output
is True.
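A self-contained sketch of os.path.exists() and os.path.getsize() together, using a hypothetical file sample.txt in a temporary directory. Writing in binary mode guarantees exactly six bytes on every platform (no newline translation).

```python
import os
import tempfile

tmp_dir = tempfile.mkdtemp()
file_name = os.path.join(tmp_dir, "sample.txt")  # hypothetical file name

before = os.path.exists(file_name)   # False: not created yet

with open(file_name, "wb") as f:     # binary mode, so exactly these bytes are written
    f.write(b"hello\n")

after = os.path.exists(file_name)    # True
size = os.path.getsize(file_name)    # 6: five letters plus the newline byte
print(before, after, size)

os.remove(file_name)
os.rmdir(tmp_dir)
```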

os.path.basename(path): It returns the base name of the path, i.e. its final component, which
is usually the file name.

# basename function

import os

out = os.path.basename("/baz/foo")

print(out)

Output:

foo

os.path.dirname(path): It returns the directory name of the given path, i.e. everything except
the final component.

# dirname function

import os

out = os.path.dirname("/baz/foo")

print(out)

Output:

/baz
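A small sketch contrasting basename() and dirname() on the same path. It also shows a detail worth knowing: with a trailing slash, the final component is empty, so basename() returns an empty string.

```python
import os

path = "/baz/foo"
name = os.path.basename(path)     # 'foo'
parent = os.path.dirname(path)    # '/baz'
print(name, parent)

# With a trailing slash, the final component is empty.
print(repr(os.path.basename("/baz/foo/")))   # ''
```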

os.path.isabs(path): It returns True if the path is absolute. On Unix, an absolute path begins
with a slash ('/'); on Windows, it typically begins with a drive letter and a backslash, as in
C:\Users.
#isabs function

import os

out = os.path.isabs("/baz/foo")

print(out)

Output:

True

os.path.isdir(path): This function returns True if the path refers to an existing directory, and False otherwise.

# isdir function

import os

out = os.path.isdir("C:\\Users")

print(out)

Output:

True

os.path.isfile(path): This function returns True if the path refers to an existing regular file, and False otherwise.

# isfile function

import os

out = os.path.isfile("C:\\Users\\foo.csv")

print(out)

Output:

True
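The isdir() and isfile() outputs above depend on what exists on the machine. A portable sketch, assuming a hypothetical file foo.csv created in a temporary directory, checks isabs(), isdir() and isfile() against paths whose state is known:

```python
import os
import shutil
import tempfile

base = tempfile.mkdtemp()                   # absolute path to a new directory
file_path = os.path.join(base, "foo.csv")   # hypothetical file name
with open(file_path, "w") as f:
    f.write("a,b\n")

results = {
    "isabs(base)": os.path.isabs(base),              # True
    "isabs('foo.csv')": os.path.isabs("foo.csv"),    # False: relative path
    "isdir(base)": os.path.isdir(base),              # True
    "isfile(base)": os.path.isfile(base),            # False: a directory is not a file
    "isfile(file_path)": os.path.isfile(file_path),  # True
}

shutil.rmtree(base)
still_there = os.path.isdir(base)   # False: missing paths also give False

for check, value in results.items():
    print(check, value)
```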

os.path.split() splits a path name into a (head, tail) pair, where tail is the last path name
component and head is everything leading up to it.
# Python program to explain os.path.split() method

# importing os module
import os

# path
path = '/home/User/Desktop/file.txt'

# Split the path into a head and tail pair
head_tail = os.path.split(path)

# print head and tail of the specified path
print("Head of '%s':" % path, head_tail[0])
print("Tail of '%s':" % path, head_tail[1], "\n")

# path
path = '/home/User/Desktop/'

# Split the path into a head and tail pair
head_tail = os.path.split(path)

# print head and tail of the specified path
print("Head of '%s':" % path, head_tail[0])
print("Tail of '%s':" % path, head_tail[1], "\n")

# path
path = 'file.txt'

# Split the path into a head and tail pair
head_tail = os.path.split(path)

# print head and tail of the specified path
print("Head of '%s':" % path, head_tail[0])
print("Tail of '%s':" % path, head_tail[1])

Output:

Head of '/home/User/Desktop/file.txt': /home/User/Desktop

Tail of '/home/User/Desktop/file.txt': file.txt

Head of '/home/User/Desktop/': /home/User/Desktop

Tail of '/home/User/Desktop/':

Head of 'file.txt':

Tail of 'file.txt': file.txt
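The head and tail returned by os.path.split() are the same two parts that os.path.dirname() and os.path.basename() return individually. A brief sketch checking this on the three paths used above:

```python
import os

paths = ['/home/User/Desktop/file.txt', '/home/User/Desktop/', 'file.txt']
pairs = []
for p in paths:
    head, tail = os.path.split(p)
    # split() gives the same two parts as dirname() and basename()
    assert head == os.path.dirname(p) and tail == os.path.basename(p)
    pairs.append((head, tail))
    print(repr(head), repr(tail))
```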
