INTRODUCTION:
Example:
String = 'python'
Enclosing string in double quotes
Example:
Example:
INDEXING:
In Python, strings are ordered sequences of character data, so individual characters in a
string can be accessed by specifying the string name followed by a number in square brackets
([]). The number in the brackets is called the index. String indexing in Python is zero-based:
the first character in the string has index 0, the next has index 1, and so on. The index of the
last character is the length of the string minus one.
String indices can also be specified with negative numbers, in which case indexing
occurs from the end of the string backward: -1 refers to the last character, -2 the second-to-
last character, and so on.
Example:
sampleStr = "Hello, this is a sample string"
print("Character at index 4 is :", sampleStr[4])
output:
Character at index 4 is : o
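As a small sketch of the zero-based and negative indexing rules described above (the variable names here are illustrative and not part of the original example):

```python
sample = "Hello, this is a sample string"

first = sample[0]                   # first character
last = sample[len(sample) - 1]      # last character: length minus one
also_last = sample[-1]              # same character through a negative index
second_to_last = sample[-2]

print(first, last, also_last, second_to_last)   # H g g n
```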
Traversing a string:
Traversing a string means accessing all the elements of the string one after the other by using
the subscript. A string can be traversed using a for loop or a while loop. A simple way to
traverse the string is to use Python's range() function, which lets us access the string
elements by index.
"""
Python Program:
"""
string_to_iterate = "PYTHON"
for char_index in range(len(string_to_iterate)):
    print(string_to_iterate[char_index])
Output:
P
Y
T
H
O
N
Appending strings refers to adding (+=) one or more strings to the end of another string.
Appending is the same operation as concatenating strings at the end of another, so in many
cases the terms "appending" and "concatenating" are interchangeable.
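A minimal sketch of appending with += next to plain concatenation with +; the sample text is an assumption chosen for illustration:

```python
greeting = "Hello"
greeting += ", "        # append to the end with +=
greeting += "world"

joined = "Hello" + ", " + "world"   # concatenation builds the same text

print(greeting)             # Hello, world
print(greeting == joined)   # True
```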
Use the multiplication operator * to repeat a string multiple times
Multiply a string with the multiplication operator * by an integer n to concatenate the string
with itself n times. Call print(value) with the resultant string as value to print it.
Printing a string multiple times on the same line outputs multiple copies of the string without
newlines.
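The repetition described above can be sketched as follows. The string "ab" is an arbitrary sample, and using end="" to stay on one line is one possible way to print without newlines:

```python
word = "ab"
repeated = word * 3     # the string concatenated with itself 3 times
print(repeated)         # ababab

# Printing the string several times on the same line:
# end="" suppresses the newline that print() normally adds.
for _ in range(3):
    print(word, end="")
print()                 # finish the line
```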
Strings are not mutable in Python. A string is an immutable data type, which means that its
value cannot be updated in place. An immutable object is one that, once created, will not
change during its lifetime. Example: try executing the following code:
name_1 = "Varun"
name_1[0] = 'T'
Output
You will get an error message when you want to change the content of the string.
One possible solution is to create a new string object with necessary modifications:
name_1 = "Varun"
Output
To identify that they are different strings, check with the id() function:
name_1 = "Varun"
Output
id of name_1 = 140225710197088
id of name_2 = 140225710197704
To understand more about the concept of string immutability, consider the following code:
name_1 = "Varun"
name_2 = "Varun"
print("id of name_1 = ", id(name_1))
print("id of name_2 = ", id(name_2))
Output
id of name_1 = 140685486749936
id of name_2 = 140685486749936
When the above lines of code are executed, you will find that the id’s of both name_1 and
name_2 objects, which refer to the string “Varun”, are the same.
name_1 = "Varun"
name_1 = "Tarun"
Output
id of name_1 = 140449996065064
As can be seen in the above example, when a string reference is reinitialized with a new
value, it is creating a new object rather than overwriting the previous value.
In Python, strings are made immutable so that programmers cannot alter the contents of the
object (even by mistake). This avoids unnecessary bugs.
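The points above can be condensed into one runnable sketch. It reuses the name "Varun" from the examples; catching the error with try/except is an addition made here so the script keeps running:

```python
name_1 = "Varun"
error_message = ""

try:
    name_1[0] = "T"              # item assignment is not supported on str
except TypeError as exc:
    error_message = str(exc)
    print("TypeError:", error_message)

# Build a new string object instead of modifying the old one.
name_2 = "T" + name_1[1:]
print(name_2)    # Tarun
print(name_1)    # Varun (the original is unchanged)
```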
Python uses C-style string formatting to create new, formatted strings. The ”%”
operator is used to format a set of variables enclosed in a “tuple” (a fixed size list), together
with a format string, which contains normal text together with “argument specifiers”, special
symbols like "%s" and "%d".
my_string = "InterviewBit"
qty = 10
item_name = "chocolate"
rs = 100
Any object that is not a string can also be formatted with %s: the object is converted to
text with str() (use %r to get its repr() form instead). For a list, both give the same
text. For example:
my_list = [1, 2, 3]
%d - Integers
%.<number of digits>f - Floating point numbers with a fixed number of digits to the right of the dot (for example, %.2f).
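A short sketch of the % operator with the specifiers listed above. The variable names follow the fragment earlier in this section; the sentence text and the value 100.5 are assumptions made for illustration:

```python
qty = 10
item_name = "chocolate"
rs = 100.5

# The values are passed as a tuple to the % operator.
line = "I bought %d %s bars for Rs %.2f" % (qty, item_name, rs)
print(line)      # I bought 10 chocolate bars for Rs 100.50

my_list = [1, 2, 3]
# Wrap the list in a one-element tuple so it is treated as a single argument.
as_text = "A list: %s" % (my_list,)
print(as_text)   # A list: [1, 2, 3]
```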
The format() method formats the specified value(s) and inserts them inside the string's
placeholders.
The placeholder is defined using curly brackets: {}.
The format() method returns the formatted string.
The placeholders can be identified using named indexes {price}, numbered indexes {0}, or
even empty placeholders {}.
qty = 10
item_name = "chocolate"
rs = 100
#named indexes
#numbered indexes:
#empty placeholders:
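The three placeholder styles can be sketched as below. The variables come from the fragment above, while the sentence template is a hypothetical one chosen for illustration:

```python
qty = 10
item_name = "chocolate"
rs = 100

# named indexes
named = "{qty} {item} bars cost Rs {price}".format(qty=qty, item=item_name, price=rs)
# numbered indexes
numbered = "{0} {1} bars cost Rs {2}".format(qty, item_name, rs)
# empty placeholders (filled from left to right)
empty = "{} {} bars cost Rs {}".format(qty, item_name, rs)

print(named)      # 10 chocolate bars cost Rs 100
print(numbered)   # 10 chocolate bars cost Rs 100
print(empty)      # 10 chocolate bars cost Rs 100
```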
Everything in Python is an object. A string is an object too. Python provides us with various
methods to call on the string object. Note that none of these methods alters the original
string. They instead return a new string, which is the manipulated version of the original.
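A brief sketch of this behaviour, assuming a sample value for mystr (the original value used in the table is not shown):

```python
mystr = "Hello Python"          # assumed sample value

upper_copy = mystr.upper()
replaced_copy = mystr.replace("Python", "World")

print(upper_copy)       # HELLO PYTHON
print(replaced_copy)    # Hello World
print(mystr)            # Hello Python (the original is untouched)
```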
count(sub, [start], [end])
Returns the number of non-overlapping occurrences of substring (sub) in the range [start, end]. Optional arguments start and end are interpreted as in slice notation.
>>> print(mystr.count("o"))
>>> print(mystr.count("th"))
1
>>> print(mystr.count("l"))
2
>>> print(mystr.count("h"))
1
>>> print(mystr.count("H"))
>>> print(mystr.count("hH"))
• xmlcharrefreplace b'python!'
"replace"))
endswith(suffix, [start], [end])
Returns True if the string ends with the specified suffix, otherwise it returns False.
>>> print(mystr.endswith("y"))
False
>>> print(mystr.endswith("hon"))
True
>>> print(mystr)
Returns a copy of the string where all
Expandtabs(tabsize= tab characters are replaced by one or 123
123
>>>
print(mystr.expandtabs(tabsi
ze=15))
12
>>>
print(mystr.expandtabs(tabsi
ze=2))
123
find(sub, [start], [end])
Returns the lowest index in the string where substring sub is found within the slice s[start:end].
>>> print(mystring.find("P"))
0
>>> print(mystring.find("on"))
{0}".format("Apple",
"Banana"))
{dinner}".format(lunch="Peas
", dinner="Beans"))
{Drink}".format_map(lunch))
def __missing__(self,
Similar to format(**mapping), except
format_map(mappin
that mapping is used directly and not key):
g)
copied to a dictionary.
return key
"Wine"}
{Drink}".format_map(Default(
lunch)))
index(sub, [start], [end])
Searches the string for a specified value and returns the position of where it was found
>>> print(mystr.index("P"))
5
>>> print(mystr.index("hon"))
8
>>> print(mystr.index("o"))
isalnum()
Returns True if all characters in the string are alphanumeric
>>> print(mystr.isalnum())
True
>>> a = "123"
>>> print(a.isalnum())
True
>>> a= "$*%!!!"
>>> print(a.isalnum())
False
isalpha()
Returns True if all characters in the string are in the alphabet
>>> print(mystr.isalpha())
True
>>> a = "123"
>>> print(a.isalpha())
False
>>> a= "$*%!!!"
>>> print(a.isalpha())
False
>>> print(mystr.isdecimal())
False
>>> a="1.23"
>>> print(a.isdecimal())
>>> print(c.isdecimal())
False
>>> c="133"
>>> print(c.isdecimal())
True
>>> c="133"
>>> print(c.isdigit())
>>> print(c.isdigit())
True
>>> a="1.23"
>>> print(a.isdigit())
False
isidentifier()
Returns True if the string is an identifier
>>> c="133"
>>> print(c.isidentifier())
False
>>> c="_user_123"
>>> print(c.isidentifier())
True
>>> c="Python"
>>> print(c.isidentifier())
True
>>> c="Python"
>>> print(c.islower())
False
True
>>> print(c.islower())
False
>>> c="_user_123"
>>> print(c.isnumeric())
False
>>> c="Python"
>>> print(c.isnumeric())
False
isprintable()
Returns True if all characters in the string are printable
>>> c="133"
>>> print(c.isprintable())
True
>>> c="_user_123"
>>> print(c.isprintable())
True
>>> c="\t"
>>> print(c.isprintable())
False
>>> c="133"
>>> print(c.isspace())
>>> print(c.isspace())
False
>>> c="Hello"
>>> print(c.isspace())
False
>>> c="\t"
>>> print(c.isspace())
True
istitle()
Returns True if the string follows the rules of a title
>>> c="133"
>>> print(c.istitle())
False
>>> c="Python"
>>> print(c.istitle())
True
>>> c="\t"
>>> print(c.istitle())
False
>>> c="Python"
>>> print(c.isupper())
>>> print(c.isupper())
True
>>> c="\t"
>>> print(c.isupper())
False
>>> a ="-"
>>> print(a.join("123"))
1-2-3
Python"))
H**e**l**l**o**
**P**y**t**h**o**n
>>> a="Hello"
Hello_______
>>> a = "Python"
Python
lstrip([chars]) Returns a left trim version of the string >>> print(a.lstrip(), "!")
Hello
>>> frm = "SecretCode"
>>> to = "4203040540"
>>> trans_table =
Code".translate(trans_table)
>>> print(sec_code)
400304 0540
>>> print(mystr.partition("-
"))
>>>
print(mystr.partition("."))
"Bye"))
>>>
print(mystr.replace("Hello",
"Hell", 2))
Hello C++.
rfind(sub[, start[, end]])
Searches the string for a specified value and returns the last position of where it was found
>>> print(mystr.rfind("P"))
6
>>> print(mystr.rfind("-"))
5
>>> print(mystr.rfind("z"))
-1
rindex(sub[, start[, end]])
Searches the string for a specified value and returns the last position of where it was found
>>> print(mystr.rindex("P"))
>>> print(mystr.rindex("-"))
5
>>> print(mystr.rindex("z"))
Traceback (most recent call last):
...
ValueError: substring not found
--------Hello Python
rpartition(sep)
Returns a tuple where the string is parted into three parts
>>> print(mystr.rpartition("."))
('', '', 'Hello Python')
>>> print(mystr.rpartition(" "))
>>> print(mystr.rsplit())
Hello"
>>>
print(mystr.rsplit(sep="-",
maxsplit=1))
['Hello-Python', 'Hello']
>>> print(mystr.rstrip(),
"!")
Hello Python !
Hello Python-----------"
------------Hello Python----
------- -
>>> print(mystr.rstrip(),
"_")
------------Hello Python----
------- _
split(sep=None, maxsplit=-1)
Splits the string at the specified separator, and returns a list
>>> print(mystr.split())
['Hello', 'Python']
>>> mystr1="Hello,,Python"
>>> print(mystr1.split(","))
['Hello', '', 'Python']
Python\r\nJava\nC++\n"
>>>
print(mystr.splitlines())
print(mystr.splitlines(keepe
nds=True))
Python\r\n', 'Java\n',
'C++\n']
startswith(prefix[, start[, end]])
Returns True if the string starts with the specified value
>>> print(mystr.startswith("P"))
False
>>> print(mystr.startswith("H"))
True
>>> print(mystr.startswith("Hell"))
True
Hello Python
"
>>> print(mystr.strip(),
Hello Python !
")
Hello Python
>>> print(mystr.title())
>>> print(mystr.title())
Hello Java
str.maketrans(frm, to)
Code".translate(trans_table)
>>> print(secret_code)
S0cr06 C3d0
HELLO PYTHON
>>> print(mystr.zfill(9))
>>> print(mystr.zfill(5))
-0040
Python provides us with a number of functions that we can apply on strings or to create
strings.
1. len()
>>> a='book'
>>> len(a)
Output:
4
2. str()
>>> str(2+3j)
Output
'(2+3j)'
count( ) function
The count() function finds the number of times a specified value(given by the user) appears
in the given string.
find( ) function
The find() function finds the first occurrence of the specified value. It returns -1 if the value
is not found in that string.
The find() function is almost the same as the index() function, but the only difference is that
the index() function raises an exception if the value is not found.
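The difference between find() and index() can be seen in a short sketch; the sample text is an assumption:

```python
text = "Hello Python"           # assumed sample value

found_at = text.find("Python")
not_found = text.find("Java")
print(found_at)     # 6
print(not_found)    # -1

index_failed = False
try:
    text.index("Java")          # same search, but a miss raises an exception
except ValueError:
    index_failed = True
    print("index() raised ValueError")
```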
replace( ) function
The replace() function replaces a specified phrase with another specified phrase.
Note: All occurrences of the specified phrase will be replaced if nothing else is specified.
join( ) function
The join() function takes all items in an iterable and joins them into one string. We have to
specify a string as the separator.
Syntax: string.join(iterable)
swapcase( ) function
The swapcase() function returns a string where all the upper case letters are lower case and
vice versa.
Syntax: string.swapcase()
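A combined sketch of replace(), join() and swapcase() as described above; all sample values are illustrative:

```python
# replace(): every occurrence is replaced unless a count is given.
replaced_all = "a-b-c".replace("-", "+")
replaced_once = "a-b-c".replace("-", "+", 1)
print(replaced_all)     # a+b+c
print(replaced_once)    # a+b-c

# join(): the string it is called on acts as the separator.
words = ["red", "green", "blue"]
joined = "-".join(words)
print(joined)           # red-green-blue

# swapcase(): upper case becomes lower case and vice versa.
swapped = "Hello Python".swapcase()
print(swapped)          # hELLO pYTHON
```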
strip(): returns a new string after removing any leading and trailing whitespaces including
tabs (\t).
rstrip(): returns a new string with trailing whitespace removed. It’s easier to remember as
removing white spaces from “right” side of the string.
lstrip(): returns a new string with leading whitespace removed, or removing whitespaces
from the “left” side of the string.
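The three trimming methods can be compared side by side. This is a sketch with an assumed sample value; repr() is used only to make the remaining whitespace visible:

```python
s1 = "  \tabc  "            # assumed sample: spaces and a tab around "abc"

stripped = s1.strip()       # both sides
right_trimmed = s1.rstrip() # trailing whitespace only
left_trimmed = s1.lstrip()  # leading whitespace only

print(repr(stripped))       # 'abc'
print(repr(right_trimmed))  # '  \tabc'
print(repr(left_trimmed))   # 'abc  '
```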
Let’s look at a simple example of trimming whitespaces from the string in Python.
print(f'String =\'{s1}\'')
Output:
In Python, the string format() method is useful for formatting complex strings and numbers.
The format string contains curly braces { }, which the format() method uses as placeholders
to be replaced with the content of its arguments.
The format() method accepts any number of arguments and replaces the curly brace { }
placeholders with the content of the respective arguments.
Following is the example of formatting the strings using the format() method in python.
If you observe the above example, we created strings with a curly brace { } placeholders and
we are replacing the placeholders with required argument values using format() method.
When you execute the above python program, you will get the result as shown below.
Welcome to Python
My name is Suresh, and I am 33 years old
In the above example, the placeholders do not specify any order, so the arguments are
simply filled in from left to right. To control which argument goes into which placeholder,
you can use positional or keyword arguments in the placeholders as shown below.
# Positional arguments
str1 = "{1} to {0}".format("Python", "Welcome")
print(str1)
# Keyword arguments
name = "Suresh"
age = 33
str2 = "My name is {x}, and I am {y} years old"
print(str2.format(y = age, x = name))
If you observe the above example, we defined the placeholders with positional and keyword
arguments to specify the order.
When you execute the above python program, you will get the result as shown below.
Welcome to Python
My name is Suresh, and I am 33 years old
The Python splitlines() method is used to break the given string at line boundaries, for
example \n (newline) or \r (carriage return).
• When the string is broken, the individual lines are collected into a list, and that list is
returned by this function. So we can say it returns a list of split lines.
• Some of the different types of line breaks are \n (newline character), \r (carriage
return), and \r\n (carriage return + newline). A complete table in the Python
documentation specifies all the characters that count as line boundaries.
string.splitlines([keepends])
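A minimal sketch of splitlines() with and without keepends; the sample text mixes \r\n and \n endings on purpose:

```python
text = "Python\r\nJava\nC++\n"

lines = text.splitlines()                      # line endings removed
lines_with_ends = text.splitlines(keepends=True)   # line endings kept

print(lines)             # ['Python', 'Java', 'C++']
print(lines_with_ends)   # ['Python\r\n', 'Java\n', 'C++\n']
```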
Python String Slicing
To access a range of characters in a string, you need to slice a string. One way to do this is to
use the simple slicing operator :
With this operator you can specify where to start the slicing, where to end and specify the
step.
Slicing a String
If S is a string, the expression S [ start : stop : step ] returns the portion of the string from
index start up to, but not including, index stop, at a step size step.
Syntax
Basic Example
S = 'ABCDEFGHI'
print(S[2:7]) # CDEFG
Note that the item at index 7 'H' is not included.
S = 'ABCDEFGHI'
print(S[-7:-2]) # CDEFG
You can specify both positive and negative indices at the same time.
S = 'ABCDEFGHI'
print(S[2:-5]) # CD
S = 'ABCDEFGHI'
print(S[2:7:2]) # CEG
S = 'ABCDEFGHI'
print(S[6:1:-2]) # GEC
Omitting the start index starts the slice from the index 0. Meaning, S[:stop] is equivalent
to S[0:stop]
S = 'ABCDEFGHI'
print(S[:3]) # ABC
Whereas, omitting the stop index extends the slice to the end of the string.
Meaning, S[start:] is equivalent to S[start:len(S)]
S = 'ABCDEFGHI'
print(S[6:]) # GHI
String slicing can also accept a third parameter, the stride, which refers to how many
characters you want to move forward after the first character is retrieved from the string. The
value of stride is set to 1 by default.
number_string = "1020304050"
print(number_string[0:-1:2])
OUTPUT:
12345
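A few further slices on the same sample string, as a sketch of how omitted indices and the stride combine:

```python
S = "ABCDEFGHI"

every_third = S[::3]      # start and stop omitted, stride 3
reversed_s = S[::-1]      # a negative stride walks backward through the string
clipped = S[2:100]        # a stop beyond the end is clipped to len(S)

print(every_third)   # ADG
print(reversed_s)    # IHGFEDCBA
print(clipped)       # CDEFGHI
```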
Python ord() and chr() are built-in functions. They are used to convert a character to an int
and vice versa, so ord() and chr() are exact opposites of each other.
Python ord()
The ord() function takes a string argument consisting of a single Unicode character and
returns its integer Unicode code point value. Let's look at some examples of using ord().
x = ord('A')
print(x)
print(ord('ć'))
print(ord('ç'))
print(ord('$'))
Output:
65
263
231
36
Python chr()
The chr() function takes an integer argument and returns the string representing the
character at that code point.
y = chr(65)
print(y)
print(chr(123))
print(chr(36))
Output:
A
{
$
The operators in and not in test for membership. X in S evaluates to True if X is a
member of S, and False otherwise. X not in S returns the negation of X in S. All built-in
sequences and set types support this, as do dictionaries, for which in tests whether the
dictionary has a given key.
print("Thrones" in strA)
output:
True
As you can see, the in operator returns True when the substring exists in the
string. Otherwise, it returns False.
if "Thrones" in strA:
print('It exists')
else:
print('Does not exist')
output:
It exists
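A sketch covering both operators on a string and on a dictionary. The value of strA and the dictionary are assumptions, since the original definitions are not shown:

```python
strA = "A Game of Thrones"            # assumed sample value

has_thrones = "Thrones" in strA
no_crowns = "Crowns" not in strA
lowercase_hit = "thrones" in strA     # membership tests are case-sensitive

ages = {"arya": 11, "jon": 17}        # hypothetical dictionary
key_found = "arya" in ages            # in checks the keys...
value_found = 11 in ages              # ...not the values

print(has_thrones, no_crowns, lowercase_hit)   # True True False
print(key_found, value_found)                  # True False
```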
else:
print('It exists.')
output:
In Python, we can compare two strings to determine whether they are equal to each other,
or which of the two is greater or smaller. Strings are compared character by character
using the Unicode code points of the characters. The string comparison operators used for
this purpose are listed below:
• ==: This operator checks whether two strings are equal.
• !=: This operator checks whether two strings are not equal.
• <: This operator checks whether the string on the left side is smaller than the string on
the right side.
• <=: This operator checks whether the string on the left side is smaller than or equal to
the string on the right side.
• >: This operator checks whether the string on the left side is greater than the string on
the right side.
• >=: This operator checks whether the string on the left side is greater than or equal to
the string on the right side.
name = 'John'
name2 = 'john'
name3 = 'doe'
name4 = 'Doe'
Output
False
True
True
False
True
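The comparison operators can be tried on the four names defined above. The specific comparisons below are chosen here for illustration and are not necessarily the ones that produced the output shown:

```python
name = 'John'
name2 = 'john'
name3 = 'doe'
name4 = 'Doe'

equal = name == name2        # False: 'J' and 'j' are different characters
not_equal = name != name2    # True
smaller = name < name2       # True: ord('J') is 74, ord('j') is 106
greater = name3 > name4      # True: ord('d') is 100, ord('D') is 68
same_ignoring_case = name3.lower() == name4.lower()   # True

print(equal, not_equal, smaller, greater, same_ignoring_case)
```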
Iterating string:
You can loop through a string variable in Python with a for loop or a while loop. Looping
directly over the string visits one character per iteration, and a print statement inside the
loop prints each letter on its own line.
You can use Python's for loop to iterate through each element of the string. The number of
iterations equals the number of characters in the string variable. Spaces are characters
too, so they also count as iterations.
See the example below to perform a loop iteration using the for loop of Python.
myString = "Hello"
for letter in myString:
    print(letter)
Output:
H
e
l
l
o
The above example shows the output of a Python loop over the string. There are 5 letters in
the string and no spaces, so the loop iterates 5 times and prints each letter on its own line
in the output.
In addition to the above for loop, you can iterate through a Python string using a while
loop, as in the example below.
The example first initializes an index variable to zero (0). After that, a while loop
iterates through the string, using the Python len() function as the upper bound.
myString = "Save me"
index = 0
while index < len(myString):
    print(myString[index])
    index += 1
Output:
S
a
v
e
 
m
e
The above example shows the iteration through the string and the output. Each character,
including the space, is printed on its own line.
The string module contains a number of functions to process standard Python strings,
string.ascii_letters
The concatenation of the ascii_lowercase and ascii_uppercase constants described below.
This value is not locale-dependent.
string.ascii_lowercase
string.ascii_uppercase
string.digits
string.hexdigits
string.octdigits
string.punctuation
string.printable
string.whitespace
A string containing all ASCII characters that are considered whitespace. This includes the
characters space, tab, linefeed, return, formfeed, and vertical tab.
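A short sketch using a few of these constants. The helper is_hex is hypothetical and written here only for illustration:

```python
import string

# ascii_letters is the lowercase constant followed by the uppercase one.
letters_ok = string.ascii_letters == string.ascii_lowercase + string.ascii_uppercase
print(letters_ok)        # True
print(string.digits)     # 0123456789

def is_hex(text):
    """Return True if text is non-empty and contains only hexadecimal digits."""
    return text != "" and all(ch in string.hexdigits for ch in text)

print(is_hex("1aF"))     # True
print(is_hex("xyz"))     # False
print(" " in string.whitespace, "\t" in string.whitespace)   # True True
```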
import string
Output:
find => 6 -1
count => 3
The dir() function returns all properties and methods of the specified object, without the
values.
This includes the built-in properties that are default for all objects.
import string
dir(string)
• Search and Replace: It's easy to extract specific strings from large documents. You
can use regular expressions as a search-and-replace feature to correct errors or to
add/remove strings in a document. They can also be used purely for searching, to find
the occurrences of a string.
• Splitting String: You can split the strings if occurrence of a character or symbol is
found. Likewise there are many ways to approach splitting your document as per
regex matches.
• Validation: You can validate your document for certain requirements. This is one
good feature if you’re testing some web based standards or company specific
standards in your code.
Metacharacters
Metacharacter    Description
[^ ]    Matches a single character except those listed in the brackets. [^abc] matches all characters except 'a', 'b' and 'c'.
|    The choice operator matches either the expression before it or the one after it. abc|def matches 'abc' or 'def'.
RegEx functions
The re module provides users a variety of functions to search for a pattern in a particular
string. Below are some of the most frequently used functions in detail:
1. re.findall()
The re.findall() function returns a list of strings containing all matches of the specified
pattern.
• a character pattern
Example
The following example will return a list of all the instances of the substring at in the given
string:
import re
match = re.findall('at',string)
print (match)
Output
['at', 'at']
2. re.search()
Note:
• In case of more than one match, the first occurrence of the match is returned.
Example
Suppose you wish to look for the occurrence of a particular sub-string in a string:
import re
match = re.search('at',string)
if (match):
else:
Output
import re
match = re.search('ti',string)
if (match):
Output
Note: The start() function returns the start index of the matched string.
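A runnable sketch of re.search() and start(). The sample string is an assumption, inferred from the outputs shown elsewhere in this section:

```python
import re

string = "at what time?"       # assumed sample value

match = re.search('at', string)
if match:
    print("String found at:", match.start())   # String found at: 0
else:
    print("String not found!")

# Only the first occurrence is reported, even though 'at' also occurs in "what".
first_start = match.start()

# When nothing matches, re.search() returns None.
missing = re.search('xyz', string)
print(missing)    # None
```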
3. re.split()
The re.split() function splits the string at every occurrence of the sub-string and returns a list
of strings which have been split.
Consider the following example to get a better idea of what this function does:
Example
import re
match = re.split('a',string)
print (match)
Output
Note: In case there is no match, the string will be returned as it is, in a list.
4. re.sub()
The re.sub() function is used to replace occurrences of a particular sub-string with another
sub-string.
Suppose you wish to insert !!! instead of a white-space character in a string. This can be
done via the re.sub() function as follows:
import re
match = re.sub(r"\s","!!!",string)   # raw string, so \s is passed to the regex engine as-is
print (match)
Output
at!!!what!!!time?
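Both re.split() and re.sub() can be tried on one sample string. The string below is an assumption, inferred from the output shown above:

```python
import re

string = "at what time?"       # assumed sample value

parts = re.split('a', string)          # split at every 'a'
print(parts)       # ['', 't wh', 't time?']

no_match = re.split('z', string)       # no 'z': the whole string comes back in a list
print(no_match)    # ['at what time?']

replaced = re.sub(r"\s", "!!!", string)   # replace each whitespace character
print(replaced)    # at!!!what!!!time?
```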
The RE module’s re.findall() method scans the regex pattern through the entire target string
and returns all the matches that were found in the form of a list.
Syntax:
re.findall(pattern, string, flags=0)
1. pattern: The regular expression pattern to search for in the target string.
2. string: It is the variable pointing to the target string (in which we want to look for
occurrences of the pattern).
3. flags: It refers to optional regex flags. By default, no flags are applied. For example,
the re.I flag is used for performing case-insensitive searches.
The regular expression pattern and target string are the mandatory arguments, and flags are
optional.
Return Value
The re.findall() scans the target string from left to right as per the regular expression pattern
and returns all matches in the order they were found.
It returns an empty list (not None) if it fails to locate any occurrence of the pattern in
the target string.
import re
target_string = "Emma is a basketball player who was born on June 17, 1993. She played 112
matches with scoring average 26.12 points per game. Her weight is 51 kg."
print(result)
Output
Finditer method
The re.finditer() works exactly the same as the re.findall() method except it returns an
iterator yielding match objects matching the regex pattern in a string instead of a list.
It scans the string from left to right, and matches are returned in the iterator form. Later, we
can use this iterator object to extract all matches.
import re
target_string = "Emma is a basketball player who was born on June 17, 1993. She played 112
matches with a scoring average of 26.12 points per game. Her weight is 51 kg."
print(match_obj)
print(match_obj.group())
Output
17
19
93
11
26
12
51
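A sketch comparing re.findall() and re.finditer() on the target string above. The pattern r"\d{2}" is an assumption inferred from the output shown (it reproduces the same seven values), and r"\d+" is added for contrast:

```python
import re

target_string = ("Emma is a basketball player who was born on June 17, 1993. "
                 "She played 112 matches with a scoring average of 26.12 points "
                 "per game. Her weight is 51 kg.")

# findall() returns a list of matched strings.
numbers = re.findall(r"\d+", target_string)
print(numbers)    # ['17', '1993', '112', '26', '12', '51']

# finditer() yields match objects; group() gives the matched text.
pairs = []
for match_obj in re.finditer(r"\d{2}", target_string):
    pairs.append(match_obj.group())
print(pairs)      # ['17', '19', '93', '11', '26', '12', '51']

# A pattern that never matches gives an empty list.
no_hits = re.findall(r"xyz", target_string)
print(no_hits)    # []
```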
Character class
Example
The following code finds and prints all the vowels in the given string
import re
result = re.findall(r'[aeiou]', s)
print(result)
Output
Python regex functions accept optional flags when using regular expression patterns
with match(), search(), and split(), among others.
Most re module functions accept an optional flags argument that enables various
features and syntax variations.
For example, suppose you want to search for a word inside a string using regex. You can
extend the regex's capability by passing the re.I flag as an argument to the search method
to enable case-insensitive searching.
You will learn how to use all regex flags available in Python with short and clear examples.
re.S    re.DOTALL    Make the DOT (.) special character match any character at all, including a newline. Without this flag, DOT (.) will match anything except a newline.
To specify more than one flag, use the | operator to connect them. For example, case
insensitive searches in a multiline string
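A sketch of single and combined flags; the two-line sample text is an assumption chosen to show each effect:

```python
import re

text = "First line\nsecond LINE"     # assumed two-line sample

plain = re.findall(r"line", text)
ignore_case = re.findall(r"line", text, flags=re.I)
print(plain)          # ['line']
print(ignore_case)    # ['line', 'LINE']

# Combine flags with |: case-insensitive search in a multiline string.
line_starts = re.findall(r"^[a-z]+", text, flags=re.I | re.M)
print(line_starts)    # ['First', 'second']

# re.S (re.DOTALL) lets the dot match the newline as well.
without_dotall = re.search(r"First.*LINE", text)
with_dotall = re.search(r"First.*LINE", text, flags=re.S)
print(without_dotall is None, with_dotall is not None)   # True True
```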
groups() method
The group feature of regular expression allows you to pick up parts of the matching
text. Parts of a regular expression pattern bounded by parenthesis () are called groups. The
parenthesis does not change what the expression matches, but rather forms groups within the
matched sequence.
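>>> import re
>>> m = re.match(r'(\w+)@(\w+)\.(\w+)','username@hackerrank.com')
>>> m.group(0)       # The entire match
'username@hackerrank.com'
>>> m.group(1)       # The first parenthesized subgroup
'username'
>>> m.group(2)       # The second parenthesized subgroup
'hackerrank'
>>> m.group(3)       # The third parenthesized subgroup
'com'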
groups()
A groups() expression returns a tuple containing all the subgroups of the match.
Code
>>> import re
>>> m = re.match(r'(\w+)@(\w+)\.(\w+)','username@hackerrank.com')
>>> m.groups()
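As a script rather than an interactive session, the same match can be unpacked as follows. The names user, domain and tld are illustrative:

```python
import re

m = re.match(r'(\w+)@(\w+)\.(\w+)', 'username@hackerrank.com')

all_groups = m.groups()          # tuple of all subgroups
print(all_groups)                # ('username', 'hackerrank', 'com')

# The tuple can be unpacked directly into separate names.
user, domain, tld = m.groups()
print(user, domain, tld)         # username hackerrank com
```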
File Handling
Python provides built-in support for reading data from a file and writing data into a
file. In most programs, values are held in variables, which are volatile in nature: the data
exists only at run time and is lost once program execution is completed. To keep data
permanently, it has to be saved in files.
A large software application processes far more data than can reasonably be kept in
variables, and this is where files come into the picture. Files are non-volatile in nature:
the data is stored permanently on a secondary storage device such as a hard disk, and
Python lets us handle these files in our applications.
Consider how a person normally handles a file. To read data from a file or write data
into it, we first open the file (or create a new file if it does not exist), then perform the
read/write operations, save the file, and close it. We do the same operations in Python
using built-in methods and functions.
File Paths
A file has two key properties: a filename (usually written as one word) and a path.
The path specifies the location of a file on the computer. For example, there is a file on my
Windows 7 laptop with the filename project.docx in the path C:\Users\asweigart\Documents.
The part of the filename after the last period is called the file’s extension and tells you a file’s
type. project.docx is a Word document, and Users, asweigart, and Documents all refer
to folders (also called directories). Folders can contain files and other folders. For
example, project.docx is in the Documents folder, which is inside the asweigart folder, which
is inside the Users folder. Figure shows this folder organization.
There are also the dot (.) and dot-dot (..) folders. These are not real folders but special names
that can be used in a path. A single period (“dot”) for a folder name is shorthand for “this
directory.” Two periods (“dot-dot”) means “the parent folder.”
Figure is an example of some folders and files. When the current working directory is set
to C:\bacon, the relative paths for the other folders and files are set as they are in the figure.
Figure . The relative paths for folders and files in the working directory C:\bacon
The .\ at the start of a relative path is optional. For example, .\spam.txt and spam.txt refer to
the same file.
Types of Files
Computers store every file as a collection of 0s and 1s i.e., in binary form. Therefore,
every file is basically just a series of bytes stored one after the other. There are mainly two
types of data files — text file and binary file.
Text file
A text file can be understood as a sequence of characters consisting of alphabets,
numbers and other special symbols. Files with extensions like .txt, .py, .csv, etc. are some
examples of text files. When we open a text file using a text editor (e.g., Notepad), we see
several lines of text. However, the file contents are not stored in such a way internally.
Rather, they are stored in sequence of bytes consisting of 0s and 1s. In ASCII, UNICODE or
any other encoding scheme, the value of each character of the text file is stored as bytes. So,
while opening a text file, the text editor translates each ASCII value and shows us the
equivalent character that is readable by the human being. For example, the ASCII value 65
(binary equivalent 1000001) will be displayed by a text editor as the letter ‘A’ since the
number 65 in ASCII character set represents ‘A’. Each line of a text file is terminated by a
special character, called the End of Line (EOL). For example, the default EOL character in
Python is the newline (\n). However, other characters can be used to indicate EOL. When a
text editor or a program interpreter encounters the ASCII equivalent of the EOL character, it
displays the remaining file contents starting from a new line. Contents in a text file are
usually separated by whitespace, but comma (,) and tab (\t) are also commonly used to
separate values in a text file.
Binary Files
Binary files are also stored in terms of bytes (0s and 1s), but unlike text files, these
bytes do not represent the ASCII values of characters. Rather, they represent the actual
content such as image, audio, video, compressed versions of other files, executable files, etc.
These files are not human readable. Thus, trying to open a binary file using a text editor will
show some garbage values. We need specific software to read or write the contents of a
binary file. Binary files are stored in a computer in a sequence of bytes. Even a single bit
change can corrupt the file and make it unreadable to the supporting application. Also, it is
difficult to remove any error which may occur in the binary file as the stored contents are not
human readable. We can read and write both text and binary files through Python programs.
In real world applications, computer programs deal with data coming from different
sources like databases, CSV files, HTML, XML, JSON, etc. We broadly access files either to
write or read data from it. But operations on files include creating and opening a file, writing
data in a file, traversing a file, reading data from a file and so on. Python has the io module
that contains different functions for handling files.
Opening a file
To open a file in Python, we use the open() function. The syntax of open() is as
follows:
file_object = open(file_name, access_mode)
This function returns a file object, called the file handle, which is stored in the variable
file_object. We can use this variable to transfer data to and from the file (read and write)
by calling the functions defined in Python's io module. If the file does not exist and it is
opened in a write or append mode, the above statement creates a new empty file with the
name we specify. In read mode, opening a file that does not exist raises an error
(FileNotFoundError) instead.
The access_mode is an optional argument that represents the mode in which the file
has to be accessed by the program. It is also referred to as processing mode. Here mode
means the operation for which the file has to be opened like for reading, for writing, for both
reading and writing, for appending at the end of an existing file. The default is the read mode.
In addition, we can specify whether the file will be handled in binary (b) or text (t) mode. By
default, files are opened in text mode that means strings can be read or written. Files
containing non-textual data are opened in binary mode that means read/write are performed
in terms of bytes.
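The access mode determines how the file is opened, i.e. for reading, writing, appending, etc.
Some of the modes are given below:
Modes  Description
rb+    1. Opens a file for both reading and writing in binary format.
       2. The file pointer will be at the beginning of the file.
w      1. Opens a file for writing only.
       2. Overwrites the file if the file exists.
       3. If the file does not exist, creates a new file for writing.
wb     1. Opens a file for writing only in binary format.
       2. Overwrites the file if the file exists.
       3. If the file does not exist, creates a new file for writing.
w+     1. Opens a file for both writing and reading.
       2. Overwrites the file if the file exists.
       3. If the file does not exist, creates a new file for reading and writing.
wb+    1. Opens a file for both writing and reading in binary format.
       2. Overwrites the file if the file exists.
       3. If the file does not exist, creates a new file for reading and writing.
a      1. Opens a file for appending.
       2. The file pointer is at the end of the file if the file exists. That is, the file is in the
          append mode.
       3. If the file does not exist, it creates a new file for writing.
ab     1. Opens a file for appending in binary format.
       2. The file pointer is at the end of the file if the file exists. That is, the file is in the
          append mode.
       3. If the file does not exist, it creates a new file for writing.
a+     1. Opens a file for both appending and reading.
       2. The file pointer is at the end of the file if the file exists. The file opens in the
          append mode.
       3. If the file does not exist, it creates a new file for reading and writing.
ab+    1. Opens a file for both appending and reading in binary format.
       2. The file pointer is at the end of the file if the file exists. The file opens in the
          append mode.
       3. If the file does not exist, it creates a new file for reading and writing.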
The file_object has certain attributes that tell us basic information about the file, such as:
• <file.mode> returns the access mode in which the file was opened.
The file_name should be the name of the file that has to be opened. If the file is not in
the current working directory, then we need to specify the complete path of the file along
with its name.
myObject=open("myfile.txt", "a+")
In the above statement, the file myfile.txt is opened in append and read modes. The
file object will be at the end of the file. That means we can write data at the end of the file
and at the same time we can also read data from the file using the file object named
myObject.
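The behaviour described above can be tried out with a short sketch. The scratch folder and the file names used here are assumptions made only for illustration; the sketch opens a file in "a+" mode, inspects a few attributes of the file object, and shows what happens when a missing file is opened in the default read mode.

```python
import os
import tempfile

# Hypothetical scratch folder so that the sketch does not touch real files.
folder = tempfile.mkdtemp()
path = os.path.join(folder, "myfile.txt")

# "a+" creates the file if it is missing and places the pointer at the end.
my_object = open(path, "a+")
my_object.write("first line\n")

# Attributes of the file object describe how the file was opened.
print(my_object.name)    # the path that was passed to open()
print(my_object.mode)    # a+
print(my_object.closed)  # False

# Move to the beginning before reading back what was appended.
my_object.seek(0)
content = my_object.read()
my_object.close()
print(content)

# Opening a missing file in the default read mode raises FileNotFoundError.
try:
    open(os.path.join(folder, "missing.txt"))
    missing_raised = False
except FileNotFoundError:
    missing_raised = True
print(missing_raised)
```

Note that the pointer has to be moved back with seek(0) before reading, because in append mode it starts at the end of the file.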
Closing a file
Once we are done with the read/write operations on a file, it is a good practice to close
the file. Python provides a close() method to do so. While closing a file, the system frees the
memory allocated to it. The syntax of close() is:
file_object.close()
Here, file_object is the object that was returned while opening the file. Python makes
sure that any unwritten or unsaved data is flushed off (written) to the file before it is closed.
Hence, it is always advised to close the file once our work is done. Also, if the file object is
re-assigned to some other file, the previous file is automatically closed.
For writing to a file, we first need to open it in write or append mode. If we open an
existing file in write mode, the previous data will be erased, and the file object will be
positioned at the beginning of the file. On the other hand, in append mode, new data will be
added at the end of the previous data as the file object is at the end of the file. After opening
the file, we can use the following methods to write data in the file.
>>> myobject=open("myfile.txt",'w')
41
>>> myobject.close()
On execution, write() returns the number of characters written on to the file. Hence, 41,
which is the length of the string passed as an argument, is displayed.
The write() actually writes data onto a buffer. When the close() method is executed, the
contents from this buffer are moved to the file located on the permanent storage.
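As a minimal sketch of the points above, the following code writes one string and checks the value returned by write(). The file name and the text are hypothetical; flush() is used here only to show that the buffer can be pushed to the file without closing it.

```python
import os
import tempfile

# Hypothetical scratch file used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "notes.txt")
text = "Hello everyone\n"

my_object = open(path, "w")
count = my_object.write(text)   # number of characters written
my_object.flush()               # push buffered data to the file right away
my_object.close()

print(count)                    # same as len(text)
```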
The writelines() method is used to write multiple strings to a file. We need to pass an iterable
object like lists, tuple, etc. containing strings to the writelines() method. Unlike write(), the
writelines() method does not return the number of characters written in the file. The
following code explains the use of writelines().
>>> myobject=open("myfile.txt",'w')
#third line"]
>>> myobject.writelines(lines)
>>> myobject.close()
On opening myfile.txt, using notepad, its content will appear as shown in Figure
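A small sketch of writelines() is given here with a hypothetical list of strings and a scratch file. It also shows that writelines() does not add newline characters on its own, so each string must carry its own "\n" where a line break is wanted.

```python
import os
import tempfile

# Hypothetical scratch file and sample lines, used only for illustration.
path = os.path.join(tempfile.mkdtemp(), "myfile.txt")
lines = ["Hello everyone\n", "Writing multiline strings\n", "This is the third line"]

my_object = open(path, "w")
result = my_object.writelines(lines)   # returns None, unlike write()
my_object.close()

my_object = open(path, "r")
content = my_object.read()
my_object.close()

print(result)    # None
print(content)   # the three strings joined exactly as they were given
```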
Reading from a Text File
We can write a program to read the contents of a file. Before reading a file, we
must make sure that the file is opened in “r”, “r+”, “w+” or “a+” mode. There are three
ways to read the contents of a file:
The read() method is used to read a specified number of bytes (characters, in text mode) of
data from a file. The syntax of read() method is:
file_object.read(n)
Consider the following set of statements to understand the usage of read() method:
>>> myobject=open("myfile.txt",'r')
>>> myobject.read(10)
'Hello ever'
>>> myobject.close()
If no argument or a negative number is specified in read(), the entire file content is read.
For example,
>>> myobject=open("myfile.txt",'r')
>>> print(myobject.read())
Hello everyone
>>> myobject=open("myfile.txt",'r')
>>> myobject.readline(10)
'Hello ever'
>>> myobject.close()
If no argument or a negative number is specified, readline() reads a complete line and returns
it as a string.
>>> myobject=open("myfile.txt",'r')
>>> myobject.readline()
'Hello everyone\n'
To read the entire file line by line using readline(), we can use a loop. This process is
known as looping/iterating over a file object. The readline() method returns an empty string
when EOF is reached.
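The loop described above can be sketched in two ways: by calling readline() until it returns an empty string, or by iterating over the file object directly. The file name and its two lines are assumptions made for this example.

```python
import os
import tempfile

# Hypothetical scratch file with two lines of sample text.
path = os.path.join(tempfile.mkdtemp(), "myfile.txt")
my_object = open(path, "w")
my_object.write("Hello everyone\nWriting multiline strings\n")
my_object.close()

# 1. Call readline() repeatedly; an empty string signals EOF.
lines_seen = []
my_object = open(path, "r")
line = my_object.readline()
while line != "":
    lines_seen.append(line)
    line = my_object.readline()
my_object.close()

# 2. Iterate over the file object itself; it yields one line at a time.
iterated = []
my_object = open(path, "r")
for line in my_object:
    iterated.append(line)
my_object.close()

print(lines_seen)
print(iterated)
```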
The readlines() method reads all the lines and returns them, along with their newline
characters, as a list of strings. The following example uses readlines() to read data from the
text file myfile.txt.
>>> myobject=open("myfile.txt",'r')
>>> print(myobject.readlines())
>>> myobject.close()
As shown in the above output, when we read a file using readlines() function, lines in the file
become members of a list, where each list element ends with a newline character (‘\n’).
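A minimal sketch of readlines() follows, assuming a scratch file with two hypothetical lines. It also shows one common way to drop the trailing newline from each element, using rstrip("\n").

```python
import os
import tempfile

# Hypothetical scratch file with sample content.
path = os.path.join(tempfile.mkdtemp(), "myfile.txt")
my_object = open(path, "w")
my_object.write("Hello everyone\nSecond line\n")
my_object.close()

my_object = open(path, "r")
all_lines = my_object.readlines()   # every element keeps its '\n'
my_object.close()

# Remove the trailing newline from each line, if it is not wanted.
clean_lines = [line.rstrip("\n") for line in all_lines]

print(all_lines)
print(clean_lines)
```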
In case we want to display each word of a line separately as an element of a list, then we can
use split() function. The following code demonstrates the use of split() function.
>>> myobject=open("myfile.txt",'r')
>>> d=myobject.readlines()
>>> for line in d:
        words=line.split()
        print(words)
['Hello', 'everyone']
In the output, the words of each line are returned as elements of a list. However, if
splitlines() is used instead of split(), then each line is returned as a single element of a list,
as shown in the output below:
>>> for line in d:
        words=line.splitlines()
        print(words)
['Hello everyone']
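The difference between split() and splitlines() can be seen without a file at all. The sample strings in this sketch are hypothetical.

```python
# A single line, as it would be returned by readline() or readlines().
text_line = "Hello everyone\n"

words = text_line.split()        # splits on whitespace and drops the newline
whole = text_line.splitlines()   # splits on line boundaries only

print(words)   # ['Hello', 'everyone']
print(whole)   # ['Hello everyone']

# With several lines in one string, splitlines() gives one element per line.
many = "Hello everyone\nSecond line\n".splitlines()
print(many)    # ['Hello everyone', 'Second line']
```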
Let us now write a program that accepts a string from the user and writes it to a text file.
Thereafter, the same program reads the text file and displays it on the screen.
Writing and reading to a text file
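One possible version of such a program is sketched here. The function name write_then_read, the scratch file and the sample text are assumptions made for illustration; in an interactive program the text could come from input() instead of a fixed string.

```python
import os
import tempfile

def write_then_read(file_path, text):
    """Write text to file_path, then reopen the file and return its contents."""
    file_object = open(file_path, "w")
    file_object.write(text)
    file_object.close()

    file_object = open(file_path, "r")
    content = file_object.read()
    file_object.close()
    return content

# Hypothetical scratch file and sample text.
sample_path = os.path.join(tempfile.mkdtemp(), "testfile.txt")
result = write_then_read(sample_path, "Hello everyone\n")
print(result)
```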
In Python, we can also open a file using the with clause. The syntax of the with clause is:
with open(file_name, access_mode) as file_object:
The advantage of using with clause is that any file that is opened using this clause is closed
automatically, once the control comes outside the with clause. In case the user forgets to close
the file explicitly or if an exception occurs, the file is closed automatically. Also, it provides a
simpler syntax.
with open("myfile.txt", "r") as myObject:
    content = myObject.read()
Here, we don’t have to close the file explicitly using close() statement. Python will
automatically close the file.
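The automatic closing can be checked through the closed attribute of the file object. This sketch uses a hypothetical scratch file and a deliberately raised ValueError to stand in for an error that occurs while the file is open.

```python
import os
import tempfile

# Hypothetical scratch file used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "myfile.txt")

with open(path, "w") as my_object:
    my_object.write("Hello everyone\n")
closed_after_block = my_object.closed     # True: closed on leaving the block

try:
    with open(path, "r") as my_object:
        raise ValueError("simulated failure while the file is open")
except ValueError:
    closed_after_error = my_object.closed  # True: closed despite the exception

print(closed_after_block, closed_after_error)
```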
Splitting words:
Other useful methods:
1 file.close()
Close the file. A closed file cannot be read or written any more.
2 file.flush()
Flush the internal buffer, like stdio's fflush. This may be a no-op on some file-like objects.
3 file.fileno()
Returns the integer file descriptor that is used by the underlying implementation to request
I/O operations from the operating system.
4 file.isatty()
Returns True if the file is connected to a tty(-like) device, else False.
5 next(file)
Returns the next line from the file each time it is called. (In Python 2 this was written as
file.next().)
6 file.read([size])
Reads at most size bytes from the file (less if the read hits EOF before obtaining size
bytes).
7 file.readline([size])
Reads one entire line from the file. A trailing newline character is kept in the string.
8 file.readlines([sizehint])
Reads until EOF using readline() and returns a list containing the lines. If the optional
sizehint argument is present, instead of reading up to EOF, whole lines totalling
approximately sizehint bytes (possibly after rounding up to an internal buffer size) are
read.
9 file.seek(offset[, whence])
Sets the file's current position.
10 file.tell()
Returns the file's current position.
11 file.truncate([size])
Truncates the file's size. If the optional size argument is present, the file is truncated to (at
most) that size.
12 file.write(str)
Writes a string to the file and returns the number of characters written.
13 file.writelines(sequence)
Writes a sequence of strings to the file. The sequence can be any iterable object producing
strings, typically a list of strings.
File Positions
A file handle or pointer denotes the position from which the file contents will be read or
written. The file handle is also called the file pointer or cursor.
For example, when you open a file in write mode, the file pointer is placed at the 0th position,
i.e., at the start of the file. However, it changes (increments) its position as you start writing
content into it. Or, when you read a file line by line, the file pointer moves one line at a time.
While the file’s access mode implies the type of operation that we intend to perform on the
file, it also determines the file handle position. For example, if the file is opened for reading,
the file handle will be at the beginning, and after reading the entire file, it will be just past the
last character, which is the End of the File.
We can get the file handle current position using the tell() method.
Syntax:
file_object.tell()
There are no arguments for this method. The return value is the integer representing the file
handle position.
To change the file handle’s position use seek() method. As we discussed, the seek() method
sets the file’s current position, and then we can read or write to the file from that position.
Syntax:
f.seek(offset, whence)
The new position of the pointer is computed by adding offset to a reference point; the
reference point is given by the whence argument.
The allowed values for the whence argument are:
• A whence value of 0 uses the beginning of the file as the reference point.
• A whence value of 1 uses the current file position as the reference point.
• A whence value of 2 uses the end of the file as the reference point.
The default value for whence is 0, the beginning of the file.
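Seek operation   Meaning
f.seek(5)        Move file pointer five characters ahead from the beginning of a file.
f.seek(5, 1)     Move file pointer five characters ahead from the current position.
f.seek(-5, 1)    Move file pointer five characters behind from the current position.
f.seek(-5, 2)    Move file pointer in the reverse direction. Move it to the 5th character from
                 the end of the file.
Note: In Python 3, a file opened in text mode only allows seeking relative to the beginning of
the file (apart from seek(0, 1) and seek(0, 2)). The operations with a non-zero offset and a
whence of 1 or 2 require the file to be opened in binary mode.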
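The following sketch traces tell() and seek() on a ten-byte file. The file name and its content are assumptions chosen so that each byte shows its own position. The file is opened in binary mode because, in Python 3, a text-mode file rejects a non-zero offset relative to the current position or the end.

```python
import io
import os
import tempfile

# Hypothetical scratch file: byte number n holds the digit n.
path = os.path.join(tempfile.mkdtemp(), "digits.bin")
with open(path, "wb") as f:
    f.write(b"0123456789")

f = open(path, "rb")
start = f.tell()          # 0: a freshly opened file starts at the beginning
f.seek(5)                 # absolute position 5
from_start = f.read(1)    # b"5", the pointer is now at 6
after_read = f.tell()     # 6
f.seek(2, 1)              # 6 + 2 = 8
forward = f.read(1)       # b"8", the pointer is now at 9
f.seek(-5, 1)             # 9 - 5 = 4
backward = f.read(1)      # b"4"
f.seek(-5, 2)             # 10 - 5 = 5
from_end = f.read()       # b"56789"
f.close()

# The same relative seek on a text-mode file raises io.UnsupportedOperation.
text_file = open(path, "r")
try:
    text_file.seek(5, 1)
    text_relative_seek_ok = True
except io.UnsupportedOperation:
    text_relative_seek_ok = False
text_file.close()

print(start, from_start, after_read, forward, backward, from_end)
print(text_relative_seek_ok)
```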
Renaming and Deleting Files
Python's os module provides methods that help you perform file-processing operations, such as
renaming and deleting files.
To use this module you need to import it first and then you can call any related functions.
The rename() method takes two arguments, the current filename and the new filename.
Syntax
os.rename(current_file_name, new_file_name)
Example
#!/usr/bin/python
import os
You can use the remove() method to delete files by supplying the name of the file
to be deleted as the argument.
Syntax
os.remove(file_name)
Example
#!/usr/bin/python
import os
os.remove("text2.txt")
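Both operations can be tried safely inside a temporary folder. The folder and the file names in this sketch are hypothetical; os.path.exists() is used only to confirm the result of each step.

```python
import os
import tempfile

# Hypothetical scratch folder and file names.
folder = tempfile.mkdtemp()
old_name = os.path.join(folder, "test1.txt")
new_name = os.path.join(folder, "test2.txt")

open(old_name, "w").close()          # create an empty file to work with

os.rename(old_name, new_name)        # test1.txt becomes test2.txt
renamed = os.path.exists(new_name) and not os.path.exists(old_name)

os.remove(new_name)                  # delete the renamed file
removed = not os.path.exists(new_name)

print(renamed, removed)
```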
Directory methods
All files are contained within various directories, and Python has no problem
handling these too. The os module has several methods that help you create,
remove, and change directories.
You can use the mkdir() method of the os module to create directories in the
current directory.
You need to supply an argument to this method which contains the name of the directory to
be created.
Syntax
os.mkdir("newdir")
Example
#!/usr/bin/python
import os
# Create a directory "test"
os.mkdir("test")
You can use the chdir() method to change the current directory. The chdir() method takes an
argument, which is the name of the directory that you want to make the current directory.
Syntax
os.chdir("newdir")
Example
#!/usr/bin/python
import os
os.chdir("/home/newdir")
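The getcwd() method returns the current working directory.
Syntax
os.getcwd()
Example
#!/usr/bin/python
import os
# This gives the location of the current directory
os.getcwd()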
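The rmdir() method deletes the directory that is passed as an argument. The directory must
be empty before it can be removed.
Syntax
os.rmdir('dirname')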
Example
Following is the example to remove “/tmp/test” directory. It is required to give fully qualified
name of the directory, otherwise it would search for that directory in the current directory.
#!/usr/bin/python
import os
os.rmdir("/tmp/test")
• If the given path does not point to a directory, an OSError is raised.
If you want to delete a directory even if it is not empty, use shutil.rmtree(), which
permanently deletes a folder and its entire contents.
import shutil
shutil.rmtree('F:\\Example2')
These functions delete the files/directories without sending them to the recycle bin/trash, so
they should be used very carefully.
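The directory methods discussed so far can be combined in one short sketch. Everything happens inside a temporary folder, and the directory names "test" and "full" are assumptions made for illustration. The original working directory is restored at the end.

```python
import os
import shutil
import tempfile

original_cwd = os.getcwd()
base = tempfile.mkdtemp()            # hypothetical scratch directory

os.chdir(base)
os.mkdir("test")                     # create a directory in the current directory
os.chdir("test")
current_name = os.path.basename(os.getcwd())   # "test"

os.chdir(base)
os.rmdir("test")                     # works because "test" is empty
test_removed = not os.path.exists("test")

os.mkdir("full")
open(os.path.join("full", "data.txt"), "w").close()
try:
    os.rmdir("full")                 # fails because the directory is not empty
    rmdir_failed = False
except OSError:
    rmdir_failed = True

shutil.rmtree("full")                # removes the directory with its contents
full_removed = not os.path.exists("full")

os.chdir(original_cwd)               # go back before cleaning up
shutil.rmtree(base)

print(current_name, test_removed, rmdir_failed, full_removed)
```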
os.makedirs() method
The os module has in-built os.makedirs() method to create nested or recursive
directories within the system.
That is, the os.makedirs() function creates the parent directory, the intermediate directories as
well as the leaf directory if any of them is not present in the system files.
Syntax:
os.makedirs(path,mode)
Example:
import os
main_dir = "C:/Examples/Python_files/OS_module"
os.makedirs(main_dir,mode = 0o666)
In the above example, the makedirs() function creates the intermediate directories –
‘Python_files’ as well as the leaf directory – ‘OS_module’ in one shot through the function.
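A minimal sketch of os.makedirs() inside a temporary folder is given here; the folder names mirror the example above but are created under a hypothetical scratch location. It also shows the exist_ok argument, which controls what happens when the leaf directory is already present.

```python
import os
import tempfile

# Hypothetical scratch location for the nested directories.
base = tempfile.mkdtemp()
main_dir = os.path.join(base, "Examples", "Python_files", "OS_module")

os.makedirs(main_dir)                      # creates all three levels in one call
created = os.path.isdir(main_dir)

os.makedirs(main_dir, exist_ok=True)       # no error if it already exists

try:
    os.makedirs(main_dir)                  # default exist_ok=False
    second_call_failed = False
except FileExistsError:
    second_call_failed = True

print(created, second_call_failed)
```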
We're going to be dealing a lot with files stored on our computers and, by necessity, we will
be dealing with their paths, i.e. the text strings that describe the nesting of subdirectories
needed to actually get to a specific file. But this is complicated when dealing with different
operating systems.
On OS X and all other Unix-based systems such as Linux, file paths are represented as
text strings in this format, with forward slashes delimiting the subdirectories and the actual
filename (in this case, file.txt):
my/path/to/file.txt
On Windows, backslashes are used instead:
\my\path\to\file.txt
If you've been paying attention to what the backslash character means for Python strings,
you might remember that it acts as an escape character, i.e. the backslash modifies the
meaning of the character that follows it. This means that to print a literal backslash in
a Python string, you have to use double backslashes:
>>> print("\\my\\path\\to\\file.txt")
\my\path\to\file.txt
As you can imagine, that could complicate the ability to write code that works on Windows
and everywhere else.
The os.path.join() function takes as many arguments as needed to generate a specified file
path, with each argument representing one component (i.e. subdirectory) of the path. So
instead of doing this:
mypath = "my/path/to/file.txt"
We do this:
mypath = os.path.join("my", "path", "to", "file.txt")
And whether you're running code on Windows or Unix-based systems, the path will be built
with the separator that the operating system expects.
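The effect of os.path.join() can be checked against os.sep, the separator used by the operating system on which the code runs. The path components in this sketch are the ones from the example above.

```python
import os

parts = ["my", "path", "to", "file.txt"]

# Build the path from its components instead of hard-coding a separator.
mypath = os.path.join(*parts)

# The same path assembled by hand with the platform separator.
expected = os.sep.join(parts)

print(mypath)
print(mypath == expected)
```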
os.listdir() method in Python is used to get the list of all files and directories in the specified
directory. If we don’t specify any directory, then the list of files and directories in the current
working directory will be returned.
# importing os module
import os
# list the contents of the root directory
path = "/"
dir_list = os.listdir(path)
print(dir_list)
Output:
['sys', 'run', 'tmp', 'boot', 'mnt', 'dev', 'proc', 'var', 'bin', 'lib64', 'usr',
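Since the contents of a real directory differ from one computer to another, this sketch builds a small hypothetical folder first and then lists it. The names are sorted because os.listdir() returns them in arbitrary order.

```python
import os
import tempfile

# Hypothetical scratch folder with two files and one subdirectory.
base = tempfile.mkdtemp()
os.mkdir(os.path.join(base, "sub"))
for name in ("a.txt", "b.txt"):
    open(os.path.join(base, name), "w").close()

entries = sorted(os.listdir(base))   # files and directories together

# Keep only the regular files by checking each entry.
files_only = [n for n in entries if os.path.isfile(os.path.join(base, n))]

print(entries)      # ['a.txt', 'b.txt', 'sub']
print(files_only)   # ['a.txt', 'b.txt']
```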
os.path.getsize(): This method returns the size of the file in bytes. To use this
method we need to pass the path of the file as a parameter.
size = os.path.getsize("filename")
Output:
os.path.exists(): This method checks whether a file exists or not; the path of the file is
passed as a parameter. The os module has a sub-module named os.path, through which we
can perform many more functions.
import os
#importing os module
Output
False
Since the file in the above code does not exist, the output is False. If the file exists, the
output is True.
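Both functions can be seen together in a short sketch. The scratch file and its five-byte content are assumptions; the file is written in binary mode so that its size on disk is exactly the number of bytes written.

```python
import os
import tempfile

# Hypothetical scratch file holding exactly five bytes.
path = os.path.join(tempfile.mkdtemp(), "sample.bin")
with open(path, "wb") as f:
    f.write(b"12345")

size = os.path.getsize(path)           # size in bytes
exists_before = os.path.exists(path)   # True while the file is present

os.remove(path)
exists_after = os.path.exists(path)    # False once it has been deleted

print(size, exists_before, exists_after)
```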
os.path.basename(path) : It is used to return the final component of the given path.
# basename function
import os
out = os.path.basename("/baz/foo")
print(out)
Output:
foo
os.path.dirname(path) : It is used to return the directory name from the path given. This
function returns everything in the path except the final component.
# dirname function
import os
out = os.path.dirname("/baz/foo")
print(out)
Output:
/baz
os.path.isabs(path) : It specifies whether the path is absolute or not. In Unix system absolute
path means path begins with the slash(‘/’) and in Windows that it begins with a (back)slash
after chopping off a potential drive letter.
#isabs function
import os
out = os.path.isabs("/baz/foo")
print(out)
Output:
True
os.path.isdir(path) : This function specifies whether the path is existing directory or not.
# isdir function
import os
out = os.path.isdir("C:\\Users")
print(out)
Output:
True
os.path.isfile(path) : This function specifies whether the path is existing file or not.
# isfile function
import os
out = os.path.isfile("C:\\Users\\foo.csv")
print(out)
Output:
True
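Because the paths in the examples above exist only on some systems, the following sketch repeats the same checks on a hypothetical file created in a temporary folder, so that the results do not depend on the computer on which it runs.

```python
import os
import tempfile

# Hypothetical scratch folder containing one empty file.
folder = os.path.abspath(tempfile.mkdtemp())
file_path = os.path.join(folder, "foo.csv")
open(file_path, "w").close()

base = os.path.basename(file_path)     # final component: "foo.csv"
parent = os.path.dirname(file_path)    # everything before it: the folder
is_abs = os.path.isabs(file_path)      # True, the path starts at the root

folder_is_dir = os.path.isdir(folder)      # True
file_is_file = os.path.isfile(file_path)   # True
folder_is_file = os.path.isfile(folder)    # False, it is a directory

print(base, is_abs, folder_is_dir, file_is_file, folder_is_file)
```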
os.path.split() method in Python is used to split the path name into a pair, head and tail.
Here, tail is the last path name component and head is everything leading up to that.
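# Python program to explain os.path.split() method
# importing os module
import os
# path
path = '/home/User/Desktop/file.txt'
# Split the path into a head and tail pair
head_tail = os.path.split(path)
print("Head of '%s':" % path, head_tail[0])
print("Tail of '%s':" % path, head_tail[1])
# path
path = '/home/User/Desktop/'
# Split the path into a head and tail pair
head_tail = os.path.split(path)
print("Head of '%s':" % path, head_tail[0])
print("Tail of '%s':" % path, head_tail[1])
# path
path = 'file.txt'
# Split the path into a head and tail pair
head_tail = os.path.split(path)
print("Head of '%s':" % path, head_tail[0])
print("Tail of '%s':" % path, head_tail[1])
Output:
Head of '/home/User/Desktop/file.txt': /home/User/Desktop
Tail of '/home/User/Desktop/file.txt': file.txt
Head of '/home/User/Desktop/': /home/User/Desktop
Tail of '/home/User/Desktop/':
Head of 'file.txt':
Tail of 'file.txt': file.txt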