UNIT - I
S.NO TOPIC Pg No
1.1 Introduction to Data Science 1 - 3
1.2 Why Python? 3 - 5
1.3 Essential Python 5 - 6
1.4 Python Libraries, Python Introduction 6 - 8
1.5 Features 8 - 9
1.6 Identifiers 9 - 9
1.7 Reserved words 10 - 11
1.8 Indentation 11 - 12
1.9 Comments 12 - 14
1.10 Built-in Data types and their Methods: Strings, List, Tuples, Dictionary, Set 14 - 22
1.11 Type Conversion 22 - 28
1.12 Operators. 29 - 44
1.13 Decision Making 45 - 46
1.14 Looping- Loop Control statement 46 - 47
1.15 Math and Random number functions. 48 - 53
1.16 User defined functions 54 - 57
1.17 function arguments & its types. 57 - 60
UNIT - II
S.NO TOPIC Pg No
2.1 User defined Modules and Packages in Python 61 - 74
2.2 Files: File manipulations 75 - 88
2.3 File and Directory related methods 89 - 92
2.4 Python Exception Handling. 93 - 96
2.5 OOPs Concepts 97 - 100
2.6 Class and Objects 101 - 104
2.7 Constructors 105 - 112
2.8 Data hiding 113 - 115
2.9 Data Abstraction 116 - 121
2.10 Inheritance. 122 - 128

UNIT - III
S.NO TOPIC Pg No
3.1 NumPy Basics: Arrays The NumPy ndarray 129 - 130
3.2 Creating ndarrays 131 - 134
3.3 Data Types for ndarrays 135 - 135
3.4 Arithmetic with NumPy 136 - 139
3.5 Arrays- Basic Indexing and Slicing, Boolean Indexing, Transposing Arrays and Swapping Axes. Universal Functions 140 - 144
3.6 Mathematical and Statistical Methods-Sorting 145 - 146
UNIT - IV
S.NO TOPIC Pg No
4.1 Introduction to pandas Data Structures: 147 - 149
4.2 Series 150 - 151
4.3 Data Frame 152 - 154
4.4 panels 155 - 156
4.5 Indexing Selection, 157 - 163
4.6 Filtering Function Application 164 - 164
4.7 Mapping 165 - 165
4.8 Sorting 166 - 168
4.9 Ranking. 169 - 169
4.10 Reading and Writing Data in Text Format 170 - 172
UNIT - V
S.NO TOPIC Pg No
5.1 Data Cleaning and Preparation: Handling Missing Data, Data Transformation: Removing Duplicates 173 - 175
5.2 Transforming Data Using a Function or Mapping, Replacing Values, Detecting and Filtering Outliers 176 - 179
5.3 String Manipulation: 180 - 183
5.4 Vectorized String Functions in pandas. 184 - 184
5.5 Plotting with pandas: Line Plots, Bar Plots, Histograms and Density Plots, Scatter or Point Plots 184 - 191

Data Science using Python – Unit I

UNIT - I

1.1 Introduction to Data Science

What is DATA SCIENCE?

Data science is a blend of various tools, algorithms, and machine
learning principles. Most simply, it involves obtaining meaningful
information or insights from structured or unstructured data through a
combination of analysis, programming, and business skills. It is a
field that draws on many disciplines, such as mathematics, statistics,
and computer science. Those who are good at these fields and have
enough knowledge of the domain in which they want to work can call
themselves data scientists. It is not easy, but it is not impossible
either. You need to work through the data, its visualization,
programming, formulation, development, and deployment of your model.
Demand for data scientists is expected to grow, so keep that in mind
and prepare yourself for this field.

How Does Data Science Work?

Data science is not a one-step process that you can learn in a short
time before calling yourself a data scientist. It passes through many
stages, and every element is important. One should always follow the
proper steps to climb the ladder. Every step has its value and counts
in your model. The steps are described below.

Problem Statement: No work starts without motivation, and data science
is no exception. It is really important to declare or formulate your

Pavitra Degree College – B.Sc.(Computers) III Year VI sem Page 1

problem statement very clearly and precisely. Your whole model and its
working depend on your statement. Many scientists consider this the
main and most important step of data science. So be sure what your
problem statement is and how well it can add value to a business or
any other organization.

Data Collection: After defining the problem statement, the next obvious
step is to search for the data that your model might require. Do
thorough research and find everything you need. Data can be structured
or unstructured, and it may come in various forms such as videos,
spreadsheets, coded forms, etc. You must collect data from all these
kinds of sources.

Data Cleaning: Once you have formulated your motive and collected your
data, the next step is cleaning. Data cleaning is a favorite task of
data scientists. It is all about removing missing, redundant,
unnecessary, and duplicate data from your collection. There are various
tools for doing so with the help of programming in either R or Python.
The choice is yours, and data scientists differ in their opinions on
which to choose. For statistical work, R is often preferred over
Python, as it offers more than 12,000 packages. Python, on the other
hand, is used because it is fast, easily accessible, and can do the
same things as R with the help of various packages.
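
The cleaning step described above can be sketched in plain Python. This is only an illustration under assumed data: the records list, its field names, and the clean() helper are hypothetical, and a real project would more likely use a library such as pandas (see Unit V).

```python
# Hypothetical raw records: one has a missing value, one is a duplicate.
records = [
    {"name": "Asha", "age": 21},
    {"name": "Ravi", "age": None},   # missing age
    {"name": "Asha", "age": 21},     # exact duplicate of the first record
    {"name": "Meena", "age": 23},
]


def clean(rows):
    """Drop rows with missing values, then drop exact duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        if any(value is None for value in row.values()):
            continue                      # skip rows with missing data
        key = tuple(sorted(row.items()))  # hashable fingerprint of the row
        if key in seen:
            continue                      # skip rows already kept
        seen.add(key)
        cleaned.append(row)
    return cleaned


cleaned_records = clean(records)
print(cleaned_records)
```

Only the complete, first-seen records remain after the call.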

Data Analysis and Exploration: This is one of the prime activities in
data science, and it is time to bring out your inner Sherlock Holmes.
It is about analyzing the structure of the data, finding hidden
patterns in it, studying behaviors, visualizing the effect of one
variable on others, and then drawing conclusions. We can explore the
data with the help of various graphs created with the plotting
libraries of any programming language. In R, ggplot2 is one of the most
popular plotting libraries, while Matplotlib plays that role in Python.

Data Modeling: Once you are done with the study you have formed from
data visualization, you must start building a hypothesis model that
can yield good predictions in the future. Here, you must choose an
algorithm that best fits your problem. There are different kinds of
algorithms, from regression to classification, SVM (Support Vector
Machines), clustering, etc. Your model can be a machine learning
algorithm. You train your model with the training data and then test it
with the test data. There are various methods for doing so. One of them
is the K-fold method, where you split your whole data into K parts
(folds), train on K-1 of them, and test on the remaining one, repeating
the process so that each fold serves once as the test data.
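
K-fold cross-validation, mentioned above, can be sketched as follows. This is a minimal illustration, not a library implementation: the function name k_fold_indices and the sample sizes are assumptions, and it only produces index lists without shuffling or training a model.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    fold_size, remainder = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `remainder` folds get one extra sample.
        stop = start + fold_size + (1 if fold < remainder else 0)
        test = indices[start:stop]
        train = indices[:start] + indices[stop:]
        yield train, test
        start = stop


folds = list(k_fold_indices(n_samples=6, k=3))
for train, test in folds:
    print("train:", train, "test:", test)
```

Each sample appears in exactly one test fold, so every record is used for both training and testing across the K rounds.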

Optimization and Deployment: You have followed every step and built a
model that you feel is the best fit. But how can you decide how well
your model is performing? This is where optimization comes in. You test
the model on the test data and find how well it is performing by
checking its accuracy. In short, you check the efficiency of the model
and then try to optimize it for more accurate predictions. Deployment
deals with launching your model so that people outside can benefit from
it. You can also obtain feedback from organizations and people to learn
their needs and then work further on your model.
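
Checking accuracy, as described above, amounts to comparing predictions with the known answers. The sketch below assumes a simple classification setting; the accuracy() helper and the label lists are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Return the fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical labels for five test samples.
actual = [1, 0, 1, 1, 0]
predicted = [1, 0, 0, 1, 0]
score = accuracy(actual, predicted)
print(score)  # 0.8 (four of the five predictions are correct)
```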

1.2 What Is Python


Python is a general-purpose, dynamic, high-level, interpreted
programming language. It supports the object-oriented programming
approach to developing applications. It is simple and easy to learn,
and it provides many high-level data structures.


Python is an easy-to-learn yet powerful and versatile scripting
language, which makes it attractive for application development.

Python's syntax and dynamic typing, together with its interpreted
nature, make it an ideal language for scripting and rapid application
development.

Python supports multiple programming paradigms, including
object-oriented, imperative, and functional or procedural styles.

Python is not intended to work in one particular area, such as web
programming. It is known as a multipurpose programming language because
it can be used for web, enterprise, 3D CAD, and other applications.

We do not need to declare the data type of a variable, because Python
is dynamically typed. We can simply write a = 10 to bind the name a to
an integer value.
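
A short sketch of dynamic typing as described above. The variable names are illustrative only.

```python
a = 10                  # no type declaration is needed
first_type = type(a)    # int
a = "ten"               # the same name can later refer to a string
second_type = type(a)   # str
print(first_type, second_type)
```

The type belongs to the value, not to the name, which is why the same name can be rebound to a value of another type.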

Python makes development and debugging fast because there is no
separate compilation step, so the edit-test-debug cycle is very short.

Why Python
Python is an open-source, interpreted, high-level language with strong
support for object-oriented programming. It is one of the languages
most widely used by data scientists for data science projects and
applications. Python provides extensive functionality for mathematics,
statistics, and scientific computing, and it offers excellent libraries
for data science applications.


One of the main reasons why Python is widely used in the scientific and
research communities is its ease of use and simple syntax, which make
it easy to adopt for people who do not have an engineering background.
It is also well suited to quick prototyping.

According to engineers from academia and industry, the deep learning
frameworks available with Python APIs, in addition to the scientific
packages, have made Python incredibly productive and versatile. Deep
learning frameworks for Python have evolved a great deal and continue
to improve rapidly.

In terms of application areas, ML scientists prefer Python as well.
When it comes to areas like building fraud detection algorithms and
network security, developers leaned towards Java, while for
applications like natural language processing (NLP) and sentiment
analysis, developers opted for Python, because it provides a large
collection of libraries that help solve complex business problems
easily and build robust systems and data applications.

1.3 Essential Python


Python is a general-purpose programming language that was designed to
be compact, easy to use, and easy to extend. It has a large standard
library and a very active development community. As well as being a
general-purpose programming language, Python is widely used as a
scripting language, as a glue language, for data science and machine
learning, and for software testing.

Whether you work in artificial intelligence or finance, or are pursuing
a career in web development or data science, Python is one of the most
important skills you can learn. Python's simple syntax is especially
suited to desktop, web, and business applications. Python's design
philosophy emphasizes readability and usability. Python was developed
on the premise that there should be one (and preferably only one)
obvious way to do things, a philosophy that resulted in a strict level
of code standardization. The core programming language is quite small,
while the standard library is large. In fact, Python's large library is
one of its greatest benefits, providing programmers with tools suited
to a variety of tasks.

Essential Python is intended for professionals working in the
electronic systems hardware and embedded software development flows.

1.4 Python Libraries

Most commonly used libraries for data science:

Numpy: NumPy is a Python library that provides mathematical functions
for handling large multidimensional arrays. It provides various methods
and functions for arrays, matrices, and linear algebra.

NumPy stands for Numerical Python. It provides many useful features for
operations on n-dimensional arrays and matrices in Python. The library
provides vectorization of mathematical operations on the NumPy array
type, which improves performance and speeds up execution. It is very
easy to work with large multidimensional arrays and matrices using
NumPy.

Pandas: Pandas is one of the most popular Python libraries for data
manipulation and analysis. It provides useful functions for
manipulating large amounts of structured data and offers an easy way to
perform analysis. It provides data structures and operations for
manipulating numerical tables and time series data. Pandas is a perfect
tool for data wrangling. It is designed for quick and easy data
manipulation, aggregation, and visualization. There are two main data
structures in Pandas:

Series – It handles and stores one-dimensional data.

DataFrame – It handles and stores two-dimensional data.

Matplotlib: Matplotlib is another useful Python library, used for data
visualization. Descriptive analysis and data visualization are very
important for any organization. Matplotlib provides various methods to
visualize data effectively. It allows you to quickly make line graphs,
pie charts, histograms, and other professional-grade figures. Using
Matplotlib, one can customize every aspect of a figure. Matplotlib also
has interactive features such as zooming and panning, and it can save
a graph in common graphics formats.

Scipy: SciPy is another popular Python library for data science and
scientific computing. It provides extensive functionality for
scientific mathematics and computing. SciPy contains sub-modules for
optimization, linear algebra, integration, interpolation, special
functions, FFT, signal and image processing, ODE solvers, statistics,
and other tasks common in science and engineering.

Scikit-learn: Scikit-learn (imported as sklearn) is a Python library
for machine learning. It provides various algorithms and functions that
are used in machine learning. It is built on NumPy, SciPy, and
Matplotlib. Scikit-learn provides simple and easy tools for data mining
and data analysis. It offers a set of common machine learning
algorithms to users through a consistent interface. Scikit-learn helps
to quickly implement popular algorithms on datasets and solve
real-world problems.

Introduction
Python is a widely used general-purpose, high-level programming
language. It was created by Guido van Rossum, first released in 1991,
and further developed by the Python Software Foundation. It was
designed with an emphasis on code readability, and its syntax allows
programmers to express their concepts in fewer lines of code. Python is
a programming language that lets you work quickly and integrate systems
more efficiently.

There are two major Python versions: Python 2 and Python 3. The two
differ significantly, and Python 2 reached its end of life in January
2020, so new code should be written in Python 3.

1.5 Features
Following are some useful features of the Python language:

• It uses an elegant syntax, so programs are easier to read.
• It is an easy-to-use language, which makes it simple to get a
program working.
• It has a large standard library and strong community support.
• Its interactive mode makes it simple to test code.
• It is simple to extend by adding new modules implemented in a
compiled language such as C or C++.
• It is an expressive language that can be embedded into applications
to offer a programmable interface.
• It allows developers to run code anywhere, including Windows,
macOS, UNIX, and Linux.
• It is free software: it costs nothing to download or use Python, or
to include it in an application.

1.6 Identifiers
An identifier is a name used to identify a variable, function, class,
module, etc. An identifier is a combination of letters, digits, and
underscores. It must start with a letter (A-Z or a-z) or an underscore
( _ ); digits (0-9) may follow but cannot come first. Special
characters ( #, @, $, %, ! ) are not allowed in identifiers.

Examples of valid identifiers:

var1

_var1

_1_var

var_1

Examples of invalid identifiers:

!var1

1var

1_var

var#1
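
The rules above can be checked from Python itself. The sketch below uses the built-in str.isidentifier() method and the standard keyword module; the helper is_usable_name() is a hypothetical name introduced here for illustration.

```python
import keyword

candidates = ["var1", "_var1", "_1_var", "var_1", "!var1", "1var", "1_var", "var#1"]

# str.isidentifier() applies the naming rules described above.
validity = {name: name.isidentifier() for name in candidates}
for name, ok in validity.items():
    print(f"{name!r}: {'valid' if ok else 'invalid'}")


def is_usable_name(name):
    """A usable name must follow the rules and must not be a reserved word."""
    return name.isidentifier() and not keyword.iskeyword(name)


print(is_usable_name("for"))  # False: 'for' is a reserved word
```

Note that isidentifier() alone returns True for reserved words such as for, which is why the helper also consults the keyword module.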


1.7 Reserved words

Python Keywords

Keyword    Description
and        Logical operator. The result is true only if both operands
           are true.
or         Logical operator. The result is true if at least one operand
           is true.
not        Logical operator. Returns True if the operand is false,
           otherwise False.
if         Starts a conditional statement.
elif       Used with an if statement. The elif block is executed if the
           previous conditions were not true and its own condition is
           true.
else       Used with if and elif. The else block is executed if none of
           the preceding conditions is true.
for        Creates a for loop.
while      Creates a while loop.
break      Terminates the enclosing loop.
as         Creates an alias, for example in import and with statements.
def        Defines a function.
lambda     Defines an anonymous function.
pass       A null statement; it does nothing.
return     Exits a function and returns a value.
True       A Boolean value.
False      A Boolean value.
try        Starts a try-except statement.
with       Wraps a block in a context manager, which simplifies
           resource cleanup and exception handling.
assert     A statement used for debugging, usually to check the
           correctness of code.
class      Defines a class.
continue   Skips to the next iteration of a loop.
del        Deletes a reference to an object.
except     Used with try; specifies what to do when an exception occurs.
finally    Used with try; a block of code that is executed whether or
           not an exception occurs.
from       Imports specific parts of a module.
global     Declares a global variable.
import     Imports a module.
in         Checks whether a value is present in a list, tuple, etc.
is         Checks whether two variables refer to the same object.
None       A special constant that denotes a null value or void. Note
           that 0 and empty containers (e.g. an empty list) are not
           None.
nonlocal   Declares a non-local variable.
raise      Raises an exception.
yield      Pauses a generator function and hands a value back to the
           caller.
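
The table above is not exhaustive, and the exact set of reserved words depends on the Python version. A minimal sketch for listing them on your own installation, using the standard keyword module:

```python
import keyword

all_keywords = keyword.kwlist        # list of reserved words for this version
print(len(all_keywords))
print(all_keywords)

# Keywords are case-sensitive: the constants are spelled True, False, None.
capital_true = keyword.iskeyword("True")
lower_true = keyword.iskeyword("true")
print(capital_true, lower_true)      # True False
```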

1.8 Indentation
Indentation is a very important concept in Python: without properly
indented code you will see an IndentationError and the program will not
run.

Python Indentation
Python indentation refers to the white space added before a statement
to place it in a particular block of code. In other words, all
statements with the same indentation belong to the same code block.

Example 1

The lines print('Welcome...') and print('retype the Good Bye.') are two
separate code blocks. Both blocks in our example if-statement are
indented four spaces. The final print('All set !') is not indented, so
it does not belong to the else block.

site = 'Hi'
if site == 'Hi':
    print('Welcome...')
else:
    print('retype the Good Bye.')
print('All set !')
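
To see what happens when indentation is missing, the sketch below passes a deliberately broken snippet to the built-in compile() function and catches the resulting error. The snippet text and variable names are illustrative.

```python
# The body of the if statement is not indented, which is an error.
bad_source = "if True:\nprint('missing indentation')\n"

try:
    compile(bad_source, "<example>", "exec")
    outcome = "compiled"
except IndentationError as err:
    outcome = "IndentationError"
    print("IndentationError:", err.msg)
```

The error is raised before any line of the snippet runs, because Python checks the block structure of the whole file first.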

1.9 Comments
Comments in Python are lines in the code that are ignored by the
interpreter during execution. Comments enhance the readability of the
code and help programmers understand it more easily.

Types of Comments in Python:

There are three main kinds of comments in Python. They are:

Single-Line Comments: A Python single-line comment starts with the hash
symbol (#) and lasts until the end of the line. If the comment exceeds
one line, put a hash symbol on the next line and continue the comment.
Single-line comments are useful for supplying short explanations of
variables, function declarations, and expressions.

# Python program to demonstrate comments

Multi-Line Comments: Python does not provide a dedicated syntax for
multiline comments. However, there are different ways to write them.

Using Multiple Hashtags (#)

We can use multiple hashtags (#) to write multiline comments in Python.
Each line is treated as a single-line comment.

Example: Multiline comments using multiple hashtags (#)

# Python program to demonstrate
# multiline comments

Using String Literals: Python ignores string literals that are not
assigned to a variable, so we can use such string literals as comments.

Example: """ Python program to demonstrate

multiline comments"""

Python Docstring :

A Python docstring is a string literal in triple quotes that appears
right after the definition of a function, class, or module. It is used
to associate documentation with Python modules, functions, classes, and
methods, describing what they do. The docstring is then made available
via the __doc__ attribute.

Example:

def multiply(a, b):
    """Multiplies the value of a and b"""
    return a * b

# Print the docstring of the multiply function
print(multiply.__doc__)

Output:

Multiplies the value of a and b

1.10 Python Data Types


Variables can hold values, and every value has a data type. Python is a
dynamically typed language, so we do not need to define the type of a
variable when declaring it. The interpreter implicitly binds the value
to its type.

a = 5

The variable a holds the integer value five, and we did not define its
type. The Python interpreter automatically treats a as an integer.

Python lets us check the type of a variable used in a program. It
provides the type() function, which returns the type of the value
passed to it.

The following example defines values of different data types and checks
their types.

a = 10
b = "Hi Python"
c = 10.5
print(type(a))
print(type(b))
print(type(c))

Output:

<class 'int'>
<class 'str'>
<class 'float'>

Standard data types


A variable can hold different types of values. For example, a person's
name must be stored as a string, whereas their id must be stored as an
integer.

Python provides various standard data types, each with its own storage
method. The data types defined in Python are given below.

1. Numbers
2. Sequence Type
3. Boolean
4. Set
5. Dictionary

In this section of the tutorial, we give a brief introduction to the
above data types. We will discuss each of them in detail later in this
tutorial.

Numbers
A number stores a numeric value. Integer, float, and complex values
belong to Python's Numbers data type. Python provides the type()
function to find the data type of a variable. Similarly, the
isinstance() function checks whether an object belongs to a particular
class.

Python creates Number objects when a number is assigned to a variable.
For example:

a = 5
print("The type of a", type(a))
b = 40.5
print("The type of b", type(b))
c = 1+3j
print("The type of c", type(c))
print("c is a complex number:", isinstance(1+3j, complex))

Output:

The type of a <class 'int'>
The type of b <class 'float'>
The type of c <class 'complex'>
c is a complex number: True

Python supports three types of numeric data.

1. int - An integer can be of any length, such as 10, 2, 29, -20, -150,
etc. Python places no restriction on the length of an integer. Its
value belongs to the class int.

2. float - A float stores floating-point numbers like 1.9, 9.902, 15.2,
etc. It is accurate to about 15 significant decimal digits.

3. complex - A complex number is an ordered pair written as x + yj,
where x and y denote the real and imaginary parts, respectively.
Examples are 2.14j, 2.0 + 2.3j, etc.
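
The three numeric types can be explored with a short sketch. The variable names are illustrative, and the float precision shown comes from sys.float_info, which reports 15 digits on typical platforms.

```python
import sys

big = 2 ** 100                  # integers grow as large as memory allows
print(big)                      # 1267650600228229401496703205376

digits = sys.float_info.dig     # decimal digits a float holds reliably
print(digits)

z = 2.0 + 2.3j                  # complex literal: real part + imaginary part
print(z.real, z.imag)           # 2.0 2.3

# Floats are approximations, so exact comparisons can be surprising.
exact = (0.1 + 0.2 == 0.3)
print(exact)                    # False
```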

Sequence Type
String
A string is a sequence of characters enclosed in quotation marks. In
Python, we can use single, double, or triple quotes to define a string.

String handling in Python is straightforward, since Python provides
built-in functions and operators for working with strings.

The + operator concatenates two strings: the operation
"hello" + " python" returns "hello python".

The * operator is known as the repetition operator: the operation
"Python" * 2 returns 'PythonPython'.

The following example illustrates strings in Python.

Example - 1

str1 = "string using double quotes"
print(str1)
s = '''A multiline
string'''
print(s)

Output:

string using double quotes
A multiline
string

Consider the following example of string handling.
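
A minimal sketch of the string operations described above (concatenation, repetition, slicing, and membership). The variable names are illustrative only.

```python
greeting = "hello" + " python"   # concatenation
repeated = "Python" * 2          # repetition
first_word = greeting[0:5]       # slicing: characters 0 to 4
has_py = "py" in greeting        # membership test

print(greeting)     # hello python
print(repeated)     # PythonPython
print(first_word)   # hello
print(has_py)       # True
```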

List
Python lists are similar to arrays in C. However, a list can contain
data of different types. The items stored in a list are separated by
commas (,) and enclosed in square brackets [].

We can use the slice operator [:] to access the items of a list. The
concatenation operator (+) and repetition operator (*) work with lists
in the same way as they do with strings.

Consider the following example.

list1 = [1, "hi", "Python", 2]
# Checking the type of the list
print(type(list1))

# Printing list1
print(list1)

# List slicing
print(list1[3:])

# List slicing
print(list1[0:2])

# List concatenation using the + operator
print(list1 + list1)

# List repetition using the * operator
print(list1 * 3)

Output:

<class 'list'>
[1, 'hi', 'Python', 2]
[2]
[1, 'hi']
[1, 'hi', 'Python', 2, 1, 'hi', 'Python', 2]
[1, 'hi', 'Python', 2, 1, 'hi', 'Python', 2, 1, 'hi', 'Python', 2]

Tuple
A tuple is similar to a list in many ways. Like lists, tuples contain a
collection of items of different data types. The items of a tuple are
separated by commas (,) and enclosed in parentheses ().

A tuple is a read-only data structure: we cannot modify the size of a
tuple or the value of its items.

Let's see a simple example of a tuple.


tup = ("hi", "Python", 2)
# Checking the type of tup
print(type(tup))

# Printing the tuple
print(tup)

# Tuple slicing
print(tup[1:])
print(tup[0:1])

# Tuple concatenation using the + operator
print(tup + tup)

# Tuple repetition using the * operator
print(tup * 3)

# Assigning to an item of tup. It will throw an error.
tup[2] = "hi"

Output:

<class 'tuple'>
('hi', 'Python', 2)
('Python', 2)
('hi',)
('hi', 'Python', 2, 'hi', 'Python', 2)
('hi', 'Python', 2, 'hi', 'Python', 2, 'hi', 'Python', 2)
Traceback (most recent call last):
  File "main.py", line 19, in <module>
    tup[2] = "hi"
TypeError: 'tuple' object does not support item assignment

Dictionary
A dictionary is a collection of key-value pairs. It is like an
associative array or a hash table, where each key stores a specific
value. A key can be any hashable (immutable) object, such as a number,
string, or tuple, whereas a value can be an arbitrary Python object.
Since Python 3.7, dictionaries preserve the order in which items were
inserted.

The items in a dictionary are separated by commas (,) and enclosed in
curly braces {}.

Consider the following example.

d = {1: 'Jimmy', 2: 'Alex', 3: 'john', 4: 'mike'}
# Printing the dictionary
print(d)

# Accessing values using keys
print("1st name is " + d[1])
print("4th name is " + d[4])

print(d.keys())
print(d.values())

Output:

{1: 'Jimmy', 2: 'Alex', 3: 'john', 4: 'mike'}
1st name is Jimmy
4th name is mike
dict_keys([1, 2, 3, 4])
dict_values(['Jimmy', 'Alex', 'john', 'mike'])
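
Two further points about dictionary keys can be sketched briefly: looking up a key that may be absent, and the requirement that keys be hashable. The data and variable names are illustrative.

```python
d = {1: "Jimmy", 2: "Alex"}

missing = d.get(5, "unknown")    # get() returns a default instead of raising KeyError
d[3] = "john"                    # assigning to a new key adds a pair

try:
    d[[1, 2]] = "list as key"    # a list is mutable, so it cannot be a key
    key_problem = None
except TypeError as err:
    key_problem = type(err).__name__

print(missing, list(d), key_problem)
```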

Boolean
The Boolean type provides two built-in values, True and False, which
are used to determine whether a given statement is true or false. It is
denoted by the class bool. In a condition, any non-zero number or
non-empty object is treated as true, whereas 0, None, and empty objects
such as '' or [] are treated as false. The names are case-sensitive:
true and false are not defined. Consider the following example.

# Python program to check the boolean type
print(type(True))
print(type(False))
print(false)

Output:

<class 'bool'>
<class 'bool'>
NameError: name 'false' is not defined
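
How Python decides whether a value counts as true or false can be checked with the built-in bool() function. The sample values below are illustrative; note in particular that the string 'F' is non-empty and therefore counts as true.

```python
values = [0, 1, -5, "", "F", [], [0], None]

# Map the printed form of each value to its truth value.
truthiness = {repr(v): bool(v) for v in values}
for shown, result in truthiness.items():
    print(shown, "->", result)

# bool is a subclass of int, so True behaves like 1 in arithmetic.
total = True + True
print(total)   # 2
```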


Set
A Python set is an unordered collection of items. It is iterable,
mutable (it can be modified after creation), and contains only unique
elements. Because the order of the elements is undefined, a set may
display its elements in a different order than the one in which they
were added. A set is created by calling the built-in function set() or
by placing a comma-separated sequence of elements in curly braces. It
can contain values of various types. Consider the following example.

# Creating an empty set
set1 = set()
set2 = {'James', 2, 3, 'Python'}
# Printing the set
print(set2)
# Adding an element to the set
set2.add(10)
print(set2)
# Removing an element from the set
set2.remove(2)
print(set2)

Output (the order of the elements may vary):

{3, 'Python', 'James', 2}
{'Python', 'James', 3, 2, 10}
{'Python', 'James', 3, 10}
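
The uniqueness property makes sets handy for removing duplicates. The sketch below also contrasts remove() with discard(); the data and variable names are illustrative.

```python
readings = [3, 1, 3, 2, 1]            # hypothetical data with duplicates
unique_readings = set(readings)       # duplicates are dropped
ordered = sorted(unique_readings)     # sorted() gives a predictable order

names = {"James", "Python"}
names.discard("Guido")                # discard() ignores a missing element
try:
    names.remove("Guido")             # remove() raises KeyError instead
    removed = True
except KeyError:
    removed = False

print(ordered, removed)               # [1, 2, 3] False
```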

1.11 Type Conversion

Python defines type conversion functions to convert one data type
directly to another, which is useful in both day-to-day and competitive
programming. This section describes some of these conversion functions.

There are two types of type conversion in Python:

1. Implicit Type Conversion
2. Explicit Type Conversion

Let's discuss them in detail.

Implicit Type Conversion


In Implicit type conversion of data types in Python, the Python
interpreter automatically converts one data type to another without any
user involvement.

Example:
x = 10

print("x is of type:",type(x))

y = 10.6

print("y is of type:",type(y))

z=x+y

print(z)

print("z is of type:",type(z))

Output:

x is of type: <class 'int'>

y is of type: <class 'float'>

20.6

z is of type: <class 'float'>

As we can see, the data type of z was automatically changed to float,
because x is an integer and y is a float. The float is not converted to
an integer instead because of type promotion, which performs the
operation by converting the data to the wider data type so that no
information is lost. This is a simple case of implicit type conversion
in Python.

Explicit Type Conversion


In explicit type conversion, the data type is changed manually by the
user as required. With explicit type conversion there is a risk of data
loss, since we are forcing a value into a specific data type. Various
forms of explicit type conversion are explained below:

1. int(a, base): This function converts a value to an integer. The
optional base specifies the base of the number when a is a string.
2. float(): This function converts a value to a floating-point number.

Python3

# Python code to demonstrate Type conversion

# using int(), float()

# initializing string

s = "10010"

# printing string converting to int base 2

c = int(s,2)

print ("After converting to integer base 2 : ", end="")

print (c)


# printing string converting to float

e = float(s)

print ("After converting to float : ", end="")

print (e)

Output:

After converting to integer base 2 : 18

After converting to float : 10010.0
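The risk of data loss mentioned above can be seen directly. This is a small sketch under stated assumptions: to_int is a hypothetical helper written for this example, not a built-in function.

```python
# Explicit conversion can discard information or fail.
print(int(3.99))        # the fractional part is truncated, not rounded
print(int(-3.99))       # truncation is toward zero
print(int("ff", 16))    # base applies only to string input
print(float("2.5e3"))   # scientific notation is accepted

def to_int(text, default=None):
    """Return int(text), or default when text is not a valid integer."""
    try:
        return int(text)
    except ValueError:
        return default

print(to_int("42"))
print(to_int("4.2"))    # int() rejects a string with a decimal point
```

Wrapping the conversion in try/except is one way to handle user input that may not be a valid number.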

3. ord() : This function converts a character to its integer Unicode code point.

4. hex() : This function is to convert integer to hexadecimal string.

5. oct() : This function is to convert integer to octal string.

Python3

# Python code to demonstrate Type conversion

# using ord(), hex(), oct()

# initializing character

s = '4'

# printing character converting to integer

c = ord(s)

print ("After converting character to integer : ",end="")

print (c)

# printing integer converting to hexadecimal string

c = hex(56)

print ("After converting 56 to hexadecimal string : ",end="")

print (c)


# printing integer converting to octal string

c = oct(56)

print ("After converting 56 to octal string : ",end="")

print (c)

Output:

After converting character to integer : 52

After converting 56 to hexadecimal string : 0x38

After converting 56 to octal string : 0o70

6. tuple() : This function converts an iterable to a tuple.

7. set() : This function converts an iterable to a set. Duplicate elements are removed and the order is not preserved.

8. list() : This function converts an iterable to a list.

Python3

# Python code to demonstrate Type conversion

# using tuple(), set(), list()

# initializing string

s = 'geeks'

# printing string converting to tuple

c = tuple(s)

print ("After converting string to tuple : ",end="")

print (c)

# printing string converting to set

c = set(s)


print ("After converting string to set : ",end="")

print (c)

# printing string converting to list

c = list(s)

print ("After converting string to list : ",end="")

print (c)

Output:

After converting string to tuple : ('g', 'e', 'e', 'k', 's')

After converting string to set : {'k', 'e', 's', 'g'}

After converting string to list : ['g', 'e', 'e', 'k', 's']

9. dict() : This function converts a sequence of (key, value) pairs into a dictionary.

10. str() : This function converts a value, such as an integer, into a string.

11. complex(real,imag) : This function builds the complex number real + imag*j from two real numbers.

Python3

# Python code to demonstrate Type conversion

# using dict(), complex(), str()

# initializing integers

a=1

b=2

# initializing tuple

tup = (('a', 1) ,('f', 2), ('g', 3))

# printing integer converting to complex number


c = complex(1,2)

print ("After converting integer to complex number : ",end="")

print (c)

# printing integer converting to string

c = str(a)

print ("After converting integer to string : ",end="")

print (c)

# printing tuple converting to dictionary

c = dict(tup)

print ("After converting tuple to dictionary : ",end="")

print (c)

Output:

After converting integer to complex number : (1+2j)

After converting integer to string : 1

After converting tuple to dictionary : {'a': 1, 'f': 2, 'g': 3}

12. chr(number): This function converts an integer code point to its corresponding character (for 0 to 127 this is the ASCII character).

Python3

# Convert ASCII value to characters

a = chr(76)

b = chr(77)

print(a)

print(b)

Output:
L
M


1.12 Operators:
Python operators are used to perform operations on values and
variables. They are standard symbols used for logical and arithmetic
operations. In this section, we will look into the different types of
Python operators.

OPERATORS: The special symbols that perform operations, e.g. +, *, /.

OPERAND: The value on which the operator is applied.

Arithmetic Operators:
Arithmetic operators are used to perform mathematical
operations like addition, subtraction, multiplication, and division.

In Python 3.x the result of division (/) is always a floating-point
number, whereas in Python 2.x the division of two integers produced an
integer. To obtain an integer result in Python 3.x, floor division (//)
is used.

Operator   Description                                                                      Syntax

+          Addition: adds two operands                                                      x + y
-          Subtraction: subtracts two operands                                              x - y
*          Multiplication: multiplies two operands                                          x * y
/          Division (float): divides the first operand by the second                        x / y
//         Division (floor): divides the first operand by the second                        x // y
%          Modulus: returns the remainder when the first operand is divided by the second   x % y
**         Power: returns first raised to power second                                      x ** y

PRECEDENCE:

P – Parentheses
E – Exponentiation
M – Multiplication (multiplication and division have the same precedence)
D – Division
A – Addition (addition and subtraction have the same precedence)
S – Subtraction

The modulus operator helps us extract the last digit(s) of a non-negative
integer.

For example:

x % 10 -> yields the last digit

x % 100 -> yields the last two digits
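The digit-extraction idea can be sketched as a small function. last_digits is a hypothetical helper name used only for this illustration; the example also shows how % behaves with a negative left operand.

```python
def last_digits(number, count=1):
    """Return the last `count` decimal digits of a non-negative integer."""
    return number % (10 ** count)

print(last_digits(2024))        # 4
print(last_digits(2024, 2))     # 24

# For a negative left operand the result takes the sign of the divisor,
# so -2024 % 10 is 6, not 4. Use abs() if only the digits are wanted.
print(-2024 % 10)
print(last_digits(abs(-2024)))

# divmod() returns the floor quotient and the remainder together.
quotient, remainder = divmod(2024, 100)
print(quotient, remainder)
```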

Example: Arithmetic operators in Python


Python3

# Examples of Arithmetic Operator


a=9
b=4

# Addition of numbers
add = a + b

# Subtraction of numbers
sub = a - b

# Multiplication of number
mul = a * b

# Division(float) of number
div1 = a / b

# Division(floor) of number
div2 = a // b

# Modulo of both numbers
mod = a % b

# Power
p = a ** b

# print results
print(add)
print(sub)
print(mul)
print(div1)
print(div2)
print(mod)
print(p)

Output:

13
5
36
2.25
2
1
6561

Comparison Operators:
Comparison (relational) operators compare two values. Each returns
either True or False according to the condition.

Operator   Description                                                                                 Syntax

>          Greater than: True if the left operand is greater than the right                            x > y
<          Less than: True if the left operand is less than the right                                  x < y
==         Equal to: True if both operands are equal                                                   x == y
!=         Not equal to: True if operands are not equal                                                x != y
>=         Greater than or equal to: True if the left operand is greater than or equal to the right    x >= y
<=         Less than or equal to: True if the left operand is less than or equal to the right          x <= y
is         x is the same as y                                                                          x is y
is not     x is not the same as y                                                                      x is not y

= is the assignment operator and == is the comparison operator.

Example: Comparison Operators in Python


Python3

# Examples of Relational Operators

a = 13

b = 33

# a > b is False

print(a > b)

# a < b is True

print(a < b)

# a == b is False

print(a == b)

# a != b is True

print(a != b)


# a >= b is False

print(a >= b)

# a <= b is True

print(a <= b)

Output:

False

True

False

True

False

True

Logical Operators:
Logical operators perform Logical AND, Logical OR, and Logical
NOT operations. They are used to combine conditional statements.

Operator Description Syntax


and Logical AND: True if both the operands are true x and y

or Logical OR: True if either of the operands is true x or y

not Logical NOT: True if the operand is false not x

Example: Logical Operators in Python

Python3

# Examples of Logical Operator

a = True

b = False


# Print a and b is False

print(a and b)

# Print a or b is True

print(a or b)

# Print not a is False

print(not a)

Output:

False

True

False
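The operators and and or evaluate their operands from left to right and stop as soon as the result is known (short-circuit evaluation); they return one of the operands rather than always True or False. A minimal sketch, in which safe_ratio and the sample values are hypothetical names chosen for illustration:

```python
def safe_ratio(numerator, denominator):
    """Divide only when the denominator is non-zero."""
    # `and` stops at the first falsy operand, so the division is
    # never attempted when denominator == 0.
    return denominator != 0 and numerator / denominator

print(safe_ratio(10, 4))   # 2.5
print(safe_ratio(10, 0))   # False

# `or` returns the first truthy operand, which gives a simple fallback.
user_name = ""
display_name = user_name or "guest"
print(display_name)        # guest
```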

Bitwise Operators:
Bitwise operators act on bits and perform the bit-by-bit operations.
These are used to operate on binary numbers.

Operator   Description            Syntax

&          Bitwise AND            x & y
|          Bitwise OR             x | y
~          Bitwise NOT            ~x
^          Bitwise XOR            x ^ y
>>         Bitwise right shift    x >> y
<<         Bitwise left shift     x << y

Example: Bitwise Operators in Python

Python3
# Examples of Bitwise operators


a = 10
b=4
# Print bitwise AND operation
print(a & b)
# Print bitwise OR operation
print(a | b)
# Print bitwise NOT operation
print(~a)
# print bitwise XOR operation
print(a ^ b)
# print bitwise right shift operation
print(a >> 2)
# print bitwise left shift operation
print(a << 2)
Output:
0
14
-11
14
2
40

Assignment Operators:
Assignment operators are used to assign values to the variables.

Operator   Description                                                                              Syntax      Same as

=          Assign value of right side of expression to left side operand                            x = y + z
+=         Add AND: add right operand to left operand and then assign to left operand               a += b      a = a + b
-=         Subtract AND: subtract right operand from left operand and then assign to left operand   a -= b      a = a - b
*=         Multiply AND: multiply right operand with left operand and then assign to left operand   a *= b      a = a * b
/=         Divide AND: divide left operand by right operand and then assign to left operand         a /= b      a = a / b
%=         Modulus AND: take modulus of left and right operands and assign result to left operand   a %= b      a = a % b
//=        Divide (floor) AND: divide left operand by right operand and assign the floor value      a //= b     a = a // b
           to left operand
**=        Exponent AND: raise left operand to the power of right operand and assign to left        a **= b     a = a ** b
           operand
&=         Perform bitwise AND on operands and assign value to left operand                         a &= b      a = a & b
|=         Perform bitwise OR on operands and assign value to left operand                          a |= b      a = a | b
^=         Perform bitwise XOR on operands and assign value to left operand                         a ^= b      a = a ^ b
>>=        Perform bitwise right shift on operands and assign value to left operand                 a >>= b     a = a >> b
<<=        Perform bitwise left shift on operands and assign value to left operand                  a <<= b     a = a << b

Example: Assignment Operators in Python

Python3

# Examples of Assignment Operators

a = 10

# Assign value

b=a

print(b)

# Add and assign value

b += a

print(b)

# Subtract and assign value


b -= a

print(b)

# multiply and assign

b *= a

print(b)

# bitwise left shift and assign

b <<= a

print(b)

Output:
10
20
10
100
102400

Identity Operators:

is and is not are the identity operators. Both are used to check whether
two names refer to the same object in memory. Two variables that are
equal are not necessarily identical.

is         True if the operands are identical

is not     True if the operands are not identical

Example: Identity Operator


Python3

a = 10

b = 20


c=a

print(a is not b)

print(a is c)

Output:
True

True
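The difference between equality (==) and identity (is) is easiest to see with mutable objects such as lists. A minimal sketch with illustrative variable names:

```python
first = [1, 2, 3]
second = [1, 2, 3]     # a separate list with equal contents
alias = first          # another name for the same list object

print(first == second)   # True: the values are equal
print(first is second)   # False: they are two distinct objects
print(first is alias)    # True: both names refer to one object

# Mutating through one name is visible through the other.
alias.append(4)
print(first)             # [1, 2, 3, 4]

# `is` is the usual way to test for None.
value = None
print(value is None)     # True
```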

Membership Operators:
in and not in are the membership operators. They test whether a
value is present in a sequence or other container.

in         True if value is found in the sequence

not in     True if value is not found in the sequence

Example: Membership Operator

Python3

# Python program to illustrate

# not 'in' operator

x = 24

y = 20

# named numbers rather than list, so the built-in list type is not shadowed
numbers = [10, 20, 30, 40, 50]

if x not in numbers:
    print("x is NOT present in given list")
else:
    print("x is present in given list")

if y in numbers:
    print("y is present in given list")
else:
    print("y is NOT present in given list")

Output:
x is NOT present in given list

y is present in given list
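Membership tests also work on strings, dictionaries, and sets, with slightly different meanings. The following sketch uses made-up sample data (marks, vowels) purely for illustration:

```python
# Strings: `in` tests for a substring.
print("data" in "data science")      # True

# Dictionaries: `in` tests the keys, not the values.
marks = {"maths": 82, "physics": 75}   # hypothetical sample data
print("maths" in marks)              # True
print(82 in marks)                   # False
print(82 in marks.values())          # True

# Sets answer membership tests without scanning every element.
vowels = set("aeiou")
print("e" in vowels, "z" not in vowels)   # True True
```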

Precedence and Associativity of Operators:

Operator precedence and associativity determine the order in which
the operators in an expression are evaluated.

Operator Precedence

Precedence is used in an expression with more than one operator with
different precedence to determine which operation to perform first.

Example: Operator Precedence

Python3

# Examples of Operator Precedence

# Precedence of '+' & '*'

expr = 10 + 20 * 30

print(expr)

# Precedence of 'or' & 'and'

name = "Alex"
age = 0

# 'and' binds tighter than 'or', so the condition is read as
# name == "Alex" or (name == "John" and age >= 2)
if name == "Alex" or name == "John" and age >= 2:
    print("Hello! Welcome.")
else:
    print("Good Bye!!")

Output:
610

Hello! Welcome.

Operator Associativity
If an expression contains two or more operators with the same
precedence, operator associativity determines the order of evaluation. It
can be either left to right or right to left.

Example: Operator Associativity

Python3

# Examples of Operator Associativity

# Left-right associativity

# 100 / 10 * 10 is calculated as

# (100 / 10) * 10 and not

# as 100 / (10 * 10)

print(100 / 10 * 10)

# Left-right associativity

# 5 - 2 + 3 is calculated as


# (5 - 2) + 3 and not

# as 5 - (2 + 3)

print(5 - 2 + 3)

# parentheses override the left-right associativity

print(5 - (2 + 3))

# right-left associativity

# 2 ** 3 ** 2 is calculated as

# 2 ** (3 ** 2) and not

# as (2 ** 3) ** 2

print(2 ** 3 ** 2)

Output:

100.0

6

0

512

Ternary operators:
Ternary operators, also known as conditional expressions, evaluate to
one of two values depending on whether a condition is true or false. They
were added to Python in version 2.5. A conditional expression tests a
condition in a single line, replacing a multiline if-else and making the
code compact.


Syntax :

[on_true] if [expression] else [on_false]

Simple Method to use ternary operator:

Python

# Program to demonstrate conditional operator

a, b = 10, 20

# Copy value of a into smaller if a < b, else copy b
# (the name min is avoided because it would shadow the built-in min())

smaller = a if a < b else b

print(smaller)

Output:
10
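A conditional expression is often used inside a return statement. The sketch below uses two hypothetical helper functions, parity and sign, written only for this illustration:

```python
def parity(number):
    """Label an integer as 'even' or 'odd' with a conditional expression."""
    return "even" if number % 2 == 0 else "odd"

print(parity(10))   # even
print(parity(7))    # odd

# Conditional expressions can be chained, but an if/elif/else statement
# is usually easier to read once there are more than two outcomes.
def sign(number):
    return "positive" if number > 0 else "negative" if number < 0 else "zero"

print(sign(-3), sign(0), sign(8))
```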

1.0

>>>10/2

5.0

>>>-10/2

-5.0

>>>20.0/2

10.0

(ii) Integer division (Floor division):

The type of the quotient returned by this operator depends on the
operands. If either operand is a float, the result is a float; otherwise
it is an integer. It is known as floor division because the result is
rounded down to the nearest whole number, which matters when the quotient
is negative. For example:

>>>5//5

1

>>>3//2

1

>>>10//3

3

Consider the below statements in Python.

Python3

# A Python program to demonstrate the use of

# "//" for integers

print (5//2)

print (-5//2)

Output:
2
-3
The first output is as expected, but the second one may be surprising
if we are coming from the Java/C++ world, where integer division truncates
toward zero and gives -2. In Python, the “//” operator performs floor
division for both integer and float arguments, whereas the division
operator ‘/’ always returns a float value.

Note: The “//” operator returns the largest integer that is less than
or equal to the exact quotient. So in the above code, 5//2 returns 2:
5/2 is 2.5, and the largest integer less than or equal to 2.5 is 2. (This
differs from ordinary rounding, which would round 2.5 up to 3.) Similarly,
-5/2 is -2.5, and the largest integer less than or equal to -2.5 is -3.

Example:


Python3

# A Python program to demonstrate use of

# "/" for floating point numbers

print (5.0/2)

print (-5.0/2)

Output:
2.5
-2.5
The real floor division operator is “//”. It returns the floor value for
both integer and floating-point arguments.

Python3

# A Python program to demonstrate use of

# "//" for both integers and floating points

print (5//2)

print (-5//2)

print (5.0//2)

print (-5.0//2)

Output:
2
-3
2.0
-3.0
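Floor division and the modulus operator are linked: for any non-zero b, a equals (a // b) * b + (a % b). The sketch below checks this for a negative dividend and contrasts floor division with truncation; the values -7 and 2 are chosen only for illustration.

```python
import math

a, b = -7, 2

floor_quotient = a // b            # rounds toward negative infinity
truncated = int(a / b)             # int() truncates toward zero
print(floor_quotient, truncated)   # -4 -3

# // agrees with math.floor applied to the true quotient.
print(math.floor(a / b) == floor_quotient)   # True

# // and % are linked by: a == (a // b) * b + (a % b)
remainder = a % b
print(remainder)                               # 1
print(a == floor_quotient * b + remainder)     # True
```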


1.13 Decision making


Decision making means anticipating the conditions that may occur while
the program executes and specifying the actions to take for each
condition.
Decision structures evaluate one or more expressions that produce TRUE
or FALSE as the outcome. You need to determine which statements to
execute if the outcome is TRUE and which to execute otherwise.
Python treats any non-zero and non-null value as TRUE. Zero, None, and
empty containers such as '' and [] are treated as FALSE.
Python provides the following types of decision-making statements.

Sr.No.   Statement & Description

1   if statements
    An if statement consists of a boolean expression followed by one or more statements.

2   if...else statements
    An if statement can be followed by an optional else statement, which executes when the boolean expression is FALSE.

3   nested if statements
    You can use one if or elif statement inside another if or elif statement(s).


Let us go through each decision making briefly −

Single Statement Suites


If the suite of an if clause consists only of a single line, it may go on
the same line as the header statement.
Here is an example of a one-line if clause −
#!/usr/bin/python3
var = 100
if var == 100: print("Value of expression is 100")
print("Good bye!")
When the above code is executed, it produces the following result −
Value of expression is 100
Good bye!
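The three kinds of decision statement can be sketched together as follows. The functions grade and describe, and the mark thresholds 75 and 40, are hypothetical and chosen only to illustrate the syntax.

```python
def grade(marks):
    """Map a mark out of 100 to a grade using if / elif / else."""
    if marks >= 75:
        return "distinction"
    elif marks >= 40:
        return "pass"
    else:
        return "fail"

def describe(values):
    """Nested if: the inner test runs only when the list is non-empty."""
    if values:                      # an empty list is treated as false
        if len(values) == 1:
            return "one item"
        return "several items"
    return "empty"

print(grade(82), grade(55), grade(12))
print(describe([]), describe([7]), describe([7, 8]))
```

Note that Python spells "else if" as elif, and that indentation alone marks which statements belong to each branch.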

1.14 Loop statement


In general, statements are executed sequentially: the first statement in
a function is executed first, followed by the second, and so on. There may
be a situation when you need to execute a block of code several times.
Programming languages provide various control structures that allow for
more complicated execution paths.
A loop statement allows us to execute a statement or group of statements
multiple times.

Python provides the following types of loops to handle looping
requirements.

Sr.No.   Loop Type & Description

1   while loop
    Repeats a statement or group of statements while a given condition is TRUE. It tests the condition before executing the loop body.

2   for loop
    Executes a sequence of statements once for each item of a sequence or other iterable, and abbreviates the code that manages the loop variable.

3   nested loops
    You can use one or more loops inside another while or for loop. (Python has no do..while loop.)

Loop Control Statements


Loop control statements change execution from its normal
sequence. Python supports the following control statements.

Sr.No.   Control Statement & Description

1   break statement
    Terminates the loop statement and transfers execution to the statement immediately following the loop.

2   continue statement
    Causes the loop to skip the remainder of its body and immediately retest its condition prior to reiterating.

3   pass statement
    The pass statement in Python is used when a statement is required syntactically but you do not want any command or code to execute.
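The loop types and control statements described in this section can be sketched in one short program. The numbers used (51, 7, the range limits) are arbitrary values chosen for illustration.

```python
# while loop with break: stop at the first multiple of 7 above 50.
candidate = 51
while True:
    if candidate % 7 == 0:
        break                 # leave the loop immediately
    candidate += 1
print(candidate)              # 56

# for loop with continue and pass.
odd_squares = []
for n in range(1, 8):
    if n % 2 == 0:
        continue              # skip the rest of the body for even n
    if n == 5:
        pass                  # placeholder: nothing special to do yet
    odd_squares.append(n * n)
print(odd_squares)            # [1, 9, 25, 49]

# Nested loops: the inner loop runs fully for each outer iteration.
pairs = []
for row in range(2):
    for col in range(3):
        pairs.append((row, col))
print(len(pairs))             # 6
```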


1.15 Number Functions In Python


Number data types store numeric values. They are immutable data
types, which means that changing the value of a number data type
results in a newly allocated object.
Number objects are created when you assign a value to them. For
example −
var1 = 1
var2 = 10
You can also delete the reference to a number object by using
the del statement. The syntax of the del statement is −
del var1[,var2[,var3[....,varN]]]
You can delete a single object or multiple objects by using
the del statement. For example −
del var
del var_a, var_b
Python 3 supports three different numerical types −
• int (signed integers) − They are often called just integers or
ints, and are positive or negative whole numbers with no decimal
point. In Python 3 an int has unlimited size; the separate long
type of Python 2 (written with a trailing L) no longer exists.
• float (floating point real values) − Also called floats, they
represent real numbers and are written with a decimal point
dividing the integer and fractional parts. Floats may also be in
scientific notation, with E or e indicating the power of 10
(2.5e2 = 2.5 x 10^2 = 250).
• complex (complex numbers) − are of the form a + bJ, where
a and b are floats and J (or j) represents the square root of -1
(which is an imaginary number). The real part of the number
is a, and the imaginary part is b. Complex numbers are not
used much in Python programming.
Examples
Here are some examples of numbers (in Python 3 syntax)

int        float         complex

10         0.0           3.14j
100        15.20         45.j
-786       -21.9         9.322e-36j
80         32.3e+18      .876j
-490       -90.          -.6545+0J
-0x260     -32.54e100    3e+26J
0x69       70.2E-12      4.53e-7j

Values that Python 2 wrote as long (with a trailing L) are ordinary int
values in Python 3, for example 51924361, -0x19323, 0o122,
0xDEFABCECBDAECBFBAE, 535633629843, -0o52318172735 and -4721885298529.

• A decimal integer literal must not begin with a zero in Python 3;
octal values are written with the 0o prefix.
• Python 2 wrote long integers with a trailing L (or lowercase l).
Python 3 does not accept this suffix because every integer is an
int.
• A complex number consists of an ordered pair of real floating
point numbers denoted by a + bj, where a is the real part and
b is the imaginary part of the complex number.

Number Type Conversion


Python converts numbers internally in an expression containing
mixed types to a common type for evaluation. But sometimes, you
need to coerce a number explicitly from one type to another to
satisfy the requirements of an operator or function parameter.
• Type int(x) to convert x to an integer.
• Type float(x) to convert x to a floating-point number.
• Type complex(x) to convert x to a complex number with real
part x and imaginary part zero.
• Type complex(x, y) to convert x and y to a complex number
with real part x and imaginary part y. x and y are numeric
expressions.

Mathematical Functions
Python includes the following functions that perform mathematical
calculations. abs(), max(), min(), pow() and round() are built-in; the
others belong to the math module and are available after import math.

Sr.No.   Function & Returns (description)

1   abs(x)
    The absolute value of x: the (positive) distance between x and zero.

2   ceil(x)
    The ceiling of x: the smallest integer not less than x.

3   cmp(x, y)
    -1 if x < y, 0 if x == y, or 1 if x > y. (Python 2 only; cmp() was removed in Python 3.)

4   exp(x)
    The exponential of x: e**x.

5   fabs(x)
    The absolute value of x, returned as a float.

6   floor(x)
    The floor of x: the largest integer not greater than x.

7   log(x)
    The natural logarithm of x, for x > 0.

8   log10(x)
    The base-10 logarithm of x, for x > 0.

9   max(x1, x2,...)
    The largest of its arguments: the value closest to positive infinity.

10  min(x1, x2,...)
    The smallest of its arguments: the value closest to negative infinity.

11  modf(x)
    The fractional and integer parts of x in a two-item tuple. Both parts have the same sign as x. The integer part is returned as a float.

12  pow(x, y)
    The value of x**y.

13  round(x [,n])
    x rounded to n digits from the decimal point. Python 3 rounds ties to the nearest even value: round(0.5) is 0 and round(1.5) is 2.

14  sqrt(x)
    The square root of x, for x >= 0.
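A few of these functions in use, as a minimal Python 3 sketch. The input values are arbitrary; math.isclose is used for the last check because floating-point results are not always exact.

```python
import math

print(abs(-7.5), max(3, 9, 4), min(3, 9, 4), pow(2, 5))   # built-ins
print(math.ceil(4.2), math.floor(4.8))      # 5 4
print(math.fabs(-3))                        # 3.0, always a float
print(math.sqrt(49))                        # 7.0
print(math.modf(3.25))                      # (0.25, 3.0)

# round() sends ties to the nearest even value in Python 3.
print(round(0.5), round(1.5), round(2.5))   # 0 2 2
print(round(3.14159, 2))                    # 3.14

# exp() and log() are inverses (up to floating-point error).
print(math.isclose(math.log(math.exp(2.0)), 2.0))   # True
print(math.isclose(math.log10(1000), 3.0))          # True
```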


Random Number Functions


Random numbers are used for games, simulations, testing,
security, and privacy applications. Python's random module includes
the following commonly used functions, available after import random.

Sr.No.   Function & Description

1   choice(seq)
    A random item from a list, tuple, or string.

2   randrange ([start,] stop [,step])
    A randomly selected element from range(start, stop, step).

3   random()
    A random float r, such that 0 is less than or equal to r and r is less than 1.

4   seed([x])
    Sets the starting value used in generating random numbers. Call this function before calling any other random module function. Returns None.

5   shuffle(lst)
    Randomizes the items of a list in place. Returns None.

6   uniform(x, y)
    A random float r, such that x is less than or equal to r and r is less than or equal to y.
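The sketch below exercises these functions. Because the actual values drawn are random, the example prints only properties that always hold; the seed value 42 is an arbitrary choice.

```python
import random

random.seed(42)          # fixing the seed makes the sequence repeatable
first_run = [random.randrange(1, 7) for _ in range(5)]

random.seed(42)
second_run = [random.randrange(1, 7) for _ in range(5)]
print(first_run == second_run)          # True

print(random.choice("python") in "python")   # True
r = random.random()
print(0.0 <= r < 1.0)                   # True
u = random.uniform(2.0, 5.0)
print(2.0 <= u <= 5.0)                  # True

deck = list(range(10))
random.shuffle(deck)     # reorders the list in place and returns None
print(sorted(deck) == list(range(10)))  # True
```

Re-seeding with the same value reproduces the same sequence, which is useful when a simulation or test has to be repeatable.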


Trigonometric Functions
Python's math module includes the following functions that perform
trigonometric calculations.
Sr.No. Function & Description

1 acos(x)

Return the arc cosine of x, in radians.

2 asin(x)

Return the arc sine of x, in radians.

3 atan(x)

Return the arc tangent of x, in radians.

4 atan2(y, x)

Return atan(y / x), in radians, using the signs of both arguments to choose the correct quadrant.

5 cos(x)

Return the cosine of x radians.

6 hypot(x, y)

Return the Euclidean norm, sqrt(x*x + y*y).

7 sin(x)

Return the sine of x radians.

8 tan(x)

Return the tangent of x radians.

9 degrees(x)

Converts angle x from radians to degrees.
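A short sketch of the trigonometric functions follows. Results are rounded before printing because floating-point values are rarely exact. math.radians(), the inverse of degrees(), is a standard math function that does not appear in the table above.

```python
import math

angle = math.radians(30)             # degrees -> radians
print(round(math.sin(angle), 6))     # 0.5
print(round(math.cos(math.pi), 6))   # -1.0
print(math.hypot(3, 4))              # 5.0

# atan2 uses the signs of both arguments to pick the quadrant,
# whereas atan(y / x) cannot tell (-1, -1) from (1, 1).
print(round(math.degrees(math.atan2(-1, -1)), 6))   # -135.0
print(round(math.degrees(math.atan(-1 / -1)), 6))   # 45.0
```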


1.16 Function In Python


A function is a block of organized, reusable code that is used to
perform a single, related action. Functions provide better modularity for
your application and a high degree of code reuse.
As you already know, Python gives you many built-in functions like
print(), etc. but you can also create your own functions. These functions
are called user-defined functions.

Defining a Function
You can define functions to provide the required functionality. Here are
simple rules to define a function in Python.
• Function blocks begin with the keyword def followed by the
function name and parentheses ( ( ) ).
• Any input parameters or arguments should be placed within these
parentheses. You can also define parameters inside these
parentheses.
• The first statement of a function can be an optional statement - the
documentation string of the function or docstring.
• The code block within every function starts with a colon (:) and is
indented.
• The statement return [expression] exits a function, optionally
passing back an expression to the caller. A return statement with
no arguments is the same as return None.

Syntax

def functionname( parameters ):
    "function_docstring"
    function_suite
    return [expression]

By default, parameters have a positional behavior and you need to
pass them in the same order that they were defined.


Example
The following function takes a string as input parameter and prints it
on the standard output.
def printme( str ):
    "This prints a passed string into this function"
    print(str)
    return

Calling a Function
Defining a function only gives it a name, specifies the parameters
that are to be included in the function and structures the blocks of
code.
Once the basic structure of a function is finalized, you can execute
it by calling it from another function or directly from the Python
prompt. Following is the example to call printme() function −
#!/usr/bin/python3

# Function definition is here
def printme( str ):
    "This prints a passed string into this function"
    print(str)
    return

# Now you can call printme function
printme("I'm first call to user defined function!")
printme("Again second call to the same function")
When the above code is executed, it produces the following result −
I'm first call to user defined function!
Again second call to the same function

Pass by reference vs value


All parameters (arguments) in the Python language are passed by
object reference: the function receives a reference to the same object
that the caller passed. It means that if you modify a mutable object
through a parameter within a function, the change is also visible in
the calling function. For example −

#!/usr/bin/python3

# Function definition is here
def changeme( mylist ):
    "This changes a passed list into this function"
    mylist.append([1,2,3,4])
    print("Values inside the function:", mylist)
    return

# Now you can call changeme function
mylist = [10,20,30]
changeme( mylist )
print("Values outside the function:", mylist)
Here, we are maintaining reference of the passed object and
appending values in the same object. So, this would produce the
following result −
Values inside the function: [10, 20, 30, [1, 2, 3, 4]]
Values outside the function: [10, 20, 30, [1, 2, 3, 4]]
There is one more example where argument is being passed by
reference and the reference is being overwritten inside the called
function.
#!/usr/bin/python3

# Function definition is here
def changeme( mylist ):
    "This changes a passed list into this function"
    mylist = [1,2,3,4] # This would assign a new reference to mylist
    print("Values inside the function:", mylist)
    return

# Now you can call changeme function
mylist = [10,20,30]
changeme( mylist )
print("Values outside the function:", mylist)

The parameter mylist is local to the function changeme. Rebinding
mylist to a new list within the function does not affect the caller's
mylist. The function accomplishes nothing, and this would produce the
following result −
Values inside the function: [1, 2, 3, 4]
Values outside the function: [10, 20, 30]
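The two behaviours above (mutating the passed object versus rebinding the local name) can be confirmed with the built-in id() function, which returns an object's identity. append_item and rebind are hypothetical function names used only in this sketch.

```python
def append_item(items):
    """Mutate the caller's list in place."""
    items.append(99)
    return id(items)

def rebind(items):
    """Rebind the local name; the caller's list is untouched."""
    items = [0, 0, 0]
    return id(items)

data = [1, 2, 3]
same_object = append_item(data) == id(data)
new_object = rebind(data) != id(data)

print(data)          # [1, 2, 3, 99]
print(same_object)   # True
print(new_object)    # True
```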

1.17 Function Arguments


You can call a function by using the following types of formal
arguments −
• Required arguments
• Keyword arguments
• Default arguments
• Variable-length arguments

Required arguments
Required arguments are the arguments passed to a function in
correct positional order. Here, the number of arguments in the
function call should match exactly with the function definition.
To call the function printme(), you definitely need to pass one
argument; otherwise it raises a TypeError as follows −
#!/usr/bin/python3

# Function definition is here
def printme( str ):
    "This prints a passed string into this function"
    print(str)
    return

# Now you can call printme function
printme()
When the above code is executed, it produces the following result −
Traceback (most recent call last):
  File "test.py", line 10, in <module>
    printme()
TypeError: printme() missing 1 required positional argument: 'str'

Keyword arguments
Keyword arguments are related to the function calls. When you use
keyword arguments in a function call, the caller identifies the
arguments by the parameter name.
This allows you to skip arguments or place them out of order
because the Python interpreter is able to use the keywords
provided to match the values with parameters. You can also make
keyword calls to the printme() function in the following ways −
#!/usr/bin/python3

# Function definition is here
def printme( str ):
    "This prints a passed string into this function"
    print(str)
    return

# Now you can call printme function
printme( str = "My string")
When the above code is executed, it produces the following result −
My string
The following example gives a clearer picture. Note that the order
of the keyword arguments in the call does not matter.

#!/usr/bin/python3

# Function definition is here
def printinfo(name, age):
    "This prints a passed info into this function"
    print("Name:", name)
    print("Age", age)
    return

# Now you can call printinfo function
printinfo(age=50, name="miki")
When the above code is executed, it produces the following result −
Name: miki
Age 50

Default arguments
A default argument is an argument that assumes a default value if a
value is not provided in the function call for that argument. The
following example illustrates default arguments: it prints the
default age if no age is passed −

#!/usr/bin/python3

# Function definition is here
def printinfo(name, age=35):
    "This prints a passed info into this function"
    print("Name:", name)
    print("Age", age)
    return

# Now you can call printinfo function
printinfo(age=50, name="miki")
printinfo(name="miki")
When the above code is executed, it produces the following result −
Name: miki
Age 50
Name: miki
Age 35

Variable-length arguments
You may need a function to accept more arguments than you
specified while defining it. These arguments are called
variable-length arguments and, unlike required and default
arguments, are not named in the function definition.

Syntax for a function with non-keyword variable arguments is this −

def functionname([formal_args,] *var_args_tuple):
    "function_docstring"
    function_suite
    return [expression]
An asterisk (*) is placed before the variable name that holds the
values of all nonkeyword variable arguments. This tuple remains
empty if no additional arguments are specified during the function
call. Following is a simple example −

#!/usr/bin/python3

# Function definition is here
def printinfo(arg1, *vartuple):
    "This prints the variable arguments passed to it"
    print("Output is: ")
    print(arg1)
    for var in vartuple:
        print(var)
    return

# Now you can call printinfo function
printinfo(10)
printinfo(70, 60, 50)
When the above code is executed, it produces the following result −
Output is:
10
Output is:
70
60
50
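
The four kinds of arguments described above can be combined in a single function. The following is a minimal sketch; the function describe and its parameter names are hypothetical and chosen only for illustration.

```python
def describe(name, age=35, *extras):
    """Return a one-line summary built from the arguments received."""
    summary = f"{name} ({age})"
    if extras:
        # extras is a tuple holding any additional positional arguments
        summary += " extras=" + ",".join(str(item) for item in extras)
    return summary


by_position = describe("miki", 50)            # required arguments, in order
by_keyword = describe(age=50, name="miki")    # keyword arguments, any order
with_default = describe("miki")               # age falls back to its default
with_extras = describe("miki", 50, "x", "y")  # variable-length arguments

print(by_position)
print(by_keyword)
print(with_default)
print(with_extras)
```

The positional and keyword calls produce the same summary, which shows that keyword arguments are matched by name rather than by position.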

Data Science using Python – Unit II

UNIT - II

Introduction of Python

2.1 Python Modules


This section explains how to create and import custom Python
modules. It also shows several ways to import Python's built-in modules.

What is Modular Programming?


Modular programming is the practice of segmenting a single,
complicated coding task into multiple, simpler, easier-to-manage sub-tasks.
We call these subtasks modules. Therefore, we can build a bigger program
by assembling different modules that act like building blocks.

Modularizing our code in a big application has a lot of benefits.

Simplification: A module typically concentrates on one comparatively small
area of the overall problem instead of the full task. Concentrating on a
single module gives us a more manageable design problem, which makes
program development simpler and much less error-prone.

Flexibility: Modules are frequently used to establish conceptual
separations between various problem areas. If modules are constructed in a
way that reduces interconnectedness, changes to one module are less likely
to influence other portions of the program. (We might even be able to edit
a module without being familiar with the program beyond it.) This increases
the likelihood that a large group of developers will be able to collaborate
on a big project.

Reusability: Functions created in a particular module can readily be
used by other parts of the application (through a suitably defined API).
As a result, code does not need to be duplicated.

Scope: Modules typically define a separate namespace, which prevents
identifier clashes between different parts of a program.

In Python, modularization of the code is encouraged through the use
of functions, modules, and packages.

What are Modules in Python?


A Python module is a file containing definitions of functions and
other statements written in Python.

In Python, a module can be defined in one of three ways:

o A module can be written in Python itself.

o A module can be written in the C programming language and loaded
dynamically at run time, like the re (regular expression) module.

o A built-in module, such as the itertools module, is intrinsically
contained in the interpreter.

A module is a file containing Python code: definitions of functions,
statements, or classes. For example, a file named example_module.py is a
module whose name is example_module; we will create it below.

We use modules to divide complicated programs into smaller,
more understandable pieces. Modules also allow for the reuse of code.

Rather than copying their definitions into several applications, we
can define our most frequently used functions in a separate module and
then import the complete module.

Let's construct a module. Enter the following and save the file as
example_module.py.

Code:
# Python program to show how to create a module.
# defining a function in the module to reuse it
def square(number):
    """This function will square the number passed to it"""
    result = number ** 2
    return result

Here, a module called example_module contains the definition of the
function square(). The function returns the square of a given number.

How to Import Modules in Python?


In Python, we can import the functions defined in one module into our
program or, equivalently, into another module.

For this, we use the import keyword followed by the name of the
module we need to import. We will import the module we defined earlier,
example_module.

Code:

import example_module

This does not bring the functions defined in example_module directly
into the current namespace. Only the name of the module, i.e.,
example_module, is imported here.

To call the functions, we use the module name with the dot operator.
For instance:

Code:
result = example_module.square(4)
print("By using the module square of number is:", result)

Output:
By using the module square of number is: 16

Python ships with many standard modules. The list of available modules
can be displayed with the help command, for example help("modules") in the
interactive interpreter.

Just as we imported our user-defined module, we can use an import
statement to import standard modules.

A module can be imported in several ways, which are listed below.


Python import Statement


Using the import Python keyword and the dot operator, we may
import a standard module and can access the defined functions within it.
Here's an illustration.

Code:
# Python program to show how to import a standard module
# We will import the math module which is a standard module
import math
print("The value of euler's number is", math.e)

Output:
The value of euler's number is 2.718281828459045

Importing and also Renaming


While importing a module, we can change its name too. Here is an
example to show.

Code:
# Python program to show how to import a module and rename it
# We will import the math module and give a different name to it
import math as mt
print("The value of euler's number is", mt.e)

Output:
The value of euler's number is 2.718281828459045


The math module is now named mt in this program. Renaming can save
typing when a module has a long name.

Note that the name math is not defined in the scope of our program.
Thus, mt.pi is the correct way to refer to the constant, whereas math.pi
is invalid.

Python from...import Statement


We can import specific names from a module without importing the
module as a whole. Here is an example.

Code:
# Python program to show how to import specific objects from a module
# We will import euler's number from the math module using the from keyword
from math import e
print("The value of euler's number is", e)

Output:
The value of euler's number is 2.718281828459045

Only the constant e from the math module was imported in this case.

With this form we do not need the dot (.) operator. We can import
several attributes at the same time, as follows:

Code:
# Python program to show how to import multiple objects from a module
from math import e, tau
print("The value of tau constant is:", tau)
print("The value of the euler's number is:", e)

Output:
The value of tau constant is: 6.283185307179586
The value of the euler's number is: 2.718281828459045

Import all Names - From import * Statement


To import all the public names from a module into the current
namespace, use the * symbol with the from and import keywords.

Syntax:

from name_of_module import *

Using the symbol * has benefits and drawbacks. It saves typing, but it can
silently overwrite names that already exist in our namespace. It is not
advised to use * unless we are certain of what we need from the module.

Here is an example of the same.

Code:
# importing all public names of the math module using *
from math import *
# accessing functions of math module without using the dot operator
print("Calculating square root:", sqrt(25))
# here pi is also imported from the math module
print("Calculating tangent of an angle:", tan(pi/6))

Output:
Calculating square root: 5.0
Calculating tangent of an angle: 0.5773502691896257
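
To see which names a star import would bring in, we can list a module's public names. The sketch below assumes the usual rule that, when a module defines no __all__ list, every name not starting with an underscore is imported; the variable names are illustrative only.

```python
import math

# `from math import *` copies the names listed in __all__ if the module
# defines it, otherwise every name that does not start with an underscore.
exported = getattr(math, "__all__", None)
if exported is None:
    exported = [name for name in dir(math) if not name.startswith("_")]

print(len(exported), "names would be imported, for example:", exported[:5])

# One drawback: math.pow would replace the built-in pow in our namespace,
# and the two behave differently (float result versus int result).
print("pow" in exported, math.pow(2, 3), pow(2, 3))
```

This is one reason importing specific names, or the module itself, is usually the safer choice.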

Locating Path of Modules


When a module is imported, the interpreter first checks whether a
built-in module with that name exists. If not, it searches a list of
directories, which is available as sys.path. The Python interpreter looks
for the module in the order described below:

The module is first looked for in the directory containing the script
being run (or the current directory when no script is given). If it is not
found there, Python then searches every directory listed in the environment
variable PYTHONPATH, which holds a list of directories. If that also fails,
Python searches the installation-dependent set of directories configured
when Python is installed.

Here is an example to print the path.

Code:
# We will import the sys module
import sys
# print the module search path
print(sys.path)

Output (the exact list depends on the system):
['/home/pyodide', '/home/pyodide/lib/Python310.zip', '/lib/Python3.10', '/lib/Python3.10/lib-dynload', '', '/lib/Python3.10/site-packages']
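
Because sys.path is an ordinary list, a program can add a directory to it before importing. The following is a minimal sketch: it writes a small module into a temporary directory and imports it from there. The module name demo_square_module is hypothetical and used only for this illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Write a small module into a fresh temporary directory.
module_dir = Path(tempfile.mkdtemp())
(module_dir / "demo_square_module.py").write_text(
    "def square(number):\n"
    "    return number ** 2\n"
)

# Put that directory at the front of the search path, then import by name.
sys.path.insert(0, str(module_dir))
importlib.invalidate_caches()
demo = importlib.import_module("demo_square_module")

result = demo.square(4)
print(result)
```

In everyday code a plain import statement does the same job once the directory is on sys.path; importlib is used here only to keep the sketch self-contained.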


The dir() Built-in Function


We can use the built-in dir() function to list the names defined in a
module or in any other object.

For instance, the built-in str type defines the following names. To
print them, we use dir() in the following way:

Code:
# Python program to print the names defined by an object
print("List of functions:")
print(dir(str))

Output:
List of functions:
['__add__', '__class__', '__contains__', '__delattr__', '__dir__', '__doc__', '__eq__',
'__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__gt__', '__hash__',
'__init__', '__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__mod__', '__mul__',
'__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmod__', '__rmul__',
'__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'capitalize', 'casefold', 'center', 'count',
'encode', 'endswith', 'expandtabs', 'find', 'format', 'format_map', 'index', 'isalnum', 'isalpha',
'isascii', 'isdecimal', 'isdigit', 'isidentifier', 'islower', 'isnumeric', 'isprintable', 'isspace', 'istitle',
'isupper', 'join', 'ljust', 'lower', 'lstrip', 'maketrans', 'partition', 'removeprefix', 'removesuffix',
'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip',
'swapcase', 'title', 'translate', 'upper', 'zfill']


Namespaces and Scoping


Variables are names, or identifiers, that refer to objects. A
namespace is a dictionary that maps variable names (keys) to the
objects they refer to (values).

A Python statement can access variables in both the local and the
global namespace. When a local and a global variable have the same name,
the local variable shadows the global one. Every function has its own
local namespace, and the scoping rule for class methods is the same as for
regular functions. Python decides whether a variable is local or global
when it compiles the function: any variable that is assigned a value
anywhere in the function body is treated as local.

Therefore, to assign a value to a global variable inside a function, we
must first use the global statement. The line global Var_Name tells Python
that Var_Name is a global variable, so Python stops looking for it in the
local namespace.

For instance, we define the variable Number in the global namespace. If
we assign Number a value inside the function without declaring it global,
Python treats Number as a local variable, and reading it before that
assignment raises UnboundLocalError. The following code declares Number
global to avoid this.

Code:
Number = 204
def AddNumber():
    # accessing the global namespace
    global Number
    Number = Number + 200
print(Number)
AddNumber()
print(Number)

Output:
204
404
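
The UnboundLocalError mentioned above can be reproduced with a short sketch. The function names read_only and assign_without_global are hypothetical and serve only to contrast reading a global with assigning to it.

```python
number = 204


def read_only():
    # Only reads `number`, so Python looks it up in the enclosing scope.
    return number + 200


def assign_without_global():
    # The assignment makes `number` local to this function, so reading it
    # on the right-hand side fails before any local value exists.
    number = number + 200
    return number


read_result = read_only()

try:
    assign_without_global()
    error_name = None
except UnboundLocalError as exc:
    error_name = type(exc).__name__

print(read_result, error_name)
```

Reading a global needs no declaration; only assignment inside the function requires the global statement.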

Python Packages:
We usually organize our files in different folders and subfolders based
on some criteria, so that they can be managed easily and efficiently. For
example, we keep all our games in a Games folder, and we can even
subcategorize them according to genre. A Python package follows the same
idea.

A Python module may contain several classes, functions, variables,
etc., whereas a Python package can contain several modules. In simpler
terms, a package is a folder that contains various modules as files.

Creating Package
Let's create a package named mypckg that will contain two modules,
mod1 and mod2. To create this package, follow the steps below –

Create a folder named mypckg.

Inside this folder, create an empty Python file named __init__.py.

Then create two modules, mod1 and mod2, in this folder.

mod1.py

def gfg():
    print("Welcome to GFG")

The hierarchy of our package looks like this –

mypckg
---__init__.py
---mod1.py
---mod2.py
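
The listing for mod2.py is not shown in these notes. The sketch below builds the whole package in a temporary directory and assumes that mod2 defines sum(a, b) returning a + b, which is consistent with the later call mod2.sum(1, 2) but is an assumption, not the original file.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Create the folder layout: mypckg/__init__.py, mod1.py, mod2.py
root = Path(tempfile.mkdtemp())
pkg = root / "mypckg"
pkg.mkdir()
(pkg / "__init__.py").write_text("")
(pkg / "mod1.py").write_text('def gfg():\n    print("Welcome to GFG")\n')
# Assumed content of mod2.py (not given in the notes)
(pkg / "mod2.py").write_text("def sum(a, b):\n    return a + b\n")

# Make the parent folder importable, then load the two modules.
sys.path.insert(0, str(root))
importlib.invalidate_caches()
mod1 = importlib.import_module("mypckg.mod1")
mod2 = importlib.import_module("mypckg.mod2")

mod1.gfg()
total = mod2.sum(1, 2)
print(total)
```

When the folder is created by hand next to the script, the import statements shown later in this section work without any change to sys.path.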

Understanding __init__.py

__init__.py marks the folder as a regular package for the Python
interpreter, and its code runs when the package is first imported. It can
also specify which resources from the modules are exposed at the package
level. If __init__.py is empty, nothing is added to the package namespace,
and each module must be imported explicitly (for example, from mypckg
import mod1). We can also list the functions from each module that should
be available directly from the package.

For example, we can write the __init__.py file for the above package
as –

__init__.py

from .mod1 import gfg
from .mod2 import sum

With this __init__.py, the gfg and sum functions from the mod1 and
mod2 modules can be imported directly from the package, as in
from mypckg import gfg, sum.

Import Modules from a Package


We can import these modules using the import or from…import statement
and the dot (.) operator.

Syntax:

import package_name.module_name

Example: Import Module from package

We will import the modules from the package created above and use
the functions inside those modules.

from mypckg import mod1
from mypckg import mod2

mod1.gfg()
res = mod2.sum(1, 2)
print(res)

Output:
Welcome to GFG
3

We can also import a specific function using the same syntax.

Example: Import a specific function from the module

from mypckg.mod1 import gfg
from mypckg.mod2 import sum

gfg()
res = sum(1, 2)
print(res)

Output:
Welcome to GFG
3


2.2 Python File Handling


Until now, we have taken input from the console and written output back
to the console to interact with the user.

Sometimes, displaying data on the console is not enough. The data may
be very large, and only a limited amount can be displayed on the console.
Moreover, since memory is volatile, data generated by a program is lost
when the program ends and cannot be recovered later.

File handling plays an important role when data needs to be stored
permanently in a file. A file is a named location on disk used to store
related information. We can access the stored (non-volatile) information
after the program terminates.

File handling is somewhat lengthy or complicated in some other
programming languages, but it is easier and shorter in Python.

In Python, a file is handled in one of two modes, text or binary. In a
text file, each line ends with a special end-of-line character, such as
the newline character \n.

Hence, a file operation can be done in the following order.

o Open a file

o Read or write - Performing operation

o Close the file


Opening a file
Python provides an open() function that accepts the file name and,
optionally, the access mode in which the file is opened and a buffering
setting. The function returns a file object that can be used to perform
operations such as reading and writing.

Syntax:

file_object = open(<file-name>, <access-mode>, <buffering>)

Files can be opened in various modes such as read, write, or
append. The access modes are described below.

SN   Access mode   Description
1    r     Opens the file in read-only mode. The file pointer is at the beginning of the file. This is the default mode if no access mode is passed.
2    rb    Opens the file in read-only mode in binary format. The file pointer is at the beginning of the file.
3    r+    Opens the file for both reading and writing. The file pointer is at the beginning of the file.
4    rb+   Opens the file for both reading and writing in binary format. The file pointer is at the beginning of the file.
5    w     Opens the file for writing only. It truncates the file if it already exists, or creates a new one if no file exists with the same name. The file pointer is at the beginning of the file.
6    wb    Opens the file for writing only in binary format. It truncates the file if it already exists, or creates a new one if no file exists. The file pointer is at the beginning of the file.
7    w+    Opens the file for both writing and reading. Unlike r+, it truncates an existing file, whereas r+ keeps the existing content. It creates a new file if no file exists. The file pointer is at the beginning of the file.
8    wb+   Opens the file for both writing and reading in binary format. The file pointer is at the beginning of the file.
9    a     Opens the file in append mode. The file pointer is at the end of the file if it exists. It creates a new file if no file exists with the same name.
10   ab    Opens the file in append mode in binary format. The file pointer is at the end of the file if it exists. It creates a new file in binary format if no file exists with the same name.
11   a+    Opens the file for both appending and reading. The file pointer is at the end of the file if it exists. It creates a new file if no file exists with the same name.
12   ab+   Opens the file for both appending and reading in binary format. The file pointer is at the end of the file.
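
The difference between the w, a, and r+ modes can be checked with a short sketch. It works on a temporary file whose name, modes_demo.txt, is made up for this illustration.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "modes_demo.txt")

with open(path, "w") as f:       # w creates the file (or truncates it)
    f.write("first\n")
with open(path, "a") as f:       # a writes at the end of the file
    f.write("second\n")
with open(path, "r") as f:
    after_append = f.read()

with open(path, "r+") as f:      # r+ keeps the content, pointer at the start
    f.write("FIRST")
with open(path, "r") as f:
    after_r_plus = f.read()

with open(path, "w") as f:       # w discards the previous content
    f.write("third\n")
with open(path, "r") as f:
    after_w = f.read()

print(repr(after_append), repr(after_r_plus), repr(after_w))
```

Writing in r+ mode overwrites characters in place from the start of the file, while reopening in w mode starts from an empty file.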

Let's look at a simple example that opens a file named "file.txt" (stored
in the same directory) in read mode and prints a message to the console.

Example:
# opens the file file.txt in read mode
fileptr = open("file.txt", "r")

if fileptr:
    print("file is opened successfully")

Output:
file is opened successfully

In the above code, we passed the file name as the first argument and r
as the second argument, so the file is opened in read mode. fileptr holds
the file object, and since the file was opened successfully, the print
statement is executed. If the file cannot be opened, open() raises an
exception (for example, FileNotFoundError when the file does not exist)
rather than returning a false value.


The close() method


Once all operations on the file are done, we must close it in our
Python script using the close() method. Calling close() on a file object
flushes any unwritten data to the file and releases the resources tied to
it.

While a file is open in Python, other programs may still operate on it
through the file system; hence it is good practice to close the file as
soon as all operations are done.

The syntax to use the close() method is given below.

Syntax:
fileobject.close()

Consider the following example.

# opens the file file.txt in read mode
fileptr = open("file.txt", "r")

if fileptr:
    print("file is opened successfully")

# closes the opened file
fileptr.close()

After closing the file, we cannot perform any further operation on it.
The file needs to be closed properly: if an exception occurs while
performing some operation on the file, the program terminates without
closing the file.

We should use a try...finally block to overcome this problem.
fileptr = open("file.txt")
try:
    # perform file operations
    pass
finally:
    # runs whether or not an exception occurred
    fileptr.close()

The with statement


The with statement was introduced in Python 2.5. It is useful when
manipulating files. It is used in scenarios where a pair of statements is
to be executed with a block of code in between.

The syntax to open a file using the with statement is given below.
with open(<file name>, <access mode>) as <file-pointer>:
    # statement suite

The advantage of the with statement is that it guarantees the file is
closed regardless of how the nested block exits.

It is always advisable to use the with statement for files: if a break,
return, or exception occurs in the nested block, the file is closed
automatically, and we do not need to call the close() method ourselves.
This prevents the file from being left open with unwritten data.

Consider the following example.

Example
with open("file.txt", 'r') as f:
    content = f.read()
    print(content)
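
The closing guarantee can be verified through the closed attribute of the file object. The sketch below uses a temporary file and raises an exception on purpose inside the with block; the file name with_demo.txt is made up for this illustration.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "with_demo.txt")
with open(path, "w") as f:
    f.write("hello\n")

try:
    with open(path, "r") as f:
        first_line = f.readline()
        # Simulate a failure in the middle of the block.
        raise ValueError("simulated failure inside the block")
except ValueError:
    pass

# The name f still refers to the file object, which is now closed.
closed_after_error = f.closed
print(repr(first_line), closed_after_error)
```

The file is closed even though the block was left through an exception.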

Writing the file


To write some text to a file, we need to open the file using the open()
function with one of the following access modes.

w: It overwrites the file if it already exists, or creates a new file
otherwise. The file pointer is at the beginning of the file.

a: It appends to the existing file. The file pointer is at the end of the
file. It creates a new file if no file exists.

Consider the following example.

Example:
# open file2.txt in write mode. Creates a new file if no such file exists.
fileptr = open("file2.txt", "w")
# writing the content to the file
fileptr.write('''Python is the modern day language. It makes things so simple.
It is the fastest-growing programing language''')
# closing the opened file
fileptr.close()

Output:
file2.txt

Python is the modern day language. It makes things so simple.
It is the fastest-growing programing language


We have opened the file in w mode. Since file2.txt did not exist, a new
file was created, and we wrote the content to it using the write()
method.

Example 2:
# open file2.txt in append mode
fileptr = open("file2.txt", "a")
# appending the content to the file
fileptr.write(" Python has an easy syntax and user-friendly interaction.")
# closing the opened file
fileptr.close()

Output:
Python is the modern day language. It makes things so simple.
It is the fastest-growing programing language Python has an easy syntax and user-friendly interaction.

We can see that the content of the file is modified. We opened the
file in a mode, and the new content was appended to the existing file2.txt.

To read a file from a Python script, Python provides the read()
method. The read() method returns the data read from the file, as a string
in text mode or as bytes in binary mode.

The syntax of the read() method is given below.

Syntax:
fileobj.read(<count>)

Here, count is the number of characters (or bytes, in binary mode) to
read from the file, starting at the current position of the file pointer.
If count is not specified, read() returns the content of the file up to
the end.

Consider the following example.

Example
# open file2.txt in read mode. causes an error if no such file exists.
fileptr = open("file2.txt", "r")
# stores the first 10 characters of the file in the variable content
content = fileptr.read(10)
# prints the type of the data read from the file
print(type(content))
# prints the content that was read
print(content)
# closes the opened file
fileptr.close()

Output:
<class 'str'>
Python is

In the above code, we read the content of file2.txt using the read()
method. We passed a count of ten, which means it reads the first ten
characters of the file.

If we use the following lines instead, the whole content of the file is printed.
content = fileptr.read()
print(content)

Output:
Python is the modern day language. It makes things so simple.
It is the fastest-growing programing language Python has an easy syntax and user-friendly interaction.
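
Successive read() calls continue from the current position of the file pointer rather than from the beginning. A minimal sketch on a temporary file (the file name read_demo.txt is made up for this illustration):

```python
import os
import tempfile

text = "Python is the modern day language.\n"
path = os.path.join(tempfile.mkdtemp(), "read_demo.txt")
with open(path, "w") as f:
    f.write(text)

with open(path, "r") as f:
    head = f.read(10)    # the first ten characters
    rest = f.read()      # everything after those ten characters
    at_eof = f.read()    # nothing is left, so this is an empty string

print(repr(head), repr(rest), repr(at_eof))
```

Joining head and rest gives back the full text, and a read() at the end of the file returns an empty string instead of raising an error.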

Read file through for loop


We can read the file line by line using a for loop. Consider the
following example, which assumes file2.txt contains the three lines shown
in the output.
# open file2.txt in read mode. causes an error if no such file exists.
fileptr = open("file2.txt", "r")
# running a for loop
for i in fileptr:
    print(i)  # i contains each line of the file, including its trailing newline

Output:
Python is the modern day language.

It makes things so simple.

Python has easy syntax and user-friendly interaction.

Read Lines of the file


Python allows a file to be read line by line using the readline()
method. The readline() method reads one line at a time, starting from the
beginning of the file; if we call readline() twice, we get the first two
lines of the file.

The following example calls readline() to read the first two lines of
our file "file2.txt", which contains three lines.

Example 1: Reading lines using readline() function

# open file2.txt in read mode. causes an error if no such file exists.
fileptr = open("file2.txt", "r")
# stores the first two lines of the file in content and content1
content = fileptr.readline()
content1 = fileptr.readline()
# prints the two lines
print(content)
print(content1)
# closes the opened file
fileptr.close()

Output:
Python is the modern day language.

It makes things so simple.

We called the readline() method two times, so it read two lines from
the file.

Python also provides the readlines() method for reading lines. It
returns a list of the lines from the current position until the end of
file (EOF) is reached.

Example 2: Reading Lines Using readlines() function

# open file2.txt in read mode. causes an error if no such file exists.
fileptr = open("file2.txt", "r")

# stores all the lines of the file in the variable content
content = fileptr.readlines()

# prints the list of lines
print(content)

# closes the opened file
fileptr.close()

Output:
['Python is the modern day language.\n', 'It makes things so simple.\n', 'Python has easy syntax and user-friendly interaction.']
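
The three ways of reading lines can be compared side by side. The sketch below writes the three lines used in the examples above to a temporary file (the file name lines_demo.txt is made up for this illustration) and strips the trailing newline while iterating, which avoids the blank lines that print() otherwise shows.

```python
import os
import tempfile

text = (
    "Python is the modern day language.\n"
    "It makes things so simple.\n"
    "Python has easy syntax and user-friendly interaction."
)
path = os.path.join(tempfile.mkdtemp(), "lines_demo.txt")
with open(path, "w") as f:
    f.write(text)

with open(path, "r") as f:
    first = f.readline()       # one line, newline included
    remaining = f.readlines()  # list of the lines that are left

with open(path, "r") as f:
    # Iterating over the file yields one line at a time.
    stripped = [line.rstrip("\n") for line in f]

print(repr(first))
print(remaining)
print(stripped)
```

Iterating over the file object is usually preferred for large files because it does not load all lines into memory at once.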


Creating a new file


A new file can be created by using one of the following access
modes with the function open().

x: It creates a new file with the specified name. It causes an error if a
file exists with the same name.

a: It creates a new file with the specified name if no such file exists. It
appends the content to the file if the file already exists with the specified
name.

w: It creates a new file with the specified name if no such file exists.
It overwrites the existing file.

Consider the following example, which assumes that file2.txt does not
already exist.

Example 1

# create file2.txt in exclusive-creation mode. causes an error if the file already exists.
fileptr = open("file2.txt", "x")
print(fileptr)
if fileptr:
    print("File created successfully")

Output (the encoding shown depends on the platform):
<_io.TextIOWrapper name='file2.txt' mode='x' encoding='cp1252'>

File created successfully
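
The error raised by x mode when the file already exists is FileExistsError. A minimal sketch on a temporary path (the file name new_file.txt is made up for this illustration):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "new_file.txt")

# The first call succeeds because the file does not exist yet.
with open(path, "x") as f:
    f.write("created\n")

# The second call fails because x mode refuses to reuse an existing file.
try:
    with open(path, "x") as f:
        f.write("created again\n")
    second_attempt = "created again"
except FileExistsError:
    second_attempt = "FileExistsError"

print(second_attempt)
```

This makes x mode a safe choice when existing data must never be overwritten by accident.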


File Pointer positions


Python provides the tell() method, which returns the position of the
file pointer, measured in bytes from the beginning of the file. Consider
the following example.
# open the file file2.txt in read mode
fileptr = open("file2.txt", "r")
# initially the filepointer is at 0
print("The filepointer is at byte :", fileptr.tell())
# reading the content of the file
content = fileptr.read()
# the read operation moves the file pointer; tell() returns its new location
print("After reading, the filepointer is at:", fileptr.tell())

Output:
The filepointer is at byte : 0

After reading, the filepointer is at: 117

Modifying file pointer position


In real-world applications, we sometimes need to move the file
pointer explicitly, since we may need to read or write content at
various locations.

For this purpose, Python provides the seek() method, which lets us
set the file pointer position explicitly.


The syntax to use the seek() method is given below.

Syntax:
    <file-ptr>.seek(offset[, from])

The seek() method accepts two parameters:

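offset: The number of bytes to move the file pointer, counted from the
reference position.

from: It indicates the reference position from which the offset is counted.
If it is set to 0 (the default), the beginning of the file is used as the
reference position. If it is set to 1, the current position of the file pointer
is used. If it is set to 2, the end of the file is used. (Python's own
documentation calls this parameter whence.) Files opened in text mode only
support seeking relative to the beginning of the file, apart from seek(0, 2),
which moves to the end of the file; reference positions 1 and 2 with a
nonzero offset require binary mode ("rb").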

Consider the following example.

Example
    # open the file file2.txt in read mode
    fileptr = open("file2.txt", "r")
    # initially the file pointer is at 0
    print("The filepointer is at byte :", fileptr.tell())
    # changing the file pointer location to 10
    fileptr.seek(10)
    # tell() returns the location of the file pointer
    print("After seeking, the filepointer is at:", fileptr.tell())

Output:
The filepointer is at byte : 0

After seeking, the filepointer is at: 10
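The three reference positions can be tried out on a binary stream. The following is a minimal sketch, assuming an in-memory io.BytesIO object as a stand-in for a file opened with mode "rb"; the 16-byte content is made up for the illustration.

```python
import io

# An in-memory binary file stands in for a file opened with open(path, "rb").
f = io.BytesIO(b"0123456789ABCDEF")

f.seek(4)                 # reference 0 (default): 4 bytes from the start
from_start = f.tell()

f.seek(3, 1)              # reference 1: 3 bytes forward from the current position
from_current = f.tell()

f.seek(-2, 2)             # reference 2: 2 bytes before the end of the file
from_end = f.tell()
tail = f.read()           # reads the last two bytes

print(from_start, from_current, from_end, tail)
```

Binary mode is used because seeking relative to the current position or the end with a nonzero offset is not supported for text-mode files.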


2.3 Creating the new directory


The mkdir() method of the os module is used to create a directory. A
relative name creates the directory inside the current working directory.
The syntax to create the new directory is given below.

Syntax:
    os.mkdir(directory name)

Example 1
    import os

    # creating a new directory with the name new
    os.mkdir("new")

The getcwd() method


This method returns the current working directory.

The syntax to use the getcwd() method is given below.

Syntax:
    os.getcwd()

Example
    import os

    os.getcwd()

Output:
'C:\\Users\\DEVANSH SHARMA'


Changing the current working directory


The chdir() method is used to change the current working directory to
a specified directory.

The syntax to use the chdir() method is given below.

Syntax:
    os.chdir("new-directory")

Example
    import os

    # Changing the current directory to the new directory
    os.chdir("C:\\Users\\DEVANSH SHARMA\\Documents")
    # It will display the current working directory
    os.getcwd()

Output:
'C:\\Users\\DEVANSH SHARMA\\Documents'

Deleting directory
The rmdir() method is used to delete the specified directory.

The syntax to use the rmdir() method is given below.

Syntax:
1. os.rmdir(directory name)

Example 1
    import os

    # removing the directory (it must be empty)
    os.rmdir("directory_name")

It will remove the specified directory. The directory must be empty;
otherwise os.rmdir() raises an OSError.
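The four directory functions above can be combined into one round trip. This is a sketch under assumptions: it uses a scratch location from tempfile.mkdtemp() instead of a real user folder, and the directory name new is only an example.

```python
import os
import tempfile

start = os.getcwd()                       # remember where we started
base = tempfile.mkdtemp()                 # scratch location for the demo

new_dir = os.path.join(base, "new")       # hypothetical directory name
os.mkdir(new_dir)                         # create the directory
created = os.path.isdir(new_dir)

os.chdir(new_dir)                         # make it the working directory
inside = os.path.samefile(os.getcwd(), new_dir)

os.chdir(start)                           # leave it before deleting
os.rmdir(new_dir)                         # only works on an empty directory
removed = not os.path.exists(new_dir)
os.rmdir(base)                            # clean up the scratch location

print(created, inside, removed)
```

Changing back to the starting directory before calling os.rmdir() matters, because some operating systems refuse to delete the directory a process is currently working in.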

Writing Python output to the files


Sometimes we need to write the output of a Python script to a file.

The check_call() function of the subprocess module can be used to execute a
Python script and write the output of that script to a file.

The following example contains two Python scripts. The script file1.py
executes the script file.py and writes its output to the text file output.txt.

Example

file.py

    temperatures = [10, -20, -289, 100]

    def c_to_f(c):
        if c < -273.15:
            return "That temperature doesn't make sense!"
        else:
            f = c * 9 / 5 + 32
            return f

    for t in temperatures:
        print(c_to_f(t))

file1.py

    import subprocess

    with open("output.txt", "wb") as f:
        subprocess.check_call(["python", "file.py"], stdout=f)
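A self-contained variant of the same idea is sketched below. It makes two assumptions that are not in the example above: the child script (child.py, a made-up name) is generated on the fly in a temporary directory, and sys.executable is used instead of the command name "python" so that the child runs with the same interpreter as the parent.

```python
import os
import subprocess
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    script = os.path.join(tmp, "child.py")      # hypothetical script name
    out_path = os.path.join(tmp, "output.txt")

    # Write a tiny child script that prints one converted temperature.
    with open(script, "w") as f:
        f.write("print(10 * 9 / 5 + 32)\n")

    # Run the child with the current interpreter and send its stdout to a file.
    with open(out_path, "wb") as out:
        subprocess.check_call([sys.executable, script], stdout=out)

    with open(out_path) as f:
        captured = f.read().strip()

print(captured)
```

check_call() raises subprocess.CalledProcessError if the child script exits with a nonzero status, so a failing script does not go unnoticed.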

The file related methods


The file object provides the following methods to manipulate the files
on various operating systems.

SN  Method                      Description
1   file.close()                It closes the opened file. Once closed, the file can't be read or written anymore.
2   file.flush()                It flushes the internal buffer.
3   file.fileno()               It returns the file descriptor used by the underlying implementation to request I/O from the OS.
4   file.isatty()               It returns True if the file is connected to a TTY device, otherwise it returns False.
5   next(file)                  It returns the next line from the file (written as file.next() in Python 2).
6   file.read([size])           It reads the file for the specified size; without size it reads to the end of the file.
7   file.readline([size])       It reads one line from the file and places the file pointer at the beginning of the next line.
8   file.readlines([sizehint])  It returns a list containing all the lines of the file. It reads the file until EOF occurs.
9   file.seek(offset[, from])   It modifies the position of the file pointer to a specified offset with the specified reference.
10  file.tell()                 It returns the current position of the file pointer within the file.
11  file.truncate([size])       It truncates the file to the optional specified size.
12  file.write(str)             It writes the specified string to a file.
13  file.writelines(seq)        It writes a sequence of strings to a file.
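Several of the methods in the table can be seen working together in a short script. This is a minimal sketch, assuming a temporary file named methods.txt (a made-up name) and three made-up lines of text.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "methods.txt")   # hypothetical file name

    with open(path, "w") as f:
        f.write("alpha\n")                     # write one string
        f.writelines(["beta\n", "gamma\n"])    # write a sequence of strings

    with open(path) as f:
        first = f.readline()                   # one line, newline included
        rest = f.readlines()                   # remaining lines as a list
        f.seek(0)                              # back to the beginning
        second_pass = next(f)                  # next line via the iterator protocol

    with open(path, "r+") as f:
        f.truncate(0)                          # cut the file down to 0 bytes
    size_after = os.path.getsize(path)

    closed = f.closed                          # True once the with-block ends

print(repr(first), rest, repr(second_pass), size_after, closed)
```

Note that writelines() does not add line separators; each string in the sequence has to carry its own "\n".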


2.4 Python Exceptions


When a Python program encounters an error, it stops executing the rest
of the program. An error in Python is either a syntax error or an
exception. In this section we will see what an exception is and how it
differs from a syntax error. Following that, we will learn about try and
except blocks and how to raise exceptions.

What is an Exception?
An exception in Python is an event that occurs while a program is
running and disrupts the normal flow of its instructions. When Python code
comes across a condition it can't handle, it raises an exception. An
exception is a Python object that describes an error.

When Python code raises an exception, there are two options: handle the
exception immediately, or let the program stop and quit.

Exceptions versus Syntax Errors


A syntax error occurs when the interpreter finds a statement that does not
follow the rules of the language. Consider the following scenario:

Code:

    # Python code containing a syntax error
    string = "Python Exceptions"

    for s in string:
        if (s != o:
            print( s )

Output:

if (s != o:
^
SyntaxError: invalid syntax

The arrow in the output shows where the interpreter encountered the
syntax error. There was one unclosed bracket in this case. Close it and
rerun the program:

Code:

    # Python code after removing the syntax error
    string = "Python Exceptions"

    for s in string:
        if (s != o):
            print( s )

Output:

2 string = "Python Exceptions"


4 for s in string:
----> 5 if (s != o):
6 print( s )

NameError: name 'o' is not defined

We encountered an exception after executing this code. This is the kind
of error that arises when syntactically valid Python code fails while it is
running. The last line of the output gives the name of the exception that was
raised. Instead of displaying just "exception error", Python displays
information about the sort of exception that occurred. It was a NameError in
this situation, because the name o was never defined. Python includes
several built-in exceptions, and it also lets us define custom exceptions.

Try and Except Statement - Catching Exceptions


In Python, we catch exceptions and handle them using try and except
code blocks. The try clause contains the code that can raise an exception,
while the except clause contains the code that handles the exception.
Let's try to access an index that is beyond the end of a list and handle
the resulting exception.

Code:

    # Python code to catch an exception and handle it using try and except code blocks
    a = ["Python", "Exceptions", "try and except"]
    try:
        # loop over a range that goes beyond the length of the list a
        for i in range(4):
            print("The index and element from the array is", i, a[i])
    # if an IndexError occurs in the try block, the except block is executed
    except IndexError:
        print("Index out of range")

Output:

The index and element from the array is 0 Python


The index and element from the array is 1 Exceptions
The index and element from the array is 2 try and except
Index out of range

The code that can potentially produce an error is placed inside the try
clause in the preceding example. When i becomes 3, the code attempts to
access a list item beyond the list's length, which does not exist, resulting
in an IndexError. The except clause then catches this exception and runs
its code, so the program does not stop with an error.
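An except clause can also name the exception type it handles and bind the exception object to a variable. The sketch below illustrates this with a helper function, element_at(), which is a made-up name for this illustration.

```python
def element_at(items, index):
    """Return items[index], or None if the index is out of range."""
    try:
        return items[index]
    except IndexError as err:
        # err is the exception object; str(err) is its message.
        print("Handled:", err)
        return None

a = ["Python", "Exceptions", "try and except"]
inside = element_at(a, 1)
outside = element_at(a, 3)
print(inside, outside)
```

Naming the exception type keeps unrelated errors, such as a misspelled variable name, from being silently swallowed by the handler.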

How to Raise an Exception


If a condition does not meet our criteria, even though the code is valid
as far as the Python interpreter is concerned, we can intentionally raise an
exception using the raise keyword. We can raise a built-in exception or a
custom exception with this statement.

If we wish to use raise to generate an exception when a given condition
occurs, we may do so as follows:

Code:

    # Python code to show how to raise an exception in Python
    num = [3, 4, 5, 7]
    if len(num) > 3:
        raise Exception(f"Length of the given list must be less than or equal to 3 but is {len(num)}")

Output:
1 num = [3, 4, 5, 7]
2 if len(num) > 3:
----> 3 raise Exception( f"Length of the given list must be
less than or equal to 3 but is {len(num)}" )

Exception: Length of the given list must be less than or equal
to 3 but is 4

The program stops and shows our exception in the output, indicating
what went wrong.
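The same check can be written with a custom exception, which lets the caller catch exactly this kind of error. The class ListTooLongError and the function check_length() below are hypothetical names introduced only for this sketch.

```python
class ListTooLongError(Exception):
    """Hypothetical custom exception for lists over a size limit."""

def check_length(values, limit=3):
    # Raise our own exception type when the condition we care about occurs.
    if len(values) > limit:
        raise ListTooLongError(
            f"Length must be less than or equal to {limit} but is {len(values)}"
        )
    return len(values)

try:
    check_length([3, 4, 5, 7])
    message = None
except ListTooLongError as err:
    message = str(err)

print(message)
```

A custom exception is an ordinary class that inherits from Exception, so it can be raised and caught like any built-in exception.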


2.5 Python OOPs Concepts


Like many other general-purpose programming languages, Python has been
an object-oriented language since its beginning. It allows us to develop
applications using an object-oriented approach. In Python, we can easily
create and use classes and objects.

The object-oriented paradigm designs a program using classes and
objects. An object corresponds to a real-world entity such as a book, house,
or pencil. The OOP approach focuses on writing reusable code. It is a
widespread technique for solving a problem by creating objects.

The major principles of an object-oriented programming system are given
below.

o Class
o Object
o Method
o Inheritance
o Polymorphism
o Data Abstraction
o Encapsulation

Class
A class can be defined as a collection of objects. It is a logical
entity that has some specific attributes and methods. For example, an
Employee class would contain attributes such as an email id, name, age,
and salary, along with methods that operate on them.

Syntax:
    class ClassName:
        <statement-1>
        <statement-N>

Object
An object is an entity that has state and behavior. It may be any
real-world object like a mouse, keyboard, chair, table, or pen.

Everything in Python is an object, and almost everything has attributes
and methods. All functions have a built-in attribute __doc__, which returns
the docstring defined in the function source code.

Defining a class does not by itself create any objects. We create an
object (an instance) of the class, and memory is allocated for that
object's data. Consider the following example.

Example:
    class car:
        def __init__(self, modelname, year):
            self.modelname = modelname
            self.year = year

        def display(self):
            print(self.modelname, self.year)

    c1 = car("Toyota", 2016)
    c1.display()

Output:
Toyota 2016


In the above example, we have created the class named car, which has
two attributes, modelname and year. We have created the object c1 to
access these attributes; the values "Toyota" and 2016 are stored in c1.
We will learn more about classes and objects in section 2.6.

Method
The method is a function that is associated with an object. In Python,
a method is not unique to class instances. Any object type can have
methods.

Inheritance
Inheritance is one of the most important aspects of object-oriented
programming, and it mirrors the real-world concept of inheritance. It
specifies that the child object acquires all the properties and behaviors of
the parent object.

By using inheritance, we can create a class which uses all the properties
and behavior of another class. The new class is known as a derived class
or child class, and the one whose properties are acquired is known as a
base class or parent class.

It provides re-usability of the code.

Polymorphism
Polymorphism is made up of two words, "poly" and "morph". Poly means
many, and morph means shape. By polymorphism, we understand that one
task can be performed in different ways. For example, you have a class
animal, and all animals speak, but they speak differently. Here, the "speak"
behavior is polymorphic and depends on the animal. So, the abstract
"animal" concept does not actually "speak", but specific animals (like dogs
and cats) have a concrete implementation of the action "speak".
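The animal example can be sketched in code to show inheritance and polymorphism together. The class names Animal, Dog and Cat, the describe() method and the pet names are all made up for this illustration.

```python
class Animal:
    def __init__(self, name):
        self.name = name

    def speak(self):
        # The general "animal" has no concrete sound of its own.
        raise NotImplementedError("subclasses define speak()")

    def describe(self):
        # Inherited unchanged by every child class.
        return f"{self.name} says {self.speak()}"

class Dog(Animal):          # Dog inherits name handling and describe()
    def speak(self):
        return "Woof"

class Cat(Animal):
    def speak(self):
        return "Meow"

# The same call produces different behavior depending on the object.
lines = [pet.describe() for pet in (Dog("Rex"), Cat("Tom"))]
print(lines)
```

The child classes reuse __init__() and describe() from the parent (inheritance), while each supplies its own speak() (polymorphism).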

Encapsulation
Encapsulation is also an essential aspect of object-oriented
programming. It is used to restrict access to methods and variables. In
encapsulation, code and data are wrapped together within a single unit,
which protects the data from being modified by accident.

Data Abstraction
Data abstraction and encapsulation are often used as synonyms. They
are closely related because data abstraction is achieved through
encapsulation.

Abstraction is used to hide internal details and show only the
functionality. Abstracting something means giving names to things so that
the name captures the core of what a function or a whole program does.


2.6 Python Class and Objects


As discussed in the previous section, a class is a virtual entity and
can be seen as a blueprint of an object. An object comes into existence
when the class is instantiated. Let's understand it by an example.

Suppose a class is a prototype of a building. The prototype contains all
the details about the floors, rooms, doors, windows, etc. We can make as
many buildings as we want based on these details. Hence, the prototype
can be seen as a class, and each building built from it is an object of
this class.

In other words, an object is an instance of a class. The process of
creating an object is called instantiation.

In this section, we will discuss creating classes and objects in Python.
We will also discuss how a class attribute is accessed by using the object.

Creating classes in Python

In Python, a class can be created by using the keyword class, followed
by the class name. The syntax to create a class is given below.

Syntax:
    class ClassName:
        # statement_suite

In Python, each class has a documentation string, which can be accessed
by using <class-name>.__doc__ (it is None if the class does not define
one). A class contains a statement suite with the definitions of its fields,
constructor, functions, etc.

Consider the following example to create a class Employee which contains
two fields, the employee id and name.

The class also contains a function display(), which is used to display the
information of the Employee.

Example
    class Employee:
        id = 10
        name = "Devansh"

        def display(self):
            print(self.id, self.name)

Here, self is used as a reference variable, which refers to the current
object of the class. It is always the first parameter in the method
definition. However, we do not pass self explicitly when calling the method
on an object; Python supplies it automatically.

The self-parameter
The self parameter refers to the current instance of the class and is used
to access the variables that belong to it. We can use any name instead of
self, but it must be the first parameter of any function which belongs to the
class.
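The role of the first parameter can be made visible by calling the same method in two ways. In this sketch the parameter is deliberately named this instead of self, and display() returns a string instead of printing so that the two results can be compared; both choices are assumptions made for the illustration.

```python
class Employee:
    id = 10
    name = "Devansh"

    def display(this):             # any name works, but self is the convention
        return f"{this.id} {this.name}"

emp = Employee()

via_instance = emp.display()            # Python passes emp as the first argument
via_class = Employee.display(emp)       # the same call with the instance passed explicitly
print(via_instance, via_class)
```

Both calls give the same result, which shows that emp.display() is shorthand for passing the object as the first argument.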

Creating an instance of the class


A class needs to be instantiated if we want to use its attributes in
another class or method. A class is instantiated by calling it using the
class name.

The syntax to create the instance of the class is given below.

    <object-name> = <class-name>(<arguments>)

The following example creates an instance of the class Employee defined
in the above example.

Example
    class Employee:
        id = 10
        name = "John"

        def display(self):
            print("ID: %d \nName: %s" % (self.id, self.name))

    # Creating an emp instance of the Employee class
    emp = Employee()
    emp.display()

Output:
ID: 10
Name: John

In the above code, we have created the Employee class, which has two
attributes named id and name, and assigned values to them. Notice that
we have passed self as a parameter to the display function. It is used to
refer to the attributes of the current object.

We have created a new instance object named emp. By using it, we can
access the attributes of the class.

Delete the Object

We can delete the attributes of an object, or the object itself, by using
the del keyword. Consider the following example.

Example
    class Employee:
        def __init__(self):
            # instance attributes, so that del can remove them from the object
            self.id = 10
            self.name = "John"

        def display(self):
            print("ID: %d \nName: %s" % (self.id, self.name))

    # Creating an emp instance of the Employee class
    emp = Employee()
    # Deleting an attribute of the object
    del emp.id
    # Deleting the object itself
    del emp
    emp.display()

It will throw a NameError because the name emp no longer exists after
del emp. (With id defined only at class level, del emp.id would itself fail
with an AttributeError, since del on an object removes only attributes
stored on that object.)


2.7 Python Constructor

A constructor is a special type of method (function) which is used to
initialize the instance members of the class.

In C++ or Java, the constructor has the same name as its class, but
Python treats the constructor differently: it is a method with the fixed
name __init__(), which runs when an object is created.

Constructors can be of two types.

1. Parameterized Constructor

2. Non-parameterized Constructor

The constructor is executed when we create an object of the class.
Constructors also verify that there are enough resources for the object to
perform any start-up task.

Creating the constructor in python


In Python, the method __init__() plays the role of the constructor of the
class. This method is called when the class is instantiated. It accepts
self as its first argument, which allows accessing the attributes or
methods of the object being created.

We can pass any number of arguments at the time of creating the class
object, depending upon the __init__() definition. It is mostly used to
initialize the attributes of the object. A class that does not define
__init__() simply relies on the default constructor.

Consider the following example to initialize the Employee class attributes.

Example
    class Employee:
        def __init__(self, name, id):
            self.id = id
            self.name = name

        def display(self):
            print("ID: %d \nName: %s" % (self.id, self.name))


    emp1 = Employee("John", 101)
    emp2 = Employee("David", 102)

    # accessing display() method to print employee 1 information
    emp1.display()

    # accessing display() method to print employee 2 information
    emp2.display()

Output:
ID: 101
Name: John
ID: 102
Name: David

Counting the number of objects of a class


The constructor is called automatically when we create the object of
the class. Consider the following example.

Example
    class Student:
        count = 0

        def __init__(self):
            Student.count = Student.count + 1

    s1 = Student()
    s2 = Student()
    s3 = Student()
    print("The number of students:", Student.count)

Output:
The number of students: 3

Python Non-Parameterized Constructor


The non-parameterized constructor is used when we do not need to pass
any values while creating the object; it has only self as an argument.
Consider the following example.

Example
    class Student:
        # Constructor - non parameterized
        def __init__(self):
            print("This is non parametrized constructor")

        def show(self, name):
            print("Hello", name)

    student = Student()
    student.show("John")

Output:
This is non parametrized constructor
Hello John

Python Parameterized Constructor


The parameterized constructor has one or more parameters in addition to
self. Consider the following example.

Example
    class Student:
        # Constructor - parameterized
        def __init__(self, name):
            print("This is parametrized constructor")
            self.name = name

        def show(self):
            print("Hello", self.name)

    student = Student("John")
    student.show()

Output:
This is parametrized constructor
Hello John

Python Default Constructor


When we do not define a constructor in the class, Python uses the
default constructor. It does not perform any task other than initializing
the object. Consider the following example.

Example
    class Student:
        roll_num = 101
        name = "Joseph"

        def display(self):
            print(self.roll_num, self.name)

    st = Student()
    st.display()

Output:
101 Joseph

More than One Constructor in Single class


Let's have a look at another scenario: what happens if we declare two
constructors with the same signature in the class?

Example
    class Student:
        def __init__(self):
            print("The First Constructor")

        def __init__(self):
            print("The Second Constructor")

    st = Student()

Output:
The Second Constructor

In the above code, the object st called the second constructor even
though both have the same signature. The first method is not accessible by
the st object. The second definition of __init__() replaces the first one in
the class, so the object always calls the last constructor if the class
defines several.
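Since only one __init__() survives, a class that needs several ways of being created usually relies on default argument values or on a class method that builds the object. The following is a sketch of both ideas; the default values and the from_text() method are hypothetical additions, not part of the example above.

```python
class Student:
    def __init__(self, name="Unknown", roll_num=None):
        # Default values let one __init__ cover several ways of creating the object.
        self.name = name
        self.roll_num = roll_num

    @classmethod
    def from_text(cls, text):
        # Hypothetical alternative constructor: builds a Student from "name,roll".
        name, roll = text.split(",")
        return cls(name.strip(), int(roll))

a = Student()
b = Student("John", 101)
c = Student.from_text("David, 102")
print(a.name, b.roll_num, c.name, c.roll_num)
```

Here Student() and Student("John", 101) both go through the same __init__(), while from_text() prepares the arguments and then calls it.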

Python built-in class functions


The built-in functions for working with the attributes of an object are
described in the following table.

SN  Function                     Description
1   getattr(obj, name, default)  It is used to access the attribute of the object. If the attribute is missing, default is returned when it is given.
2   setattr(obj, name, value)    It is used to set a particular value to the specific attribute of an object.
3   delattr(obj, name)           It is used to delete a specific attribute.
4   hasattr(obj, name)           It returns True if the object contains the specified attribute.

Example
    class Student:
        def __init__(self, name, id, age):
            self.name = name
            self.id = id
            self.age = age

    # creates the object of the class Student
    s = Student("John", 101, 22)
    # prints the attribute name of the object s
    print(getattr(s, 'name'))
    # reset the value of attribute age to 23
    setattr(s, "age", 23)
    # prints the modified value of age
    print(getattr(s, 'age'))
    # prints True if the student contains the attribute with name id
    print(hasattr(s, 'id'))
    # deletes the attribute age
    delattr(s, 'age')
    # this will give an error since the attribute age has been deleted
    print(s.age)

Output:
John
23
True
AttributeError: 'Student' object has no attribute 'age'
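The AttributeError can be avoided by giving getattr() a default value or by checking with hasattr() first. A minimal sketch, in which the default string "n/a" is an arbitrary choice for the illustration:

```python
class Student:
    def __init__(self, name):
        self.name = name

s = Student("John")

present = getattr(s, "name", "n/a")     # attribute exists, so its value is returned
missing = getattr(s, "age", "n/a")      # attribute is absent, so the default is returned

# hasattr() lets us check before deleting, avoiding an AttributeError.
if hasattr(s, "name"):
    delattr(s, "name")
after_delete = hasattr(s, "name")

print(present, missing, after_delete)
```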


Built-in class attributes


Along with the other attributes, a Python class also contains some
built-in class attributes which provide information about the class.

The built-in class attributes are given in the below table.

SN  Attribute    Description
1   __dict__     It provides the dictionary containing the information about the class namespace.
2   __doc__      It contains a string which has the class documentation.
3   __name__     It is used to access the class name.
4   __module__   It is used to access the module in which this class is defined.
5   __bases__    It contains a tuple including all base classes.

Example
    class Student:
        def __init__(self, name, id, age):
            self.name = name
            self.id = id
            self.age = age

        def display_details(self):
            print("Name:%s, ID:%d, age:%d" % (self.name, self.id, self.age))

    s = Student("John", 101, 22)
    print(s.__doc__)
    print(s.__dict__)
    print(s.__module__)

Output:
None
{'name': 'John', 'id': 101, 'age': 22}
__main__
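The attributes __name__ and __bases__ belong to the class itself, so they are read from the class rather than from an instance. The sketch below uses two made-up classes, Person and Student, and a made-up class attribute school.

```python
class Person:
    """A hypothetical base class with a docstring."""

class Student(Person):
    school = "Example College"      # made-up class attribute

name = Student.__name__             # the class name as a string
bases = Student.__bases__           # tuple of direct base classes
doc = Person.__doc__                # the class docstring
has_school = "school" in Student.__dict__   # class namespace holds class attributes

print(name, bases, doc, has_school)
```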


2.8 Data Hiding in Python

What is Data Hiding?


Data hiding is a part of object-oriented programming, which is
generally used to hide an object's data from the user. It covers
internal object details such as data members and internal workings. It
maintains data integrity and restricts access to the class members.
Data hiding works by combining the data and functions into a single unit,
so that the data is concealed within a class. We cannot directly access
the data from outside the class.

This process is also known as data encapsulation. It is done by hiding
the working information from the user. In the process, we declare class
members as private so that no other class can access these data
members. They are accessible only within the class.

Data Hiding in Python


Python is a popular programming language that is applied in many
technical domains and has a straightforward syntax and vast libraries.
Data hiding isolates the client from a part of the program implementation.
Some of the essential members must be hidden from the user. Programs or
modules only show how we can use them; users do not need to be
familiar with how the application works internally. Thus it provides
security and avoids dependency as well.

We can perform data hiding in Python by prefixing the member name with
a double underscore (__). This makes the class member private by
convention: Python renames it internally, so it cannot be accessed by its
plain name from outside the class.

Let's understand the following example.

Example -
    class CounterClass:
        __privateCount = 0

        def count(self):
            self.__privateCount += 1
            print(self.__privateCount)

    counter = CounterClass()
    counter.count()
    counter.count()
    print(counter.__privateCount)

Output:
1
2
Traceback (most recent call last):
File "<string>", line 17, in <module>
AttributeError: 'CounterClass' object has no attribute
'__privateCount'

However, we can still access the private member from outside by adding
the class name, preceded by a single underscore, in front of it.
    print(counter._CounterClass__privateCount)

Output:
1
2
2
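What Python does with a double-underscore name can be inspected directly. The following sketch repeats the counter class, with count() returning the value instead of printing it (a change made only for this illustration), and looks at the names actually stored on the object.

```python
class CounterClass:
    __privateCount = 0

    def count(self):
        self.__privateCount += 1
        return self.__privateCount

counter = CounterClass()
counter.count()
counter.count()

# The plain name is not visible from outside the class.
plain_visible = hasattr(counter, "__privateCount")

# Python stored it under the mangled name _CounterClass__privateCount.
mangled_names = [n for n in vars(counter) if n.endswith("__privateCount")]
mangled_value = counter._CounterClass__privateCount

print(plain_visible, mangled_names, mangled_value)
```

This renaming is called name mangling. It guards against accidental access and name clashes in subclasses, but it is not a strict access barrier.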


Advantages of Data Hiding


Below are the main advantages of data hiding.

o The class objects are disconnected from irrelevant data.
o It enhances security, since outside code cannot access important data
directly.
o It isolates objects, which is a basic concept of OOP.
o It protects the programmer from accidentally linking to incorrect data.
o It prevents damage to sensitive data by hiding it from the public.

Disadvantages of Data Hiding


Every coin has two sides: if there are advantages, there are
disadvantages as well. Some of the disadvantages are given below.
o Programmers sometimes need to write extra lines of code to hide
important data from common users.
o A direct link between visible and hidden data can make objects faster
to work with, but data hiding prevents this linkage.

Conclusion
Data hiding is an important aspect of privacy and security within an
application. It plays an essential role in preventing unauthorized access.
It has some disadvantages, but these are outweighed by its advantages.

2.9 Abstraction in Python


Abstraction is used to hide the internal functionality of a function
from its users. The users only interact with the basic interface of the
function, while its inner working is hidden. The user knows "what the
function does" but not "how it does it."

In simple words, we all use smartphones and are very familiar with
their functions such as the camera, voice recorder, and call dialing, but we
don't know how these operations happen in the background. Let's take
another example: when we use the TV remote to increase the volume, we
don't know how pressing a key increases the volume of the TV. We only
know to press the "+" button to increase the volume.

Why Abstraction is Important?


In Python, abstraction is used to hide irrelevant data and classes in
order to reduce complexity. It also enhances the application's efficiency.
Next, we will learn how we can achieve abstraction in a Python program.

Abstraction classes in Python


In Python, abstraction can be achieved by using abstract classes and
interfaces.


A class that contains one or more abstract methods is called an abstract
class. Abstract methods do not contain an implementation. An abstract
class can be inherited by a subclass, and the abstract method gets its
definition in the subclass. Abstract classes are meant to be the blueprint
of other classes. An abstract class can be useful when we are designing
large programs. An abstract class is also helpful for providing a standard
interface for different implementations of components. Python provides
the abc module to use abstraction in a Python program. Let's see the
following syntax.

Syntax
    from abc import ABC
    class ClassName(ABC):

We import the ABC class from the abc module.

Abstract Base Classes


An abstract base class defines a common application programming
interface (API) for a set of subclasses. It can be used by third parties
that provide implementations, for example as plugins. It is also beneficial
when we work with a large code base where it is hard to remember all the
classes.

Working of the Abstract Classes


Unlike some other high-level languages, Python doesn't provide abstract
classes as a built-in language feature. We need to import the abc module,
which provides the base for defining Abstract Base Classes (ABC). The ABC
works by decorating methods of the base class as abstract. Concrete
classes then provide the implementation of the abstract base. We use
the @abstractmethod decorator to define an abstract method. A method
without this decorator is an ordinary method, even if its body is only pass.
Let's understand the following example.

Example -
# Python program to demonstrate
# how an abstract base class works
from abc import ABC, abstractmethod

class Car(ABC):
    # abstract method: every subclass must override it
    @abstractmethod
    def mileage(self):
        pass

class Tesla(Car):
    def mileage(self):
        print("The mileage is 30kmph")

class Suzuki(Car):
    def mileage(self):
        print("The mileage is 25kmph")

class Duster(Car):
    def mileage(self):
        print("The mileage is 24kmph")

class Renault(Car):
    def mileage(self):
        print("The mileage is 27kmph")

# Driver code
t = Tesla()
t.mileage()

r = Renault()
r.mileage()

s = Suzuki()
s.mileage()

d = Duster()
d.mileage()
Output:
The mileage is 30kmph
The mileage is 27kmph
The mileage is 25kmph
The mileage is 24kmph
Explanation -

In the above code, we have imported the abc module to create the
abstract base class. We created the Car class, which inherits from the ABC class,
and defined an abstract method named mileage(). We then inherited
the base class in four different subclasses and implemented the
abstract method differently in each. Finally, we created objects of the
subclasses and called the method on them.

Let's understand another example.

Example -
# Python program to define
# an abstract class
from abc import ABC, abstractmethod

class Polygon(ABC):
    # abstract method
    @abstractmethod
    def sides(self):
        pass

class Triangle(Polygon):
    def sides(self):
        print("Triangle has 3 sides")

class Pentagon(Polygon):
    def sides(self):
        print("Pentagon has 5 sides")

class Hexagon(Polygon):
    def sides(self):
        print("Hexagon has 6 sides")

class Square(Polygon):
    def sides(self):
        print("Square has 4 sides")

# Driver code
t = Triangle()
t.sides()

s = Square()
s.sides()

p = Pentagon()
p.sides()

k = Hexagon()
k.sides()

Output:
Triangle has 3 sides
Square has 4 sides
Pentagon has 5 sides
Hexagon has 6 sides

Explanation -

In the above code, we have defined the abstract base class named
Polygon, together with its abstract method. This base class is
inherited by the various subclasses. We implemented the abstract method
in each subclass. We created objects of the subclasses and invoked
the sides() method. The hidden implementation of the sides() method
inside each subclass comes into play. The abstract
sides() method, defined in the abstract class, is never invoked.

Points to Remember
Below are the points which we should remember about the abstract base
class in Python.

o An abstract class can contain both normal methods and abstract
methods.

o An abstract class cannot be instantiated; we cannot create objects of the
abstract class.
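
Both points can be checked with a short sketch. The class names Shape and Rectangle and the methods area() and describe() are hypothetical, chosen only for illustration; the sketch assumes nothing beyond the standard abc module.

```python
from abc import ABC, abstractmethod


class Shape(ABC):
    """Hypothetical abstract base class used only for illustration."""

    @abstractmethod
    def area(self):
        """Abstract method: every concrete subclass must override it."""

    def describe(self):
        """Normal (concrete) method shared by all subclasses."""
        return f"{type(self).__name__} with area {self.area()}"


class Rectangle(Shape):
    def __init__(self, width, height):
        self.width = width
        self.height = height

    def area(self):
        return self.width * self.height


# Instantiating the abstract class itself raises TypeError.
try:
    Shape()
    instantiated = True
except TypeError as err:
    instantiated = False
    print("Cannot instantiate:", err)

# The concrete subclass can be created and inherits the normal method.
r = Rectangle(2, 3)
print(r.describe())
```

The abstract class holds the shared describe() logic, while each subclass only has to supply area().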

Abstraction is essential for hiding the internal details of the core functionality from the users. We
have now covered all the basic concepts of abstraction in Python.

2.10 Python Inheritance


Inheritance is an important aspect of the object-oriented paradigm.
Inheritance provides code reusability to the program because we can use
an existing class to create a new class instead of creating it from scratch.

In inheritance, the child class acquires the properties and can access
all the data members and functions defined in the parent class. A child
class can also provide its specific implementation to the functions of the
parent class. In this section of the tutorial, we will discuss inheritance in
detail.

In Python, a derived class can inherit a base class by just mentioning
the base in the brackets after the derived class name. Consider the following
syntax to inherit a base class into the derived class.

Syntax
class derived-class(base class):
    <class-suite>

A class can inherit multiple classes by mentioning all of them inside
the brackets. Consider the following syntax.

Syntax
class derived-class(<base class 1>, <base class 2>, ..... <base class n>):
    <class-suite>

Example 1
class Animal:
    def speak(self):
        print("Animal Speaking")

# child class Dog inherits the base class Animal
class Dog(Animal):
    def bark(self):
        print("dog barking")

d = Dog()
d.bark()
d.speak()

Output:
dog barking
Animal Speaking
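
The introduction to this section noted that a child class can also provide its own implementation of a parent method. The following is a minimal sketch of that idea (method overriding); it reuses the Animal and Dog names from Example 1, but here the methods return strings instead of printing, which is an assumption made only to keep the result easy to check.

```python
class Animal:
    def speak(self):
        return "Animal speaking"


class Dog(Animal):
    # Override: Dog supplies its own version of speak().
    def speak(self):
        # super() gives access to the parent's implementation.
        parent_text = super().speak()
        return parent_text + " -> Dog barking"


a = Animal()
d = Dog()
print(a.speak())
print(d.speak())
```

Calling speak() on a Dog object runs the child version, which in turn reuses the parent version through super().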

Python Multi-Level inheritance


Multi-level inheritance is possible in Python, as in other object-oriented
languages. Multi-level inheritance is achieved when a derived class inherits
from another derived class. There is no limit on the number of levels up to which
multi-level inheritance can be applied in Python.

The syntax of multi-level inheritance is given below.

Syntax
class class1:
    <class-suite>
class class2(class1):
    <class-suite>
class class3(class2):
    <class-suite>
.
.

Example
class Animal:
    def speak(self):
        print("Animal Speaking")

# The child class Dog inherits the base class Animal
class Dog(Animal):
    def bark(self):
        print("dog barking")

# The child class DogChild inherits another child class Dog
class DogChild(Dog):
    def eat(self):
        print("Eating bread...")

d = DogChild()
d.bark()
d.speak()
d.eat()
Output:
dog barking
Animal Speaking
Eating bread...

Python Multiple inheritance


Python provides us the flexibility to inherit multiple base classes in the
child class.

The syntax to perform multiple inheritance is given below.

Syntax
class Base1:
    <class-suite>

class Base2:
    <class-suite>
.
.
.
class BaseN:
    <class-suite>

class Derived(Base1, Base2, ...... BaseN):
    <class-suite>

Example
class Calculation1:
    def Summation(self, a, b):
        return a + b

class Calculation2:
    def Multiplication(self, a, b):
        return a * b

class Derived(Calculation1, Calculation2):
    def Divide(self, a, b):
        return a / b

d = Derived()
print(d.Summation(10, 20))
print(d.Multiplication(10, 20))
print(d.Divide(10, 20))

Output:
30
200
0.5
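
When two base classes define a method with the same name, Python resolves the call using the method resolution order (MRO), which follows the order in which the bases are listed. The sketch below illustrates this; the class names Base1 and Base2 follow the syntax above, and the who() method is hypothetical.

```python
class Base1:
    def who(self):
        return "Base1"


class Base2:
    def who(self):
        return "Base2"


# Base1 is listed first, so its who() is found first.
class Derived(Base1, Base2):
    pass


d = Derived()
print(d.who())

# __mro__ lists the classes in the order Python searches them.
mro_names = [cls.__name__ for cls in Derived.__mro__]
print(mro_names)
```

Swapping the order of the bases in the class statement would make Base2's version the one that is called.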

The issubclass(sub, sup) function


The built-in issubclass(sub, sup) function is used to check the relationship
between the specified classes. It returns True if the first class is a
subclass of the second class, and False otherwise.

Consider the following example.

Example
class Calculation1:
    def Summation(self, a, b):
        return a + b

class Calculation2:
    def Multiplication(self, a, b):
        return a * b

class Derived(Calculation1, Calculation2):
    def Divide(self, a, b):
        return a / b

d = Derived()
print(issubclass(Derived, Calculation2))
print(issubclass(Calculation1, Calculation2))

Output:
True
False

The isinstance(obj, class) function


The built-in isinstance() function is used to check the relationship between
objects and classes. It returns True if the first parameter, i.e., obj, is an
instance of the second parameter, i.e., class, or of one of its subclasses,
and False otherwise.

Consider the following example.

Example
class Calculation1:
    def Summation(self, a, b):
        return a + b

class Calculation2:
    def Multiplication(self, a, b):
        return a * b

class Derived(Calculation1, Calculation2):
    def Divide(self, a, b):
        return a / b

d = Derived()
print(isinstance(d, Derived))

Output:
True
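
Both functions also take inheritance into account. The sketch below reuses the class names from the example, with lower-case method names as a stylistic assumption. It shows that an object of a derived class counts as an instance of each base class, and that isinstance() accepts a tuple of classes.

```python
class Calculation1:
    def summation(self, a, b):
        return a + b


class Calculation2:
    def multiplication(self, a, b):
        return a * b


class Derived(Calculation1, Calculation2):
    def divide(self, a, b):
        return a / b


d = Derived()

checks = {
    "d is a Derived": isinstance(d, Derived),
    # An instance of a subclass is also an instance of its base classes.
    "d is a Calculation1": isinstance(d, Calculation1),
    # A tuple means "instance of any of these classes".
    "d is an int or str": isinstance(d, (int, str)),
    # Every class is considered a subclass of itself.
    "Derived subclasses itself": issubclass(Derived, Derived),
}

for label, result in checks.items():
    print(label, "->", result)
```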

Data Science using Python – Unit III

UNIT – III

3.1 What is NumPy?


NumPy is a Python library used for working with arrays. It also has
functions for working in the domains of linear algebra, Fourier transforms, and
matrices. NumPy was created in 2005 by Travis Oliphant. It is an open
source project and you can use it freely.
NumPy stands for Numerical Python.
NumPy is a general-purpose array-processing package. It provides a
high-performance multidimensional array object and tools for working
with these arrays. It is the fundamental package for scientific computing
with Python. It contains various features,
including these important ones:
• A powerful N-dimensional array object
• Sophisticated (broadcasting) functions
• Tools for integrating C/C++ and Fortran code
• Useful linear algebra, Fourier transform, and random number
capabilities
Besides its obvious scientific uses, NumPy can also be used as an
efficient multi-dimensional container of generic data. Arbitrary data
types can be defined using NumPy, which allows NumPy to seamlessly
and speedily integrate with a wide variety of databases.
Installation:
• Mac and Linux users can install NumPy via the pip command:
pip install numpy
• Windows users can run the same pip command from the Command
Prompt, provided that Python and pip are already installed.

NumPy’s main object is the homogeneous multidimensional array.
• It is a table of elements (usually numbers), all of the same type,
indexed by a tuple of non-negative integers.
• In NumPy, dimensions are called axes. The number of axes is called the rank.
• NumPy’s array class is called ndarray. It is also known by the
alias array.
Example :
# Python program to demonstrate
# basic array characteristics
import numpy as np
# Creating array object
arr = np.array( [[ 1, 2, 3],[ 4, 2, 5]] )
# Printing type of arr object
print("Array is of type: ", type(arr))
# Printing array dimensions (axes)
print("No. of dimensions: ", arr.ndim)
# Printing shape of array
print("Shape of array: ", arr.shape)
# Printing size (total number of elements) of array
print("Size of array: ", arr.size)
# Printing type of elements in array
print("Array stores elements of type: ", arr.dtype)
Output :
Array is of type:  <class 'numpy.ndarray'>
No. of dimensions: 2
Shape of array: (2, 3)
Size of array: 6
Array stores elements of type: int64

3.2 Array creation:


There are various ways to create arrays in NumPy. For example,
you can create an array from a regular Python list or tuple using
the array function. The type of the resulting array is deduced from
the type of the elements in the sequences. Often, the elements of an
array are originally unknown, but its size is known. Hence, NumPy
offers several functions to create arrays with initial placeholder
content. These minimize the necessity of growing arrays, an
expensive operation. For example: np.zeros, np.ones, np.full,
np.empty, etc.

To create sequences of numbers, NumPy provides functions
analogous to range that return arrays instead of lists.
• arange: returns evenly spaced values within a given
interval. The step size is specified.
• linspace: returns evenly spaced values within a given
interval. The number of elements (num) is specified.
• Reshaping array: We can use the reshape method to reshape an array.
Consider an array with shape (a1, a2, a3, …, aN). We can reshape
and convert it into another array with shape (b1, b2, b3, …, bM). The
only required condition is: a1 x a2 x a3 … x aN = b1 x b2 x b3 … x
bM (i.e. the original size of the array remains unchanged).
• Flatten array: We can use the flatten method to get a copy of the array
collapsed into one dimension. It accepts an order argument. The default
value is ‘C’ (for row-major order). Use ‘F’ for column-major order.
Note: The type of an array can be explicitly defined while creating the array.
# Python program to demonstrate
# array creation techniques
import numpy as np

# Creating array from list with type float
a = np.array([[1, 2, 4], [5, 8, 7]], dtype='float')
print("Array created using passed list:\n", a)

# Creating array from tuple
b = np.array((1, 3, 2))
print("\nArray created using passed tuple:\n", b)

# Creating a 3X4 array with all zeros
c = np.zeros((3, 4))
print("\nAn array initialized with all zeros:\n", c)

# Create a constant value array of complex type
d = np.full((3, 3), 6, dtype='complex')
print("\nAn array initialized with all 6s. Array type is complex:\n", d)

# Create an array with random values
e = np.random.random((2, 2))
print("\nA random array:\n", e)

# Create a sequence of integers
# from 0 to 30 (30 not included) with steps of 5
f = np.arange(0, 30, 5)
print("\nA sequential array with steps of 5:\n", f)

# Create a sequence of 10 values in range 0 to 5
g = np.linspace(0, 5, 10)
print("\nA sequential array with 10 values between 0 and 5:\n", g)

# Reshaping 3X4 array to 2X2X3 array
arr = np.array([[1, 2, 3, 4], [5, 2, 4, 2], [1, 2, 0, 1]])
newarr = arr.reshape(2, 2, 3)
print("\nOriginal array:\n", arr)
print("Reshaped array:\n", newarr)

# Flatten array
arr = np.array([[1, 2, 3], [4, 5, 6]])
flarr = arr.flatten()
print("\nOriginal array:\n", arr)
print("Flattened array:\n", flarr)

Output (the values of the random array differ on every run):
Array created using passed list:
[[ 1. 2. 4.]
[ 5. 8. 7.]]

Array created using passed tuple:
[1 3 2]

An array initialized with all zeros:
[[ 0. 0. 0. 0.]
[ 0. 0. 0. 0.]
[ 0. 0. 0. 0.]]

An array initialized with all 6s. Array type is complex:
[[ 6.+0.j 6.+0.j 6.+0.j]
[ 6.+0.j 6.+0.j 6.+0.j]
[ 6.+0.j 6.+0.j 6.+0.j]]

A random array:
[[ 0.46829566 0.67079389]
[ 0.09079849 0.95410464]]

A sequential array with steps of 5:
[ 0 5 10 15 20 25]

A sequential array with 10 values between 0 and 5:
[ 0. 0.55555556 1.11111111 1.66666667 2.22222222 2.77777778
3.33333333 3.88888889 4.44444444 5.]

Original array:
[[1 2 3 4]
[5 2 4 2]
[1 2 0 1]]
Reshaped array:
[[[1 2 3]
[4 5 2]]

[[4 2 1]
[2 0 1]]]

Original array:
[[1 2 3]
[4 5 6]]
Flattened array:
[1 2 3 4 5 6]
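
The reshape condition (the total number of elements must stay the same) and the order argument of flatten can be checked with a small sketch. The variable names are illustrative only.

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)   # 3 x 4 = 12 elements

# 2 x 2 x 3 is also 12 elements, so this reshape is allowed.
ok = arr.reshape(2, 2, 3)
print(ok.shape)

# 5 x 2 = 10 elements does not match 12, so NumPy raises ValueError.
try:
    arr.reshape(5, 2)
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print("Cannot reshape:", err)

small = np.array([[1, 2, 3], [4, 5, 6]])
row_major = small.flatten()            # default order 'C': row by row
col_major = small.flatten(order="F")   # order 'F': column by column
print(row_major)
print(col_major)
```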

3.3 Data Types in NumPy Arrays


By default, Python has these data types:
strings - used to represent text data; the text is given in quotation
marks, e.g. "ABCD"
integer - used to represent integer numbers, e.g. -1, -2, -3
float - used to represent real numbers, e.g. 1.2, 42.42
boolean - used to represent True or False.
complex - used to represent complex numbers, e.g. 1.0 + 2.0j
Data Types in NumPy
NumPy has some extra data types, and refers
to data types with one character, like i for integers, u for unsigned
integers, etc.
Below is a list of all data types in NumPy and the characters used to
represent them.
i - integer
b - boolean
u - unsigned integer
f - float
c - complex float
m - timedelta
M - datetime
O - object
S - string
U - unicode string
V - fixed chunk of memory for other type (void)
Checking the Data Type of an Array
The NumPy array object has a property called dtype that returns the
data type of the array:
Get the data type of an array object:
import numpy as np
arr = np.array([1, 2, 3, 4])
print(arr.dtype)
Get the data type of an array containing strings:
import numpy as np
arr = np.array(['apple', 'banana', 'cherry'])
print(arr.dtype)
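
The one-character codes listed above can also be used to request a data type when an array is created or converted. The following is a minimal sketch; the size suffix in 'i4' (a 4-byte integer) and the use of astype() are additions for illustration and are not part of the examples above.

```python
import numpy as np

# Request a 4-byte integer type explicitly when creating the array.
ints = np.array([1, 2, 3, 4], dtype="i4")
print(ints.dtype)

# astype() returns a converted copy; float -> int drops the fraction.
floats = np.array([1.1, 2.1, 3.9])
as_int = floats.astype("i4")
print(as_int, as_int.dtype)

# Converting to boolean: zero becomes False, anything else True.
flags = np.array([1, 0, 3]).astype(bool)
print(flags)

# String arrays get a unicode dtype; its kind code is 'U'.
text = np.array(["apple", "banana", "cherry"])
print(text.dtype.kind)
```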

3.4 Arithmetic with NumPy:


NumPy provides a plethora of built-in arithmetic functions.
• Operations on a single array: We can use overloaded arithmetic
operators to do element-wise operations on an array to create a new
array. In the case of the +=, -=, *= operators, the existing array is modified.
# Python program to demonstrate
# basic operations on single array
import numpy as np

a = np.array([1, 2, 5, 3])

# add 1 to every element
print("Adding 1 to every element:", a+1)

# subtract 3 from each element
print("Subtracting 3 from each element:", a-3)

# multiply each element by 10
print("Multiplying each element by 10:", a*10)

# square each element
print("Squaring each element:", a**2)

# modify existing array
a *= 2
print("Doubled each element of original array:", a)

# transpose of array
a = np.array([[1, 2, 3], [3, 4, 5], [9, 6, 0]])
print("\nOriginal array:\n", a)
print("Transpose of array:\n", a.T)

Output :
Adding 1 to every element: [2 3 6 4]
Subtracting 3 from each element: [-2 -1 2 0]
Multiplying each element by 10: [10 20 50 30]
Squaring each element: [ 1 4 25 9]
Doubled each element of original array: [ 2 4 10 6]

Original array:
[[1 2 3]
[3 4 5]
[9 6 0]]
Transpose of array:
[[1 3 9]
[2 4 6]
[3 5 0]]

• Unary operators: Many unary operations are provided as methods
of the ndarray class. These include sum, min, max, etc. These functions
can also be applied row-wise or column-wise by setting an axis
parameter.
# Python program to demonstrate
# unary operators in numpy
import numpy as np

arr = np.array([[1, 5, 6], [4, 7, 2], [3, 1, 9]])

# maximum element of array
print("Largest element is:", arr.max())
print("Row-wise maximum elements:", arr.max(axis=1))
# minimum element of array
print("Column-wise minimum elements:", arr.min(axis=0))
# sum of array elements
print("Sum of all array elements:", arr.sum())
# cumulative sum along each row
print("Cumulative sum along each row:\n", arr.cumsum(axis=1))
Output :
Largest element is: 9
Row-wise maximum elements: [6 7 9]
Column-wise minimum elements: [1 1 2]
Sum of all array elements: 38
Cumulative sum along each row:
[[ 1 6 12]
[ 4 11 13]
[ 3 4 13]]

• Binary operators: These operations apply to arrays elementwise and
a new array is created. You can use all basic arithmetic operators like
+, -, *, /, etc. In the case of the +=, -=, *= operators, the existing array is
modified.
# Python program to demonstrate
# binary operators in NumPy
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[4, 3], [2, 1]])

# add arrays
print("Array sum:\n", a + b)

# multiply arrays (elementwise multiplication)
print("Array multiplication:\n", a*b)

# matrix multiplication
print("Matrix multiplication:\n", a.dot(b))

Output:
Array sum:
[[5 5]
[5 5]]
Array multiplication:
[[4 6]
[6 4]]
Matrix multiplication:
[[ 8 5]
[20 13]]
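
The difference between creating a new array and modifying the existing one can be seen in a short sketch. The variable names are illustrative, and the @ operator is shown only as an alternative spelling of the matrix product computed by dot.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[4, 3], [2, 1]])

# a + b builds a brand-new array; a itself is untouched.
total = a + b

# The @ operator performs matrix multiplication, like a.dot(b).
matmul = a @ b
same_as_dot = np.array_equal(matmul, a.dot(b))

# alias refers to the same array object as a.
alias = a
a += b   # in-place: the existing array is modified

print(total)
print(matmul)
print(alias)   # shows the updated values, because alias is a
```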

3.5 Array Indexing:


Knowing the basics of array indexing is important for analyzing and
manipulating the array object. NumPy offers many ways to do array
indexing.

• Slicing: Just like lists in Python, NumPy arrays can be sliced. As
arrays can be multidimensional, you need to specify a slice for each
dimension of the array.

Slicing in Python means taking elements from one given index to
another given index.

We pass a slice instead of an index, like this: [start:end].

We can also define the step, like this: [start:end:step].

If we don't pass start, it is considered 0.

If we don't pass end, it is considered the length of the array in that dimension.

If we don't pass step, it is considered 1.

Example:

import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7])

print(arr[1:5])

Negative Slicing

Use the minus operator to refer to an index from the end:

import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7])

print(arr[-3:-1])

Use the step value to determine the step of the slicing:

Example

Return every other element from index 1 to index 5:

import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7])

print(arr[1:5:2])

Slicing 2-D Arrays

Example

From the second row, slice the elements from index 1 to index 4 (not
included):

import numpy as np

arr = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])

print(arr[1, 1:4])
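
The slicing rules above can be collected into one sketch, using the same arrays as the examples. The extra slices (every other element, a column, and a block) are additions for illustration.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7])

basic = arr[1:5]         # index 1 up to (not including) index 5
negative = arr[-3:-1]    # third from the end up to the last one
stepped = arr[1:5:2]     # every other element from index 1 to 5
every_other = arr[::2]   # start and end omitted, step of 2

grid = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])

row_part = grid[1, 1:4]   # second row, columns 1 to 3
column = grid[0:2, 2]     # column 2 taken from both rows
block = grid[0:2, 1:4]    # a 2-D block: both rows, columns 1 to 3

for name, value in [("basic", basic), ("negative", negative),
                    ("stepped", stepped), ("every_other", every_other),
                    ("row_part", row_part), ("column", column)]:
    print(name, value)
print(block)
```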

• Integer array indexing:
In this method, lists are passed for indexing for each dimension. A one-to-one
mapping of corresponding elements is done to construct a new
arbitrary array.
# Python program to demonstrate
# indexing in numpy
import numpy as np

# An exemplar array
arr = np.array([[-1, 2, 0, 4], [4, -0.5, 6, 0], [2.6, 0, 7, 8], [3, -7, 4, 2.0]])

# Slicing example: first 2 rows and alternate columns (0 and 2)
temp = arr[:2, ::2]
print("Array with first 2 rows and alternate columns(0 and 2):\n", temp)

# Integer array indexing example
temp = arr[[0, 1, 2, 3], [3, 2, 1, 0]]
print("\nElements at indices (0, 3), (1, 2), (2, 1), (3, 0):\n", temp)

• Boolean array indexing:
This method is used when we want to pick elements from an array which
satisfy some condition.
# boolean array indexing example
cond = arr > 0  # cond is a boolean array
temp = arr[cond]
print("\nElements greater than 0:\n", temp)

Output :
Array with first 2 rows and alternate columns(0 and 2):
[[-1. 0.]
[ 4. 6.]]

Elements at indices (0, 3), (1, 2), (2, 1), (3, 0):
[ 4. 6. 0. 3.]

Elements greater than 0:
[ 2. 4. 4. 6. 2.6 7. 8. 3. 4. 2. ]

• Swapping in arrays
NumPy allows you to swap axes without costing anything in
memory, and very little in time. The obvious axis swap is a 2D array
transpose:
>>> import numpy as np
>>> arr = np.arange(10).reshape((5, 2))
>>> arr
array([[0, 1],
       [2, 3],
       [4, 5],
       [6, 7],
       [8, 9]])
>>> arr.T
array([[0, 2, 4, 6, 8],
       [1, 3, 5, 7, 9]])
The transpose method (and the np.transpose function) does the same
thing as the .T attribute above:
>>> arr.transpose()
array([[0, 2, 4, 6, 8],
       [1, 3, 5, 7, 9]])
The advantage of transpose over the .T attribute is that it allows you to
move axes into any arbitrary order.
For example, let’s say you had a 3D array:
>>> arr = np.arange(24).reshape((2, 3, 4))
>>> arr
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]],

       [[12, 13, 14, 15],
        [16, 17, 18, 19],
        [20, 21, 22, 23]]])
>>> arr.shape
(2, 3, 4)
>>> arr[:, :, 0]
array([[ 0,  4,  8],
       [12, 16, 20]])
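
Moving the axes of such a 3D array into another order might look like the sketch below. The chosen axis order (2, 0, 1) and the use of np.swapaxes and np.shares_memory are illustrative additions, not taken from the text above.

```python
import numpy as np

arr = np.arange(24).reshape((2, 3, 4))

# Put the last axis first: the shape changes from (2, 3, 4) to (4, 2, 3).
reordered = arr.transpose(2, 0, 1)
print(reordered.shape)

# swapaxes exchanges just two axes: (2, 3, 4) becomes (4, 3, 2).
swapped = np.swapaxes(arr, 0, 2)
print(swapped.shape)

# No data is copied: the result is a view on the same memory.
shares = np.shares_memory(arr, reordered)
print(shares)

# Element reordered[k, i, j] is the same value as arr[i, j, k].
print(reordered[3, 1, 2], arr[1, 2, 3])
```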

• Universal functions (ufunc):
NumPy provides familiar mathematical functions such as sin, cos, exp,
etc. These functions operate elementwise on an array, producing
an array as output.
Note: All the elementwise operations we did above using overloaded operators can
also be done using ufuncs like np.add, np.subtract, np.multiply, np.divide,
etc.
# Python program to demonstrate
# universal functions in numpy
import numpy as np
# create an array of sine values
a = np.array([0, np.pi/2, np.pi])
print ("Sine values of array elements:", np.sin(a))
# exponential values
a = np.array([0, 1, 2, 3])
print ("Exponent of array elements:", np.exp(a))
# square root of array values
print ("Square root of array elements:", np.sqrt(a))
Output:
Sine values of array elements: [ 0.00000000e+00 1.00000000e+00 1.22464680e-16]
Exponent of array elements: [ 1. 2.71828183 7.3890561 20.08553692]
Square root of array elements: [ 0. 1. 1.41421356 1.73205081]
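
The note about overloaded operators and ufuncs can be checked directly. In this sketch the arrays a and b are arbitrary example data.

```python
import numpy as np

a = np.array([1, 2, 5, 3])
b = np.array([4, 3, 2, 1])

# Each ufunc call gives the same result as the matching operator.
added = np.add(a, b)             # same as a + b
subtracted = np.subtract(a, b)   # same as a - b
multiplied = np.multiply(a, b)   # same as a * b
divided = np.divide(a, b)        # same as a / b

matches = (
    np.array_equal(added, a + b)
    and np.array_equal(subtracted, a - b)
    and np.array_equal(multiplied, a * b)
    and np.array_equal(divided, a / b)
)

print(added)
print(multiplied)
print("ufuncs match operators:", matches)
```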

3.6 Sorting array:


NumPy provides a simple np.sort function for sorting arrays. Let’s
explore it a bit. Sorting means putting elements in an ordered
sequence. An ordered sequence is any sequence that has an order
corresponding to its elements, like numeric or alphabetical, ascending or
descending.
The np.sort() function sorts a specified array. The ndarray object also has a
sort() method, which sorts the array in place.

Sort the array:

import numpy as np

arr = np.array([3, 2, 0, 1])

print(np.sort(arr))

Note: np.sort() returns a sorted copy of the array, leaving the original array
unchanged.

You can also sort arrays of strings, or any other data type:

Example

Sort the array alphabetically:

import numpy as np

arr = np.array(['banana', 'cherry', 'apple'])

print(np.sort(arr))

Example

Sort a boolean array:

import numpy as np

arr = np.array([True, False, True])

print(np.sort(arr))

Sorting a 2-D Array

If you use np.sort() on a 2-D array, each row (the last axis) is sorted separately:

Sort a 2-D array:

import numpy as np

arr = np.array([[3, 2, 4], [5, 0, 1]])

print(np.sort(arr))
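
The difference between the copying np.sort function and the in-place sort method, together with sorting along a chosen axis, is sketched below. The axis argument and the reversed slice for descending order are additions for illustration.

```python
import numpy as np

arr = np.array([3, 2, 0, 1])

# np.sort returns a sorted copy; arr keeps its original order.
sorted_copy = np.sort(arr)
unchanged = arr.tolist()

# A reversed slice of the sorted copy gives descending order.
descending = np.sort(arr)[::-1]

grid = np.array([[3, 2, 4], [5, 0, 1]])
by_row = np.sort(grid)           # default: sort along the last axis (each row)
by_col = np.sort(grid, axis=0)   # sort each column instead

# The ndarray.sort() method sorts the array itself and returns None.
arr.sort()

print(sorted_copy, descending)
print(by_row)
print(by_col)
print(arr)
```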


Data Science using Python – Unit IV

UNIT – IV
4.1 Introduction to pandas Data Structures:
Pandas deals with the following three data structures −
• Series
• DataFrame
• Panel
These data structures are built on top of NumPy arrays, which means they
are fast.

Dimension & Description


The best way to think of these data structures is that the higher
dimensional data structure is a container of its lower dimensional data
structure. For example, DataFrame is a container of Series, Panel is a
container of DataFrame.
Data Structure   Dimensions   Description
Series           1            1D labeled homogeneous array, size-immutable.
DataFrame        2            General 2D labeled, size-mutable tabular structure
                              with potentially heterogeneously typed columns.
Panel            3            General 3D labeled, size-mutable array.

Building and handling arrays of two or more dimensions is a tedious task,
because the burden is placed on the user to consider the orientation of the data set
when writing functions. Using pandas data structures reduces this mental
effort.
For example, with tabular data (DataFrame) it is more semantically
helpful to think of the index (the rows) and the columns rather than axis
0 and axis 1.
Mutability
All Pandas data structures are value mutable (can be changed) and
except Series all are size mutable. Series is size immutable.

Note − DataFrame is widely used and one of the most important data
structures. Panel is used much less.

Series
Series is a one-dimensional array like structure with homogeneous
data. For example, the following series is a collection of integers 10, 23,
56, …
10 23 56 17 52 61 73 90 26 72
Key Points
• Homogeneous data
• Size Immutable
• Values of Data Mutable

DataFrame
DataFrame is a two-dimensional array with heterogeneous data. For
example,
Name Age Gender Rating
Steve 32 Male 3.45
Lia 28 Female 4.6
Vin 45 Male 3.9
Katie 38 Female 2.78

The table represents the data of a sales team of an organization with


their overall performance rating. The data is represented in rows and
columns. Each column represents an attribute and each row represents
a person.
Data Type of Columns
The data types of the four columns are as follows −
Column Type
Name String
Age Integer
Gender String
Rating Float

Key Points
• Heterogeneous data
• Size Mutable
• Data Mutable

Panel
Panel is a three-dimensional data structure with heterogeneous
data. It is hard to represent the panel in graphical representation. But a
panel can be illustrated as a container of DataFrame.
Key Points
• Heterogeneous data
• Size Mutable
• Data Mutable

4.2 Series
Series is a one-dimensional labeled array capable of holding data
of any type (integer, string, float, python objects, etc.). The axis labels
are collectively called index.

pandas.Series
A pandas Series can be created using the following constructor −
pandas.Series( data, index, dtype, copy)
The parameters of the constructor are as follows −
Sr.No   Parameter & Description
1       data
        data takes various forms like ndarray, list, constants

2       index
        Index values must be hashable and have the same length as
        data. Default is np.arange(n) if no index is passed.

3       dtype
        dtype is for the data type. If None, the data type will be inferred.

4       copy
        Copy data. Default False

A series can be created using various inputs like −
• Array
• Dict
• Scalar value or constant

Create an Empty Series


A basic series, which can be created is an Empty Series.

Example
# import the pandas library and alias it as pd
import pandas as pd
# the dtype is given explicitly so that the result does not depend on the pandas version
s = pd.Series(dtype='float64')
print(s)
Its output is as follows −
Series([], dtype: float64)

Create a Series from ndarray


If data is an ndarray, then the index passed must be of the same length. If
no index is passed, then by default the index will be range(n), where n is
the array length, i.e., [0, 1, 2, 3, …, len(array)-1].
Example 1
# import the pandas library and alias it as pd
import pandas as pd
import numpy as np
data = np.array(['a','b','c','d'])
s = pd.Series(data)
print(s)
Its output is as follows −
0 a
1 b
2 c
3 d
dtype: object
We did not pass any index, so by default, it assigned the indexes
ranging from 0 to len(data)-1, i.e., 0 to 3.
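
The other inputs listed earlier (a custom index, a dict, and a scalar value) can be sketched in the same way. The index labels and values below are arbitrary example data.

```python
import numpy as np
import pandas as pd

# ndarray with a custom index of the same length as the data
data = np.array(['a', 'b', 'c', 'd'])
labelled = pd.Series(data, index=[100, 101, 102, 103])

# dict: the keys become the index labels
from_dict = pd.Series({'a': 0.0, 'b': 1.0, 'c': 2.0})

# scalar: the value is repeated for every index label
from_scalar = pd.Series(5, index=[0, 1, 2, 3])

print(labelled)
print(from_dict)
print(from_scalar)
```

With a scalar, an index must be supplied so that pandas knows how many times to repeat the value.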

4.3 DataFrame
A Data frame is a two-dimensional data structure, i.e., data is
aligned in a tabular fashion in rows and columns.
Features of DataFrame
• Potentially columns are of different types
• Size – Mutable
• Labeled axes (rows and columns)
• Can perform arithmetic operations on rows and columns
Structure
Let us assume that we are creating a data frame with students’ data.
You can think of it as an SQL table or a spreadsheet data
representation.

pandas.DataFrame
A pandas DataFrame can be created using the following constructor −
pandas.DataFrame( data, index, columns, dtype, copy)

The parameters of the constructor are as follows −
Sr.No   Parameter & Description
1       data
        data takes various forms like ndarray, series, map, lists, dict,
        constants and also another DataFrame.

2       index
        Row labels to be used for the resulting frame. Optional;
        the default is np.arange(n) if no index is passed.

3       columns
        Column labels to be used for the resulting frame. Optional;
        the default is np.arange(n) if no column labels are passed.

4       dtype
        Data type of each column.

5       copy
        Whether to copy the input data. The default is False.

Create DataFrame
A pandas DataFrame can be created using various inputs like −
• Lists
• dict
• Series
• NumPy ndarrays
• Another DataFrame
In the subsequent sections of this chapter, we will see how to create a
DataFrame using these inputs.

Create an Empty DataFrame


The most basic DataFrame that can be created is an empty DataFrame.


Example
# import the pandas library, aliased as pd
import pandas as pd
df = pd.DataFrame()
print(df)
Its output is as follows −
Empty DataFrame
Columns: []
Index: []

Create a DataFrame from Lists


The DataFrame can be created using a single list or a list of lists.
Example 1
import pandas as pd
data = [1,2,3,4,5]
df = pd.DataFrame(data)
print(df)
Its output is as follows −
0
0 1
1 2
2 3
3 4
4 5
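The list-of-lists case mentioned above can be sketched as follows. The student records and the column names Name and Age are made-up sample data, not taken from these notes.

```python
import pandas as pd

# Each inner list becomes one row of the DataFrame.
data = [['Alex', 10], ['Bob', 12], ['Clarke', 13]]
df = pd.DataFrame(data, columns=['Name', 'Age'])
print(df)
```

Without the columns argument, the columns would be labeled 0 and 1, just as the single column is labeled 0 in the example above.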


4.4 PANEL
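A panel is a 3D container of data. The term panel data is derived
from econometrics and is partially responsible for the name pandas:
pan(el)-da(ta)-s. Panel was deprecated in pandas 0.20 and is not
available in pandas 1.0 and newer, so the code in this section runs only
on older pandas versions; current code usually stores such data in a
DataFrame with a MultiIndex.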

The names for the 3 axes are intended to give some semantic meaning
to describing operations involving panel data. They are −

• items − axis 0, each item corresponds to a DataFrame contained inside.
• major_axis − axis 1, it is the index (rows) of each of the DataFrames.
• minor_axis − axis 2, it is the columns of each of the DataFrames.

pandas.Panel()

A Panel can be created using the following constructor −

pandas.Panel(data, items, major_axis, minor_axis, dtype, copy)

The parameters of the constructor are as follows −

Parameter Description
data The input data. It can take various forms such as ndarray, Series,
map, lists, dict, constants, and also another DataFrame
items axis=0
major_axis axis=1
minor_axis axis=2
dtype Data type of each column
copy Whether to copy the data. Default: False


Create Panel
A Panel can be created in multiple ways, such as:
• From ndarrays
• From a dict of DataFrames
From 3D ndarray
# creating a panel from a 3D ndarray
# (requires an old pandas release that still provides Panel)
import pandas as pd
import numpy as np

data = np.random.rand(2,4,5)
p = pd.Panel(data)
print(p)
Its output is as follows −
<class 'pandas.core.panel.Panel'>
Dimensions: 2 (items) x 4 (major_axis) x 5 (minor_axis)
Items axis: 0 to 1
Major_axis axis: 0 to 3
Minor_axis axis: 0 to 4
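On pandas versions that no longer include Panel, the same 2 x 4 x 5 data can be held in a DataFrame with a two-level row index. The following is only a sketch of that idea; the level names item and major are chosen here to mirror the Panel axes and are not part of any pandas API.

```python
import numpy as np
import pandas as pd

data = np.random.rand(2, 4, 5)
n_items, n_major, n_minor = data.shape

# One row per (item, major) pair; the minor axis becomes the columns.
index = pd.MultiIndex.from_product(
    [range(n_items), range(n_major)], names=['item', 'major'])
df = pd.DataFrame(data.reshape(n_items * n_major, n_minor), index=index)

print(df.shape)
# Selecting on the outer level returns the 4 x 5 block of one item.
print(df.loc[0])
```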


4.5 Indexing Selection


The Python and NumPy indexing operators "[ ]" and attribute
operator "." provide quick and easy access to Pandas data structures
across a wide range of use cases. However, since the type of the data to
be accessed isn’t known in advance, directly using standard operators
has some optimization limits. For production code, we recommend that
you take advantage of the optimized pandas data access methods
explained in this chapter.
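Pandas has supported three types of multi-axes indexing, listed in the
following table. Note that .ix was deprecated in pandas 0.20 and is not
available in pandas 1.0 and newer; use .loc or .iloc instead.
Sr.No Indexing & Description
1 .loc()
Label based

2 .iloc()
Integer position based

3 .ix()
Both label and integer based (deprecated)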

.loc()
Pandas provides the .loc indexer for purely label-based indexing. It is
used with square brackets, e.g. df.loc['a']. When slicing, both the start
and the stop bound are included. Integers are valid labels, but they
refer to the label and not the position.
.loc() accepts several kinds of input, such as:
• A single scalar label
• A list of labels
• A slice object
• A Boolean array
.loc takes two selectors (a single label, a list, or a slice) separated
by ','. The first one selects the rows and the second one selects the
columns.


Example 1
# import the pandas library, aliased as pd
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(8, 4),
    index = ['a','b','c','d','e','f','g','h'], columns = ['A', 'B', 'C', 'D'])

# select all rows for a specific column
print(df.loc[:,'A'])
Its output is as follows −
a 0.391548
b -0.070649
c -0.317212
d -2.162406
e 2.202797
f 0.613709
g 1.050559
h 1.122680
Name: A, dtype: float64
Example 2
# import the pandas library, aliased as pd
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(8, 4),
    index = ['a','b','c','d','e','f','g','h'], columns = ['A', 'B', 'C', 'D'])

# select all rows for multiple columns, given as a list
print(df.loc[:,['A','C']])
Its output is as follows −
A C
a 0.391548 0.745623
b -0.070649 1.620406
c -0.317212 1.448365
d -2.162406 -0.873557
e 2.202797 0.528067
f 0.613709 0.286414
g 1.050559 0.216526
h 1.122680 -1.621420


Example 3
# import the pandas library, aliased as pd
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(8, 4),
    index = ['a','b','c','d','e','f','g','h'], columns = ['A', 'B', 'C', 'D'])

# select a few rows for multiple columns, both given as lists
print(df.loc[['a','b','f','h'],['A','C']])
Its output is as follows −
A C
a 0.391548 0.745623
b -0.070649 1.620406
f 0.613709 0.286414
h 1.122680 -1.621420
Example 4
# import the pandas library, aliased as pd
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(8, 4),
    index = ['a','b','c','d','e','f','g','h'], columns = ['A', 'B', 'C', 'D'])

# select a range of rows for all columns (both 'a' and 'h' are included)
print(df.loc['a':'h'])
Its output is as follows −
A B C D
a 0.391548 -0.224297 0.745623 0.054301
b -0.070649 -0.880130 1.620406 1.419743
c -0.317212 -1.929698 1.448365 0.616899
d -2.162406 0.614256 -0.873557 1.093958
e 2.202797 -2.315915 0.528067 0.612482
f 0.613709 -0.157674 0.286414 -0.500517
g 1.050559 -2.272099 0.216526 0.928449
h 1.122680 0.324368 -1.621420 -0.741470
Example 5
# import the pandas library, aliased as pd
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(8, 4),
    index = ['a','b','c','d','e','f','g','h'], columns = ['A', 'B', 'C', 'D'])

# compare the values in row 'a' with 0; the result is a boolean Series
print(df.loc['a']>0)
Its output is as follows −
A False
B True
C False
D False
Name: a, dtype: bool
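A Boolean array can also be passed to .loc as the row selector, which is the fourth access method listed earlier. The sketch below uses small fixed values instead of random numbers so that the result is predictable.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, -2, 3, -4], 'B': [10, 20, 30, 40]},
                  index=['a', 'b', 'c', 'd'])

# Boolean Series with one True/False entry per row.
mask = df['A'] > 0

# Keep only the rows where the mask is True.
print(df.loc[mask])

# Same rows, but only column 'B'.
print(df.loc[mask, 'B'])
```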

.iloc()
Pandas provides the .iloc indexer for purely integer position-based
indexing. As in Python and NumPy, positions are 0-based, and the stop
bound of a slice is excluded.
The various access methods are as follows:
• An integer
• A list of integers
• A range of values
Example 1
# import the pandas library, aliased as pd
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(8, 4), columns = ['A', 'B', 'C', 'D'])

# select the first four rows (positions 0 to 3)
print(df.iloc[:4])
Its output is as follows −
A B C D
0 0.699435 0.256239 -1.270702 -0.645195
1 -0.685354 0.890791 -0.813012 0.631615
2 -0.783192 -0.531378 0.025070 0.230806
3 0.539042 -1.284314 0.826977 -0.026251


Example 2
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(8, 4), columns = ['A', 'B', 'C', 'D'])

# Integer slicing
print(df.iloc[:4])
print(df.iloc[1:5, 2:4])
Its output is as follows −
A B C D
0 0.699435 0.256239 -1.270702 -0.645195
1 -0.685354 0.890791 -0.813012 0.631615
2 -0.783192 -0.531378 0.025070 0.230806
3 0.539042 -1.284314 0.826977 -0.026251

C D
1 -0.813012 0.631615
2 0.025070 0.230806
3 0.826977 -0.026251
4 1.423332 1.130568
Example 3
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(8, 4), columns = ['A', 'B', 'C', 'D'])

# Selecting with lists of positions and with slices
print(df.iloc[[1, 3, 5], [1, 3]])
print(df.iloc[1:3, :])
print(df.iloc[:,1:3])
Its output is as follows −
B D
1 0.890791 0.631615
3 -1.284314 -0.026251
5 -0.512888 -0.518930

A B C D
1 -0.685354 0.890791 -0.813012 0.631615
2 -0.783192 -0.531378 0.025070 0.230806

B C
0 0.256239 -1.270702
1 0.890791 -0.813012
2 -0.531378 0.025070
3 -1.284314 0.826977
4 -0.460729 1.423332
5 -0.512888 0.581409
6 -1.204853 0.098060
7 -0.947857 0.641358
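The first access method in the list, a single integer, is not shown above. A minimal sketch with fixed values (so the results are easy to check by hand) could look like this:

```python
import numpy as np
import pandas as pd

# 3 x 4 frame holding the numbers 0..11, row by row.
df = pd.DataFrame(np.arange(12).reshape(3, 4), columns=['A', 'B', 'C', 'D'])

row = df.iloc[1]       # second row, returned as a Series
cell = df.iloc[1, 2]   # second row, third column, returned as a scalar
last = df.iloc[-1]     # negative positions count from the end

print(row)
print(cell)
print(last)
```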

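.ix()
Besides pure label-based and integer-based indexing, pandas provided a
hybrid method for selecting and subsetting an object: the .ix indexer.
It was deprecated in pandas 0.20 and is not available in pandas 1.0 and
newer, so the examples below run only on older pandas versions; new code
should use .loc or .iloc.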
Example 1
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(8, 4), columns = ['A', 'B', 'C', 'D'])

# Integer slicing (old pandas releases only)
print(df.ix[:4])
Its output is as follows −
A B C D
0 0.699435 0.256239 -1.270702 -0.645195
1 -0.685354 0.890791 -0.813012 0.631615
2 -0.783192 -0.531378 0.025070 0.230806
3 0.539042 -1.284314 0.826977 -0.026251
Example 2
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(8, 4), columns = ['A', 'B', 'C', 'D'])

# Index slicing (old pandas releases only)
print(df.ix[:,'A'])
Its output is as follows −
0 0.699435
1 -0.685354
2 -0.783192
3 0.539042
4 -1.044209
5 -1.415411
6 1.062095
7 0.994204
Name: A, dtype: float64
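On current pandas versions the two .ix selections can be written with .iloc and .loc. This is a sketch of one possible rewrite, using fixed values rather than random numbers; the variable names are chosen for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(32).reshape(8, 4), columns=['A', 'B', 'C', 'D'])

# Positional selection: the rows at positions 0 to 3.
first_rows = df.iloc[:4]

# Label-based selection: every row of column 'A'.
col_a = df.loc[:, 'A']

# Mixing both: turn row positions into labels, then select by label.
mixed = df.loc[df.index[:4], 'A']

print(first_rows)
print(col_a)
print(mixed)
```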

Use of Notations
Getting values from a pandas object with multi-axes indexing uses the
following notation:
Object | Indexers | Return Type
Series | s.loc[indexer] | Scalar value
DataFrame | df.loc[row_index, col_index] | Series object
Panel | p.loc[item_index, major_index, minor_index] | p.loc[item_index, major_index, minor_index]

Note − .iloc() and .ix() accept the same indexing options and return the
same types of value.


4.6 Filtration
Filtration filters the data on a defined criterion and returns the
subset of data that satisfies it. The filter() method of a grouped
object (the result of groupby()) is used to filter the data.

import pandas as pd
import numpy as np

ipl_data = {'Team': ['Riders', 'Riders', 'Devils', 'Devils', 'Kings',
    'kings', 'Kings', 'Kings', 'Riders', 'Royals', 'Royals', 'Riders'],
    'Rank': [1, 2, 2, 3, 3, 4, 1, 1, 2, 4, 1, 2],
    'Year': [2014,2015,2014,2015,2014,2015,2016,2017,2016,2014,2015,2017],
    'Points':[876,789,863,673,741,812,756,788,694,701,804,690]}
df = pd.DataFrame(ipl_data)

# keep only the rows of teams that appear three or more times
print(df.groupby('Team').filter(lambda x: len(x) >= 3))

Its output is as follows −

    Points  Rank    Team  Year
0      876     1  Riders  2014
1      789     2  Riders  2015
4      741     3   Kings  2014
6      756     1   Kings  2016
7      788     1   Kings  2017
8      694     2  Riders  2016
11     690     2  Riders  2017

In the above filter condition, we are asking to return the teams which
have participated three or more times in IPL.
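The same idea works with other conditions. The sketch below, on a small made-up table, contrasts a group-level filter (a condition on the whole team) with an ordinary row-level filter (a condition on each row); the threshold of 800 points is an arbitrary value chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Team': ['Riders', 'Riders', 'Devils', 'Devils', 'Kings'],
    'Points': [876, 789, 863, 673, 741],
})

# Group-level filter: keep all rows of teams whose average exceeds 800.
strong = df.groupby('Team').filter(lambda g: g['Points'].mean() > 800)

# Row-level filter: keep individual rows with more than 800 points.
high = df[df['Points'] > 800]

print(strong)
print(high)
```

Here only Riders (average 832.5) passes the group-level filter, while the row-level filter keeps one Riders row and one Devils row.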


4.7 Mapping
The map() function returns a map object (which is an iterator) of the results
after applying the given function to each item of a given iterable (list, tuple, etc.).

Syntax :

map(fun, iter)
Parameters :

fun : The function to which map() passes each element of the given iterable.

iter : The iterable that is to be mapped.

NOTE : You can pass one or more iterables to the map() function.

Returns :

Returns an iterator (a map object) that yields the results of applying the
given function to each item of the given iterable (list, tuple, etc.).

NOTE : The value returned from map() (the map object) can then be passed to
functions like list() (to create a list) or set() (to create a set).

CODE 1

# Python program to demonstrate the working of map().

# Return double of n
def addition(n):
    return n + n

# We double all numbers using map()
numbers = (1, 2, 3, 4)
result = map(addition, numbers)
print(list(result))
Output :

[2, 4, 6, 8]
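In pandas, a Series has its own map() method that applies a function or a lookup dictionary to every element. The following is a small sketch; the animal names and codes are made-up sample values.

```python
import pandas as pd

s = pd.Series(['cat', 'dog', 'cat', 'bird'])

# Map with a function: the length of each string.
lengths = s.map(len)

# Map with a dict: values without a matching key become NaN.
codes = s.map({'cat': 1, 'dog': 2})

print(lengths)
print(codes)
```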


4.8 Sorting

The easiest way to sort is with the sorted(list) function, which takes a list
and returns a new list with those elements in sorted order. The original
list is not changed.

a = [5, 1, 4, 3]
print(sorted(a)) ## [1, 3, 4, 5]
print(a) ## [5, 1, 4, 3]

It's most common to pass a list into the sorted() function, but in fact it
can take as input any sort of iterable collection. The older list.sort()
method is an alternative detailed below. The sorted() function seems
easier to use compared to sort(), so I recommend using sorted().

The sorted() function can be customized through optional arguments.
The optional argument reverse=True, e.g. sorted(list, reverse=True),
makes it sort backwards.

strs = ['aa', 'BB', 'zz', 'CC']
print(sorted(strs)) ## ['BB', 'CC', 'aa', 'zz'] (case sensitive)
print(sorted(strs, reverse=True)) ## ['zz', 'aa', 'CC', 'BB']

Custom Sorting With key=

For more complex custom sorting, sorted() takes an optional "key="
argument specifying a "key" function that transforms each element before
comparison. The key function takes in 1 value and returns 1 value, and
the returned "proxy" value is used for the comparisons within the sort.

For example, with a list of strings, specifying key=len (the built-in len()
function) sorts the strings by length, from shortest to longest. The sort
calls len() for each string to get the list of proxy length values, and then
sorts with those proxy values.

strs = ['ccc', 'aaaa', 'd', 'bb']
print(sorted(strs, key=len)) ## ['d', 'bb', 'ccc', 'aaaa']


As another example, specifying "str.lower" as the key function is a way
to force the sorting to treat uppercase and lowercase the same:

## "key" argument specifying str.lower function to use for sorting
strs = ['aa', 'BB', 'zz', 'CC']
print(sorted(strs, key=str.lower)) ## ['aa', 'BB', 'CC', 'zz']

You can also pass in your own MyFn as the key function, like this:

## Say we have a list of strings we want to sort by the last letter of the string.
strs = ['xc', 'zb', 'yd' ,'wa']

## Write a little function that takes a string, and returns its last letter.
## This will be the key function (takes in 1 value, returns 1 value).
def MyFn(s):
    return s[-1]

## Now pass key=MyFn to sorted() to sort by the last letter:
print(sorted(strs, key=MyFn)) ## ['wa', 'zb', 'xc', 'yd']

For more complex sorting like sorting by last name then by first name,
you can use the itemgetter or attrgetter functions like:

from operator import itemgetter

# (first name, last name, score) tuples
grade = [('Freddy', 'Frank', 3), ('Anil', 'Frank', 100), ('Anil', 'Wang', 24)]

# sort by last name, then by first name
sorted(grade, key=itemgetter(1,0))
# [('Anil', 'Frank', 100), ('Freddy', 'Frank', 3), ('Anil', 'Wang', 24)]

# -1 selects the last field (the score), so this sorts by first name, then by score
sorted(grade, key=itemgetter(0,-1))
# [('Anil', 'Wang', 24), ('Anil', 'Frank', 100), ('Freddy', 'Frank', 3)]

sort() method
As an alternative to sorted(), the sort() method on a list sorts that list into
ascending order, e.g. list.sort(). The sort() method changes the
underlying list and returns None, so use it like this:

alist.sort() ## correct
alist = blist.sort() ## Incorrect. sort() returns None

The above is a very common misunderstanding with sort(): it
*does not return* the sorted list. The sort() method must be called on a
list; it does not work on other iterable collections (but the sorted()
function above works on anything iterable). The sort() method predates the
sorted() function, so you will likely see it in older code. The sort() method
does not need to create a new list, so it can be a little faster in the case
that the elements to sort are already in a list.
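Since this unit is about pandas, it may help to see how the same ideas look on a DataFrame, where sort_index() orders rows by their labels and sort_values() orders them by column values. The sketch below reuses the names and scores from the tuple example; the column names and the shuffled index are chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {'first': ['Freddy', 'Anil', 'Anil'],
     'last': ['Frank', 'Frank', 'Wang'],
     'score': [3, 100, 24]},
    index=[2, 0, 1])

# Sort rows by their index labels.
by_index = df.sort_index()

# Sort rows by one column, largest score first.
by_score = df.sort_values(by='score', ascending=False)

# Sort by last name, then by first name.
by_name = df.sort_values(by=['last', 'first'])

print(by_index)
print(by_score)
print(by_name)
```

Like sorted(), both methods return a new object and leave the original DataFrame unchanged.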


4.9. Data Ranking


Data ranking produces a rank for each element in an array of
elements. In case of ties, it assigns the mean rank.

import pandas as pd
import numpy as np

s = pd.Series(np.random.randn(5), index=list('abcde'))
s['d'] = s['b'] # so there's a tie
print(s.rank())

Its output is as follows −

a 1.0
b 3.5
c 2.0
d 3.5
e 5.0
dtype: float64

rank() optionally takes a parameter ascending, which is True by default;
when it is False, the data is reverse-ranked, with larger values assigned
a smaller rank.

rank() supports different tie-breaking methods, specified with the method
parameter:

• average − average rank of the tied group
• min − lowest rank in the group
• max − highest rank in the group
• first − ranks assigned in the order they appear in the array
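The tie-breaking methods can be compared on a small Series with fixed values, where 'b' and 'd' are tied. This is a sketch with made-up numbers; the variable names are chosen for the example.

```python
import pandas as pd

s = pd.Series([10, 30, 20, 30, 50], index=list('abcde'))

average = s.rank()                    # ties share the mean rank
lowest = s.rank(method='min')         # ties get the lowest rank of the group
highest = s.rank(method='max')        # ties get the highest rank of the group
in_order = s.rank(method='first')     # ties ranked in order of appearance
reverse = s.rank(ascending=False)     # largest value gets rank 1

print(pd.DataFrame({'average': average, 'min': lowest, 'max': highest,
                    'first': in_order, 'descending': reverse}))
```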


4.10 Reading and Writing Data in Text Format


Like other languages, Python provides built-in functions for
reading, writing, and accessing files. Python mainly handles two types
of files: normal text files and binary files.
In a text file, each line is terminated with the special
character '\n' (known as EOL, or End Of Line). A binary file has no
line-ending character; it stores the content as a raw stream of bytes.
In this section we will discuss text files.

File Accessing Modes


Sr.No Modes & Description

1 r
Read-only mode. It opens the text file for reading. If the file does not
exist, it raises an I/O error (FileNotFoundError).

2 r+
Mode for reading and writing. If the file does not exist, it raises an
I/O error (FileNotFoundError).

3 w
Write-only mode. If the file does not exist, it is created first; if the
file exists, its contents are removed and writing starts from the beginning.

4 w+
Write and read mode. If the file does not exist, it is created; if the
file exists, its contents are overwritten.

5 a
Append mode. It writes data at the end of the file.

6 a+
Append and read mode. It can append data as well as read the data.


Now let us see how a file can be written using the write() and
writelines() methods.

#Create an empty file and write some lines
line1 = 'This is first line. \n'
lines = ['This is another line to store into file.\n',
'The Third Line for the file.\n',
'Another line... !@#$%^&*()_+.\n',
'End Line']
#open the file as write mode
my_file = open('file_read_write.txt', 'w')
my_file.write(line1)
my_file.writelines(lines) #Write multiple lines
my_file.close()
print('Writing Complete')

Output
Writing Complete
After writing the lines, we append some lines to the file.

#program to append some lines
line1 = '\n\nThis is a new line. This line will be appended. \n'
#open the file as append mode
my_file = open('file_read_write.txt', 'a')
my_file.write(line1)
my_file.close()
print('Appending Done')

Output
Appending Done
Finally, we will see how to read the file content with the read() and
readline() methods. Passing an integer n to read() returns only the
first n characters.

#program to read from file
#open the file as read mode
my_file = open('file_read_write.txt', 'r')
print('Show the full content:')
print(my_file.read())
#Show first two lines
my_file.seek(0)
print('First two lines:')
print(my_file.readline(), end = '')
print(my_file.readline(), end = '')
#Show upto 25 characters
my_file.seek(0)
print('\n\nFirst 25 characters:')
print(my_file.read(25), end = '')
my_file.close()

Output
Show the full content:
This is first line.
This is another line to store into file.
The Third Line for the file.
Another line... !@#$%^&*()_+.
End Line

This is a new line. This line will be appended.

First two lines:
This is first line.
This is another line to store into file.

First 25 characters:
This is first line.
This
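For tabular data, pandas offers its own text-format readers and writers on top of plain file objects. The sketch below writes a small DataFrame to a CSV file with to_csv() and reads it back with read_csv(); the file name students.csv and the sample records are made up, and the file is placed in a temporary directory so that nothing in the working directory is touched.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({'name': ['Alex', 'Bob'], 'age': [10, 12]})

# Write the frame as comma-separated text, without the row index.
path = os.path.join(tempfile.mkdtemp(), 'students.csv')
df.to_csv(path, index=False)

# Read the text file back into a new DataFrame.
loaded = pd.read_csv(path)
print(loaded)
```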

Data Science using Python – Unit V

UNIT - V
5.1 Data Cleaning and Preparation
Data cleaning is one of the important parts of machine learning and plays
a significant part in building a model. It is not the fanciest part of machine
learning, and there are no hidden tricks or secrets to uncover. However, the
success or failure of a project relies on proper data cleaning. Professional
data scientists usually invest a very large portion of their time in this step
because of the belief that “Better data beats fancier algorithms”.

If we have a well-cleaned dataset, we can often achieve good results even
with simple algorithms, which can be very beneficial in terms of computation,
especially when the dataset is large.

Different types of data require different types of cleaning. However, the
systematic approach described below can always serve as a good starting point.

Steps involved in Data Cleaning:


• Removal of unwanted observations
This includes deleting duplicate, redundant, or irrelevant values from your
dataset. Duplicate observations most frequently arise during data collection,
and irrelevant observations are those that don’t actually fit the specific
problem that you’re trying to solve.

Redundant observations reduce efficiency to a great extent, because the
repeated data may add weight towards the correct side or towards the incorrect
side, thereby producing unfaithful results.

Irrelevant observations are any type of data that is of no use to us and can be
removed directly.
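In pandas, duplicate rows can be detected with duplicated() and removed with drop_duplicates(). The following is a minimal sketch on a made-up table of city temperatures.

```python
import pandas as pd

df = pd.DataFrame({'city': ['Pune', 'Delhi', 'Pune', 'Delhi'],
                   'temp': [30, 35, 30, 36]})

# True for every row that repeats an earlier row exactly.
print(df.duplicated())

# Drop exact duplicate rows, keeping the first occurrence.
deduped = df.drop_duplicates()

# Treat rows as duplicates when only the 'city' column matches.
one_per_city = df.drop_duplicates(subset='city')

print(deduped)
print(one_per_city)
```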

• Fixing structural errors
The errors that arise during measurement, transfer of data, or other similar
situations are called structural errors. Structural errors include typos in the
names of features, the same attribute with different names, mislabeled
classes (i.e., separate classes that should really be the same), and
inconsistent capitalization.

For example, the model will treat America and america as different classes
or values, though they represent the same value, or red, yellow, and red-yellow
as different classes or attributes, though one class can be included in the other
two classes. These are some of the structural errors that make our model
inefficient and give poor-quality results.
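Inconsistent capitalization and stray whitespace are among the easier structural errors to repair. The sketch below normalizes a made-up column of country names with the pandas string methods str.strip() and str.lower(); which normal form to use (lower case here) is a choice made for the example.

```python
import pandas as pd

country = pd.Series(['America', 'america', ' AMERICA ', 'India'])

# Before cleaning, every spelling counts as its own class.
print(country.nunique())

# Remove surrounding whitespace and unify the capitalization.
clean = country.str.strip().str.lower()
print(clean.nunique())
print(clean.value_counts())
```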

• Managing unwanted outliers
Outliers can cause problems with certain types of models. For example, linear
regression models are less robust to outliers than decision tree models.
Generally, we should not remove outliers unless we have a legitimate reason to
remove them. Sometimes removing them improves performance, sometimes it does
not. So, one must have a good reason to remove an outlier, such as suspicious
measurements that are unlikely to be part of real data.
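Before deciding whether to remove an outlier, it has to be found. One common convention, which is an assumption here and not something prescribed by these notes, is to flag values lying more than 1.5 times the interquartile range (IQR) outside the quartiles. A sketch on made-up measurements:

```python
import pandas as pd

values = pd.Series([10, 12, 11, 13, 12, 95])

# Quartiles and interquartile range.
q1 = values.quantile(0.25)
q3 = values.quantile(0.75)
iqr = q3 - q1

# Flag values far outside the middle half of the data (1.5 * IQR rule).
is_outlier = (values < q1 - 1.5 * iqr) | (values > q3 + 1.5 * iqr)

print(values[is_outlier])    # candidates to inspect, not to delete blindly
print(values[~is_outlier])
```

The flagged values are only candidates; as stated above, they should be removed only when there is a good reason to believe they are not real data.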


• Handling missing data
Missing data is a deceptively tricky issue in machine learning. We cannot
simply ignore or remove missing observations. They must be handled carefully,
because they can be an indication of something important. The two most common
ways to deal with missing data are:

• Dropping observations with missing values.
This loses information: the fact that the value was missing may be informative
in itself. Plus, in the real world, you often need to make predictions on new
data even if some of the features are missing!

• Imputing the missing values from past observations.
Again, “missingness” is almost always informative in itself, and you should tell
your algorithm if a value was missing. Even if you build a model to impute your
values, you’re not adding any real information. You’re just reinforcing the
patterns already provided by other features.

Missing data is like a missing puzzle piece. If you drop it, that’s like pretending
the puzzle slot isn’t there. If you impute it, that’s like trying to squeeze in a
piece from somewhere else in the puzzle.

So, missing data is informative and an indication of something important, and
we must make our algorithm aware of missing data by flagging it. By using this
technique of flagging and filling, you are essentially allowing the algorithm
to estimate the optimal constant for missingness, instead of just filling it in
with the mean.
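The flag-and-fill technique can be sketched in pandas as follows. The column name income, the indicator column income_missing, and the fill constant 0 are all choices made for this illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'income': [50.0, np.nan, 70.0, np.nan]})

# Flag: an indicator column that records where the value was missing.
df['income_missing'] = df['income'].isna().astype(int)

# Fill: replace the missing values with a constant.
df['income'] = df['income'].fillna(0)

print(df)
```

With the indicator column present, a model can learn its own adjustment for rows where the value was missing.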


5.2 What is Data Transformation?


Data transformation is the process of converting, cleansing,
and structuring data into a usable format that can be analyzed
to support decision-making processes and to propel the growth
of an organization.

Data transformation is used when data needs to be converted to
match the format of the destination system. This can occur at
two places in the data pipeline. First, organizations with
on-site data storage use an extract, transform, load (ETL)
process, with the data transformation taking place during the
middle ‘transform’ step.

Second, organizations today mostly use cloud-based data
warehouses because they can scale their computing and
storage resources in seconds. Cloud-based organizations, with
this huge scalability available, can skip the ETL process.
Instead, they use a transformation process that converts the
data as the raw data is uploaded, a process called extract,
load, and transform (ELT). The process of data transformation
can be handled manually, automated, or a combination of both.

Transformation is an essential step in many processes, such
as data integration, migration, warehousing, and wrangling. The
process of data transformation can be:

• Constructive, where data is added, copied, or replicated
• Destructive, where records and fields are deleted
• Aesthetic, where certain values are standardized, or
• Structural, which includes columns being renamed, moved,
and combined

On a basic level, the data transformation process converts
raw data into a usable format by removing duplicates,
converting data types, and enriching the dataset. This data
transformation process involves defining the structure,
mapping the data, extracting the data from the source system,
performing the transformations, and then storing the
transformed data in the appropriate dataset. Data then
becomes accessible, secure, and more usable, allowing for use
in a multitude of ways. Organizations perform data
transformation to ensure the compatibility of data with other
types while combining it with other information or migrating it
into a dataset. Through data transformations, organizations
can gain valuable insights into their operational and
informational functions.


How is Data Transformation Used?


Data transformation works on the simple objective of
extracting data from a source, converting it into a usable
format, and then delivering the converted data to the
destination system. The extraction phase involves data being
pulled into a central repository from different sources or
locations, so it is usually in its raw original form, which is
not usable. To ensure the usability of the extracted data, it
must be transformed into the desired format by taking it
through a number of steps. In certain cases, the data also
needs to be cleaned before the transformation takes place. This
step resolves the issues of missing values and inconsistencies
that exist in the dataset. The data transformation process is
carried out in five stages.

1. Discovery
The first step is to identify and understand data in its
original source format with the help of data profiling tools.
This means finding all the sources and data types that need to
be transformed. This step helps in understanding how the data
needs to be transformed to fit into the desired format.

2. Mapping
The transformation is planned during the data mapping
phase. This includes determining the current structure and the
transformation that is consequently required, then mapping the
data to understand, at a basic level, the way individual fields
would be modified, joined, or aggregated.


3. Code Generation
The code, which is required to run the transformation
process, is created in this step using a data transformation
platform or tool.

4. Execution
The data is finally converted into the selected format with
the help of the code. The data is extracted from the source(s),
which can vary from structured to streaming, telemetry to log
files. Next, transformations are carried out on data, such as
aggregation, format conversion or merging, as planned in the
mapping stage. The transformed data is then sent to the
destination system which could be a dataset or a data
warehouse. Some of the transformation types, depending on
the data involved, include:

• Filtering, which helps in selecting certain columns that
require transformation
• Enriching, which fills out the basic gaps in the data set
• Splitting, where a single column is split into multiple columns,
or vice versa
• Removal of duplicate data, and
• Joining data from different sources
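The transformation types listed above can be sketched with pandas on a toy dataset. Everything here is illustrative: the tables orders and cities, their column names, and the choice of filling missing amounts with 0.0 are assumptions made for the example, and a real pipeline would typically run such steps inside an ETL or ELT tool.

```python
import numpy as np
import pandas as pd

orders = pd.DataFrame({
    'order_id': [1, 2, 2, 3],
    'customer': ['Asha Rao', 'Ben Lee', 'Ben Lee', 'Asha Rao'],
    'amount': [250.0, np.nan, np.nan, 400.0],
    'internal_code': ['x', 'y', 'y', 'z'],
})
cities = pd.DataFrame({'customer': ['Asha Rao', 'Ben Lee'],
                       'city': ['Pune', 'Delhi']})

# Filtering: keep only the columns that are needed.
step = orders[['order_id', 'customer', 'amount']]

# Removal of duplicate data: drop repeated rows.
step = step.drop_duplicates()

# Enriching: fill the gaps in the amount column.
step = step.fillna({'amount': 0.0})

# Splitting: one name column becomes two columns.
step[['first', 'last']] = step['customer'].str.split(' ', expand=True)

# Joining: add the city from a second source.
result = step.merge(cities, on='customer', how='left')
print(result)
```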

5. Review
The transformed data is evaluated to ensure the
conversion has had the desired results in terms of the format
of the data.

5.3 String manipulation.


Strings in Python are sequences of Unicode characters. Python does
not have a separate character data type; a single character is simply
a string with a length of 1.
Square brackets can be used to access elements of the string.
The packages that must be imported for the examples are as follows:
import pandas as pd
import altair as alt
import numpy as np

String basics

You can create strings with either single quotes or double
quotes. You can read the pandas user guide on working with text
data for more details.

string1 = "This is a string"
string2 = 'If I want to include a "quote" inside a string, I use single quotes'
To include a literal single or double quote in a string you can use \ to
“escape” it:
double_quote = "\"" # or '"'
single_quote = '\'' # or "'"
That means if you want to include a literal backslash, you’ll need to
double it up: "\\".
Beware that the printed representation of a string is not the same as
the string itself, because the printed representation shows the escapes:

x = "\" \\"
x
#> '" \\'
print(x)
#> " \

There are a handful of other special characters. The most common
are "\n", newline, and "\t", tab, but you can see the complete list in
the Python reference manual. You’ll also sometimes see strings
like "\u00b5"; this is a way of writing non-English characters that works
on all platforms:
x = "\u00b5"
x
#> 'µ'
Multiple strings are often stored in a Series of dtype object, which you
can create from a list written with []:
pd.Series(["one", "two", "three"])
#> 0 one
#> 1 two
#> 2 three
#> dtype: object

String length

Python contains many functions to work with strings. We’ll use the
functions from pandas that work on Series. These are all accessed through
the .str accessor. For example, str.len() tells you the number of
characters in a string:
pd.Series(["a", "R for data science", np.nan]).str.len()
#> 0 1.0
#> 1 18.0
#> 2 NaN
#> dtype: float64

Combining strings
To combine two or more strings, use str.cat():
pd.Series(["x", "y"]).str.cat()
#> 'xy'
pd.Series(["x", "y", "z"]).str.cat()
#> 'xyz'
Use the sep argument to control how they’re separated:
pd.Series(["x", "y"]).str.cat(sep = '_')
#> 'x_y'
Unlike many other pandas operations, str.cat() on a single Series skips
missing values by default instead of propagating them. If you want them
to appear as "NA", use fillna() or na_rep = 'NA':
x = pd.Series(["abc", np.nan])
x.str.cat()
#> 'abc'
x.str.cat(na_rep = "NA")
#> 'abcNA'
x.fillna('NA').str.cat()
#> 'abcNA'
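str.cat() can also join one Series to another element by element, a mode the examples above do not show. The sketch below uses made-up data (the names first and second are illustrative). In this mode a missing value on either side makes that result missing unless na_rep is given.

```python
import numpy as np
import pandas as pd

first = pd.Series(["x", "y", np.nan])
second = pd.Series(["1", "2", "3"])

# Element-wise join of two Series: the row with a missing value stays missing.
pairs = first.str.cat(second, sep="_")
print(pairs.tolist())      # ['x_1', 'y_2', nan]

# na_rep substitutes a placeholder for missing values before joining.
filled = first.str.cat(second, sep="_", na_rep="NA")
print(filled.tolist())     # ['x_1', 'y_2', 'NA_3']
```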

Subsetting strings
You can extract parts of a string using str[]. It takes a start:end
slice, where start is the zero-based position of the first character to
keep and end is the position just past the last one (the end is
exclusive):
x = pd.Series(["Apple", "Banana", "Pear"])
x.str[0:3]
#> 0 App
#> 1 Ban
#> 2 Pea
#> dtype: object
# negative numbers count backwards from end
x.str[-3:]
#> 0 ple
#> 1 ana
#> 2 ear
#> dtype: object
Note that str[] won’t fail if the string is too short: it will just return as
much as possible:
pd.Series(["a"]).str[0:5]
#> 0 a
#> dtype: object
You can also use str.slice_replace() to replace a slice of each string
with new text. Here the empty slice 0:0 inserts "5" at the start:
x.str.slice_replace(0,0, repl = "5")
#> 0 5Apple
#> 1 5Banana
#> 2 5Pear
#> dtype: object
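A few related calls on the same Series, shown as an illustrative sketch: str[i] picks a single character, str.slice() is the method form of str[start:end], and str.slice_replace() with a non-empty slice overwrites characters instead of inserting.

```python
import pandas as pd

x = pd.Series(["Apple", "Banana", "Pear"])

# A single index returns one character per string.
print(x.str[0].tolist())                         # ['A', 'B', 'P']

# str.slice(start, stop) behaves like str[start:stop]; stop is exclusive.
print(x.str.slice(1, 4).tolist())                # ['ppl', 'ana', 'ear']

# Replacing the slice 0:1 overwrites the first character.
print(x.str.slice_replace(0, 1, "*").tolist())   # ['*pple', '*anana', '*ear']
```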


5.4 Vectorized String Operations


One strength of Python is its relative ease in handling and
manipulating string data. Pandas builds on this and provides a
comprehensive set of vectorized string operations that are an important
part of the type of munging required when working with (read: cleaning
up) real-world data. In this chapter, we’ll walk through some of the
Pandas string operations, and then take a look at using them to partially
clean up a very messy dataset of recipes collected from the internet.
Introducing Pandas String Operations
We saw in previous chapters how tools like NumPy and Pandas
generalize arithmetic operations so that we can easily and quickly
perform the same operation on many array elements. For example:

In [1]: import numpy as np


x = np.array([2, 3, 5, 7, 11, 13])
x*2
Out[1]: array([ 4, 6, 10, 14, 22, 26])

This vectorization of operations simplifies the syntax of operating on
arrays of data: we no longer have to worry about the size or shape of the
array, but just about what operation we want done. For arrays of strings,
NumPy does not provide such simple access, and thus you’re stuck
using a more verbose loop syntax:

In [2]: data = ['peter', 'Paul', 'MARY', 'gUIDO']


[s.capitalize() for s in data]
Out[2]: ['Peter', 'Paul', 'Mary', 'Guido']

This is perhaps sufficient to work with some data, but it will break if there
are any missing values, so this approach requires putting in extra
checks:

In [3]: data = ['peter', 'Paul', None, 'MARY' ...
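The source example above is cut off. As a separate, hedged sketch of the point being made, the pandas .str accessor applies a string method to every element and passes missing values through without extra checks. The list below is illustrative data, not a reconstruction of the truncated example.

```python
import pandas as pd

# Illustrative data containing one missing value.
names = pd.Series(["peter", "Paul", None, "MARY", "gUIDO"])

# The vectorized method skips the missing entry instead of raising an error.
capitalized = names.str.capitalize()
print(capitalized)

# The plain list comprehension fails on the same data.
try:
    [s.capitalize() for s in names]
except AttributeError as err:
    print("loop failed:", err)
```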


5.5 Plot with Pandas


Python’s popular data analysis library, pandas, provides several
different options for visualizing your data with .plot(). Even if you’re at the
beginning of your pandas journey, you’ll soon be creating basic plots that
will yield valuable insights into your data.

Basic Plotting: plot


This functionality on Series and DataFrame is just a simple wrapper
around the matplotlib library’s plot() method.

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(10,4),index=pd.date_range('1/1/2000',
periods=10), columns=list('ABCD'))

df.plot()

Its output is as follows −

If the index consists of dates, it calls gcf().autofmt_xdate() to format
the x-axis as shown in the above illustration.


We can plot one column versus another using the x and y keywords.

Plotting methods allow a handful of plot styles other than the default
line plot. These methods can be provided as the kind keyword argument
to plot(). These include:
• 'bar' or 'barh' for bar plots
• 'hist' for histograms
• 'box' for boxplots
• 'area' for area plots
• 'scatter' for scatter plots
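The x and y keywords and the kind keyword described above can be sketched as follows. This is a minimal example on random data. The Agg backend line is an assumption added so the sketch runs without a display, and can be dropped in a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(10, 4), columns=list("ABCD"))

# Plot column B against column A instead of against the index.
ax_line = df.plot(x="A", y="B")

# The kind keyword selects the plot style; these two calls are equivalent.
ax_hist = df.plot(kind="hist", bins=20)
ax_hist_accessor = df.plot.hist(bins=20)

# Each call returns a matplotlib Axes object that can be customized further.
ax_line.set_title("B versus A")
```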

Bar Plot
Let us now see what a Bar Plot is by creating one. A bar plot can be
created in the following way −

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.rand(10,4),columns=['a','b','c','d'])
df.plot.bar()

Its output is as follows −


To produce a stacked bar plot, pass stacked=True −

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.rand(10,4),columns=['a','b','c','d'])
df.plot.bar(stacked=True)

Its output is as follows −

To get horizontal bar plots, use the barh method −

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.rand(10,4),columns=['a','b','c','d'])
df.plot.barh(stacked=True)

Its output is as follows −


Histograms
Histograms can be plotted using the plot.hist() method. We can specify
the number of bins with the bins argument.

import pandas as pd
import numpy as np
df = pd.DataFrame({'a':np.random.randn(1000)+1,'b':np.random.randn(1000),'c':
np.random.randn(1000) - 1}, columns=['a', 'b', 'c'])
df.plot.hist(bins=20)
Its output is as follows −

To plot different histograms for each column, use the following code −

import pandas as pd
import numpy as np

df=pd.DataFrame({'a':np.random.randn(1000)+1,'b':np.random.randn(1000),'c':
np.random.randn(1000) - 1}, columns=['a', 'b', 'c'])

# DataFrame.hist() draws one histogram per column, each in its own subplot
df.hist(bins=20)


Its output is as follows −

Box Plots
A boxplot can be drawn by calling Series.plot.box() and
DataFrame.plot.box(), or DataFrame.boxplot(), to visualize the
distribution of values within each column. For instance, here is a boxplot
representing five trials of 10 observations of a uniform random variable
on [0,1).

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.rand(10, 5), columns=['A', 'B', 'C', 'D', 'E'])
df.plot.box()
Its output is as follows −


Area Plot
An area plot can be created using the Series.plot.area() or
the DataFrame.plot.area() methods.

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.rand(10, 4), columns=['a', 'b', 'c', 'd'])
df.plot.area()

Its output is as follows −

Scatter Plot
A scatter plot can be created using the DataFrame.plot.scatter() method.
The x and y arguments name the columns to plot against each other.

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.rand(50, 4), columns=['a', 'b', 'c', 'd'])
df.plot.scatter(x='a', y='b')


Its output is as follows −

Pie Chart
A pie chart can be created using the DataFrame.plot.pie() method.

import pandas as pd
import numpy as np
df = pd.DataFrame(3 * np.random.rand(4), index=['a', 'b', 'c', 'd'], columns=['x'])
df.plot.pie(subplots=True)
Its output is as follows −
