
ST. ANDREWS SCOTS SR. SEC. SCHOOL
I.P Extension, Patparganj, Delhi, 110092
Class XII
Data Handling Using Pandas
Introduction to Python Libraries:

NumPy, Pandas and Matplotlib are three Python libraries which are widely used for scientific
and analytical work. These libraries allow us to manipulate, transform and visualise data easily
and efficiently.
NumPy, which stands for ‘Numerical Python’, is a library that can be used for numerical data
analysis and scientific computing.

Pandas (the name is derived from 'Panel Data') is a high-level data manipulation tool used for
analysing data. It is built on packages like NumPy and Matplotlib and gives us a single,
convenient place to do most of our data analysis and visualisation work. Pandas has three
important data structures:
1. Series
2. DataFrame
3. Panel (deprecated and later removed from Pandas; current versions use only Series and DataFrame)
The Matplotlib library in Python is used for plotting graphs and visualisation. Using Matplotlib,
we can generate publication quality plots, histograms, bar charts etc.

Differences between Pandas and Numpy


Pandas                                             NumPy

It can create Series and DataFrame.                It creates arrays.

A Pandas DataFrame can have different data         A NumPy array requires homogeneous data.
types (float, int, string, datetime, etc.).

Pandas is used when data is in tabular format.     NumPy is used for numeric array-based data.

Pandas is used for data analysis and               NumPy is used for numerical calculations.
visualizations.
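The difference in data types can be seen in a short sketch. The names and marks used here are made-up sample data, not part of the notes above.

```python
import numpy as np
import pandas as pd

# A NumPy array holds a single data type: mixing int and float
# values converts every element to float.
arr = np.array([1, 2.5, 3])
print(arr.dtype)

# A DataFrame can hold a different data type in each column.
df = pd.DataFrame({"Name": ["Asha", "Ravi"],      # strings
                   "Marks": [91, 78],             # integers
                   "Percent": [91.0, 78.5]})      # floats
print(df.dtypes)
```

The first print shows one dtype for the whole array, while the second shows a separate dtype for each column of the DataFrame.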

Installing Pandas:

Command used to install Pandas is given below

pip install pandas

NOTE: Pandas can be installed only when Python is already installed on that system.

Data Structures in Pandas:

A data structure is a collection of data values and operations that can be applied to that data.
Two commonly used data structures in Pandas are:
1. Series
2. DataFrame

Series:

A Series is a one-dimensional array containing a sequence of values of any data type (int, float,
list, string, etc.). By default, a Series has numeric data labels starting from zero. The data label
associated with a particular value is called its index. We can also assign values of other data
types as index. An example of a Series containing names of cities is given below.

Index Data

0 Delhi
1 Faridabad
2 Jaipur
3 Mumbai
4 Bangalore

Here the index value is numeric

Example of a series containing names of Fruits is given below:

Index Data

0 Mango
1 Guava
2 Banana
3 Grapes
4 Water melon

Here the index value is numeric

Example of Series containing month name as index and number of days as Data:

Index Data

Jan 31
Feb 28
Mar 31
April 30
May 31

Here the index value is String

How to create Series:

A Series in Pandas can be created using the Series( ) method. There are different ways in which
a Series can be created in Pandas.

1. Creation of empty Series

>>> import pandas as pd
>>> s1 = pd.Series( )
>>> s1
Series([], dtype: float64)

NOTE: Pandas 2.0 and later versions show dtype: object for an empty Series.

2. Creation of Series using List

A Series can be created using list as shown in the example below:

>>> import pandas as pd #import Pandas with alias pd


>>> s1 = pd.Series([2, 4, 8, 12, 14, 20]) #create a Series using list
>>> print(s1) #Display the series

OUTPUT

0 2
1 4
2 8
3 12
4 14
5 20
dtype: int64

Observe that output is shown in two columns – the index is on the left and the data value is
on the right.

We can also assign user-defined labels to the index and use them to access elements of a
Series. The following example has a numeric index whose values are not consecutive:

>>> series2 = pd.Series(["Raman","Rosy","Ram"], index=[1, 7, 9])


>>> print(series2) #Display the series
OUTPUT
1 Raman
7 Rosy
9 Ram
dtype: object

Here, data values Raman, Rosy and Ram have index values 1, 7 and 9 respectively

We can also use letters or strings as indices, for example:

>>> import pandas as pd


>>> S2 = pd.Series([2,3,4],index=["Feb","Mar","Apr"])
>>> print(S2) #Display the series

OUTPUT
Feb 2
Mar 3
Apr 4
dtype: int64

Here, data values 2, 3 and 4 have index values Feb, Mar and Apr respectively

3. Creation of Series using NumPy Arrays

We can create a series from a one-dimensional (1D) NumPy array, as shown below:

>>> import numpy as np # import NumPy with alias np


>>> import pandas as pd
>>> a1 = np.array([6, 4, 8, 9])
>>> s3 = pd.Series(a1)
>>> print(s3)

Output:
0 6
1 4
2 8
3 9
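dtype: int64

NOTE: The dtype of the Series follows the dtype of the NumPy array. On some systems (for
example, Windows with older versions of NumPy) it is displayed as int32 instead of int64.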

We can also use letters or strings as indices, for example:

>>> import numpy as np # import NumPy with alias np


>>> import pandas as pd
>>> a1 = np.array([6, 4, 8, 9])
>>> s3 = pd.Series(a1, index = ['a', 'b', 'c', 'd'])
>>> print(s3)
Output:
a 6
b 4
c 8
d 9
dtype: int64

When index labels are passed with the array, the length of the index and the array must be
the same, else it will result in a ValueError as shown below:

>>> import numpy as np # import NumPy with alias np


>>> import pandas as pd
>>> a1 = np.array([6, 4, 8, 9])
>>> s3 = pd.Series(a1, index = ['a', 'b', 'c', 'd', 'e'])
>>> print(s3)

OUTPUT:
ValueError: Length of values (4) does not match length of index (5)

4. Creation of Series from Dictionary:

When a series is created from a dictionary, the keys of the dictionary become the index of
the series, so there is no need to declare the index as a separate list. Let us do some
practicals.

Practical 1: Pass the dictionary to the method Series()

import pandas as pd
S2 = pd.Series({2 : "Feb", 3 : "Mar", 4 : "Apr"})
print(S2) #Display the series

OUTPUT:
2 Feb
3 Mar
4 Apr
dtype: object

NOTE: In the above example, you can see that the keys of the dictionary become the index of
the Series.

Practical 2: Store the dictionary in a variable and pass the variable to the method Series()

import pandas as pd
d = {"One" : 1, "Two" : 2, "Three" : 3, "Four" : 4}
S2 = pd.Series(d)
print(S2) #Display the series

OUTPUT:
One 1
Two 2
Three 3
Four 4
dtype: int64

Practical 3: Let us try to pass an index while creating a Series from a dictionary

import pandas as pd
d = {"One" : 1, "Two" :2, "Three" : 3, "Four" : 4}
S2 = pd.Series(d, index=["A", "B", "C", "D"])
print(S2)

OUTPUT:
A NaN
B NaN
C NaN
D NaN
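dtype: float64

NOTE: When an index is passed along with a dictionary, only the labels that match the
dictionary keys get values. None of the labels A, B, C and D is a key of the dictionary, so
every value is NaN.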
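To see how index labels are matched against dictionary keys, here is a small sketch that reuses the dictionary from Practical 3. The label "Five" and the variable name S3 are assumptions chosen only for illustration.

```python
import pandas as pd

d = {"One": 1, "Two": 2, "Three": 3, "Four": 4}

# "Two" and "Four" are keys of the dictionary, so they pick up their values.
# "Five" is not a key, so its value becomes NaN.
S3 = pd.Series(d, index=["Two", "Four", "Five"])
print(S3)
```

Because NaN is a float value, the whole Series is shown with a float dtype.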

5. Creation of Series using mathematical expressions:

import pandas as pd
d = [12, 13, 14, 15]
S2 = pd.Series(data = [d]*2, index = d)
print(S2) #Display the series

OUTPUT
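ValueError: Length of values (2) does not match length of index (4)

Here, [d]*2 repeats the list d twice, which gives only two values for four index labels.
Repeating the list four times matches the length of the index:

import pandas as pd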
d = [12, 13, 14, 15]
S2 = pd.Series(data=[d]*4, index=d)
print(S2) #Display the series

OUTPUT
12 [12, 13, 14, 15]
13 [12, 13, 14, 15]
14 [12, 13, 14, 15]
15 [12, 13, 14, 15]
dtype: object
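The data of a Series can also be computed with an arithmetic expression on a NumPy array. The following is a minimal sketch; the names a and S4 are assumptions used only for this illustration.

```python
import numpy as np
import pandas as pd

a = np.arange(1, 5)                    # array with the values 1, 2, 3, 4

# The expression a * 2 is evaluated element by element,
# so each value is twice its index label.
S4 = pd.Series(data=a * 2, index=a)
print(S4)
```

Unlike [d]*2 on a list, which repeats the whole list, a * 2 on a NumPy array multiplies every element by 2.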

6. Creation of Series using String:

Practical 1:

import pandas as pd
S2 = pd.Series(['a', 'b', 'c']) #The strings are passed together as a list
print(S2) #Display the series

OUTPUT

0 a
1 b
2 c
dtype: object

Practical 2:

import pandas as pd
S2 = pd.Series(['anil', 'bhuvan', 'ravi']) #The strings are passed together as a list
print(S2) #Display the series

OUTPUT

0 anil
1 bhuvan
2 ravi
dtype: object

Practical 3:

import pandas as pd
S2 = pd.Series(['anil', 'bhuvan', 'ravi'], index = [1, 4, 7])
print(S2) #Display the series

OUTPUT

1 anil
4 bhuvan
7 ravi
dtype: object

How to modify the index value of the existing Series:

We can change the existing index values of a Series by assigning a new list of labels to its
index attribute:

import pandas as pd
seriesCapCntry = pd.Series(['NewDelhi', 'WashingtonDC', 'London', 'Paris'],
index=['India', 'USA', 'UK', 'France'])
print(seriesCapCntry)
#The next statement is used to change the index of the Series.
seriesCapCntry.index=[10,20,30,40]
print(seriesCapCntry)

OUTPUT
India NewDelhi
USA WashingtonDC
UK London
France Paris
dtype: object

10 NewDelhi
20 WashingtonDC
30 London
40 Paris
dtype: object
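The new list of labels must have exactly as many entries as the Series has values. The sketch below, which uses a shorter Series named caps (a name assumed only for this example), shows what happens when the lengths differ.

```python
import pandas as pd

caps = pd.Series(['NewDelhi', 'Paris'], index=['India', 'France'])

try:
    caps.index = [10, 20, 30]       # three labels for two values
except ValueError as err:
    print("ValueError:", err)

# The assignment failed, so the original index is still in place.
print(caps.index.tolist())
```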

How to access elements of the Series:

There are two common ways for accessing the elements of a series: Indexing and Slicing.

1. Indexing :

Indexing in Series is used to access elements in a series. Indexes are of two types:
• Positional index
• Labelled index

A positional index takes an integer value that corresponds to the element's position in the
series, starting from 0, whereas a labelled index takes any user-defined label as index.

Let us do some practicals of accessing elements of a Series using the positional index.

Practical 1: Accessing single element from the series.

import pandas as pd
d = [31, 15, 17, 20]
S2 = pd.Series(d)
print(S2[2]) #Display the third value of the series using its index value
print(S2[0]) #Display the first value of the series using its index value
OUTPUT:
17
31

Practical 2: What happens if we type the wrong Series name to access an element.

import pandas as pd
d = [31, 15, 17, 20]
S2 = pd.Series(d)
print(s2[2]) #Wrong Series name

OUTPUT:
NameError: name 's2' is not defined

Practical 3: What happens if we give a wrong index to access an element.

import pandas as pd
d = [31, 15, 17, 20]
S2 = pd.Series(d)
print(S2[5]) #Wrong index

OUTPUT:

KeyError: 5

Practical 4: What happens if we give a negative index to access an element.

import pandas as pd
d = [31, 15, 17, 20]
S2 = pd.Series(d)
print(S2[-1]) #Negative index

OUTPUT:

KeyError: -1

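In the above example, the Series has the default integer index, so -1 is looked up as a label
and is not found. If the Series has a non-integer (labelled) index, an integer inside the square
brackets is treated as a position, so the error can be rectified by adding an index as shown
below. Recent versions of Pandas show a FutureWarning for this form; S2.iloc[-1] is the
explicit way to access an element by position.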

import pandas as pd
d = [1, 2, 3]
S2 = pd.Series(d, index=["One", "Two", "Three"])
print(S2[-1])

OUTPUT:
3
Practical 5: What happens if we give a negative index (enclosed in square brackets) to access
an element.

import pandas as pd
d = [1, 2, 3]
S2 = pd.Series(d, index=["One", "Two", "Three"])
print(S2[[-1]])

OUTPUT:

Three 3
dtype: int64
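Pandas also provides the iloc and loc indexers, which state explicitly whether a position or a label is meant. The following is a minimal sketch based on the Series used in the practicals above; the name S5 is an assumption used only here.

```python
import pandas as pd

S2 = pd.Series([1, 2, 3], index=["One", "Two", "Three"])
print(S2.iloc[-1])      # element at the last position
print(S2.loc["Two"])    # element with the label "Two"

# iloc also works on a Series with the default integer index,
# where S5[-1] would raise KeyError.
S5 = pd.Series([31, 15, 17, 20])
print(S5.iloc[-1])
```

Using iloc for positions and loc for labels avoids any confusion when the index labels are themselves integers.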

Let us do some practicals of accessing elements of a Series having a labelled index.

In the following example, value NewDelhi is displayed for the labelled index India.

import pandas as pd
seriesCapCntry = pd.Series(['NewDelhi', 'WashingtonDC', 'London', 'Paris'],
index=['India', 'USA', 'UK', 'France'])
print(seriesCapCntry['India']) #Using Labelled index

OUTPUT:
NewDelhi

We can also access an element of the series using the positional index:

import pandas as pd
seriesCapCntry = pd.Series(['NewDelhi', 'WashingtonDC', 'London', 'Paris'],
index=['India', 'USA', 'UK', 'France'])
print(seriesCapCntry[1])

OUTPUT:
WashingtonDC

More than one element of a series can be accessed using a list of positional integers.

import pandas as pd
seriesCapCntry = pd.Series(['NewDelhi', 'WashingtonDC', 'London', 'Paris'],
index=['India', 'USA', 'UK', 'France'])
print(seriesCapCntry[[1, 2]])

OUTPUT:
USA WashingtonDC
UK London
dtype: object

More than one element of a series can also be accessed using a list of index labels as shown
in the following examples:

import pandas as pd
seriesCapCntry = pd.Series(['NewDelhi', 'WashingtonDC', 'London', 'Paris'],
index=['India', 'USA', 'UK', 'France'])
print(seriesCapCntry[['India', 'UK']]) #Accessing multiple elements using index labels

OUTPUT:
India NewDelhi
UK London
dtype: object

2. Slicing :

Sometimes, we may need to extract a part of a series. This can be done through slicing, which
is similar to slicing used with a list. When we use positional indices for slicing, the value at
the end index position is excluded, for example:

import pandas as pd
seriesCapCntry = pd.Series(['NewDelhi', 'WashingtonDC', 'London', 'Paris'],
index=['India', 'USA', 'UK', 'France'])
print(seriesCapCntry[0:2]) #Here the value at index 0 and 1 will be extracted

OUTPUT:
India NewDelhi
USA WashingtonDC
dtype: object

Let us take another example:

import pandas as pd
seriesCapCntry = pd.Series(['NewDelhi', 'WashingtonDC', 'London', 'Paris'],
index=['India', 'USA', 'UK', 'France'])
print(seriesCapCntry[0:1]) #Here the value at index 0 will be extracted

OUTPUT:
India NewDelhi
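dtype: object

Negative positions can also be used in a slice. In the following example, the last two values
are extracted in reverse order:

import pandas as pd
seriesCapCntry = pd.Series(['NewDelhi', 'WashingtonDC', 'London', 'Paris'],
index=['India', 'USA', 'UK', 'France'])
print(seriesCapCntry[-1 : -3 : -1]) #Here the values at positions -1 and -2 will be extracted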

OUTPUT:

France Paris
UK London
dtype: object

We can also get the series in reverse order, for example:

import pandas as pd
seriesCapCntry = pd.Series(['NewDelhi', 'WashingtonDC', 'London', 'Paris'],
index=['India', 'USA', 'UK', 'France'])
print(seriesCapCntry[: : -1]) #Here the series will be extracted in reverse order

OUTPUT:

France Paris
UK London
USA WashingtonDC
India NewDelhi
dtype: object

If labelled indexes are used for slicing, then value at the end index label is also included in
the output, for example:

Practical 1:

import pandas as pd
S2 = pd.Series([1, 2, 3, 4, 5], index=["One", "Two", "Three", "Four", "Five"])
print(S2["Two" : "Five"])

OUTPUT:
Two 2
Three 3
Four 4
Five 5
dtype: int64

Practical 2:

import pandas as pd
S2 = pd.Series([1, 2, 3, 4, 5], index=["One", "Two", "Three", "Four", "Five"])
print(S2["One" : "Three"])
OUTPUT:

One 1
Two 2
Three 3
dtype: int64
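The two slicing rules can be compared side by side using the iloc (position) and loc (label) indexers. This is a small sketch; the variable names by_position and by_label are assumptions used only for illustration.

```python
import pandas as pd

S2 = pd.Series([1, 2, 3, 4, 5], index=["One", "Two", "Three", "Four", "Five"])

by_position = S2.iloc[1:3]        # positions 1 and 2; position 3 is excluded
by_label = S2.loc["Two":"Four"]   # labels Two, Three and Four; end label is included

print(by_position)
print("----------------")
print(by_label)
```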

How to modify the elements of the Series:

We can modify the values of Series elements by assigning a new value to the corresponding
index label, as shown in the following example:

Example 1:

import pandas as pd
S2 = pd.Series([1,2,3,4,5], index=["One","Two","Three","Four","Five"])
S2["Two"] = 22
print(S2)

OUTPUT:
One 1
Two 22
Three 3
Four 4
Five 5
dtype: int64

Example 2:

import pandas as pd
S2 = pd.Series([1,2,3,4,5], index=["One","Two","Three","Four","Five"])
S2[["Two", "Three"]] = 22 #a list of labels modifies more than one value
print(S2)

OUTPUT:
One 1
Two 22
Three 22
Four 4
Five 5
dtype: int64

Example 3:

import pandas as pd
S2 = pd.Series([1,2,3,4,5], index=["One","Two","Three","Four","Five"])
S2[1 : 4] = 22 #we can use slicing to modify the value
print(S2)

OUTPUT:
One 1
Two 22
Three 22
Four 22
Five 5
dtype: int64

Observe that updating the values in a series using positional slicing excludes the value at
the end index position. However, the value at the end index label is also changed when
slicing is done using labels.

Example 4:

import pandas as pd
S2 = pd.Series([1,2,3,4,5], index=["One","Two","Three","Four","Five"])
S2["One" : "Four"] = 22
print(S2)

OUTPUT:

One 22
Two 22
Three 22
Four 22
Five 5
dtype: int64

Practice Exercise :

Q1. Consider the following Series and write the output of the following:

import pandas as pd
S = pd.Series([1, 2, 3, 4, 5], index=["One", "Two", "Three", "Four", "Five"])
a. print(S["Two"])
b. print(S[2])
c. print(S[0])
d. print(S[0 : 2])
e. print(S["Two" : "Three"])
f. print(S[: : -1])
g. print(S[1 : 4])
h. print(S[["Two", "Four"]])
i. print(S["Two" : "Four"])
j. print(S[-1])

SOLUTIONS
a. 2

b. 3

c. 1

d.

One 1
Two 2
dtype: int64

e.

Two 2
Three 3
dtype: int64

f.

Five 5
Four 4
Three 3
Two 2
One 1
dtype: int64

g.

Two 2
Three 3
Four 4
dtype: int64

h.

Two 2
Four 4
dtype: int64

i.

Two 2
Three 3
Four 4
dtype: int64

j.

5

Attributes of Series:

We can access various properties of a series by using its attributes with the series name.
The syntax for using an attribute is given below:
<Series Name>.<Attribute Name>

A few attributes of Pandas Series are given in the following table:


Attribute Name    Purpose

name              Assigns or returns the name of the Series.
index.name        Assigns or returns the name of the index of the Series.
values            Returns all the values of the Series as a NumPy array.
size              Returns the number of values in the Series.
empty             Returns True if the Series is empty, and False otherwise.
index             Returns the index of the Series.
hasnans           Returns True if the Series has any NaN value.

Practice Exercise of Series Attributes

Example 1: Demonstration of “name” attribute

import pandas as pd
S = pd.Series([1, 2, 3, 4, 5], index=["One", "Two", "Three", "Four", "Five"])
print(S)
print("------------------------------------------")
S.name="Sample"
print(S)
OUTPUT

One 1
Two 2
Three 3
Four 4
Five 5
dtype: int64
------------------------------------------
One 1
Two 2
Three 3
Four 4
Five 5
Name: Sample, dtype: int64

Example 2: Demonstration of “index.name” attribute

import pandas as pd
S = pd.Series([1, 2, 3, 4, 5], index=["One", "Two", "Three", "Four", "Five"])
print(S)
print("------------------------------------------")
S.index.name="Number"
print(S)
OUTPUT:

One 1
Two 2
Three 3
Four 4
Five 5
dtype: int64
------------------------------------------
Number
One 1
Two 2
Three 3
Four 4
Five 5
dtype: int64

Example 3: Demonstration of “values”, “size” and “empty” attribute

import pandas as pd
S = pd.Series([1, 2, 3, 4, 5], index=["One", "Two", "Three", "Four", "Five"])
print(S)
print("------------------------------------------")
print(S.values)
print("------------------------------------------")
print(S.size)
print("------------------------------------------")
print(S.empty)

OUTPUT:
One 1
Two 2
Three 3
Four 4
Five 5
dtype: int64
------------------------------------------
[1 2 3 4 5]
------------------------------------------
5
------------------------------------------
False

Example 4: Demonstration of “index” and “hasnans” attribute

import pandas as pd
S = pd.Series([1, 2, 3, 4, 5], index=["One", "Two", "Three", "Four", "Five"])
print(S)
print("------------------------------------------")
print(S.index)
print("------------------------------------------")
print(S.hasnans)
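The Series used in the examples above has no missing values and is not empty. The following sketch shows the opposite cases; the names S6 and E are assumptions used only for this illustration.

```python
import numpy as np
import pandas as pd

# A Series with one missing value (NaN)
S6 = pd.Series([1, np.nan, 3], index=["One", "Two", "Three"])
print(S6.hasnans)    # True, because the value at "Two" is NaN

# An empty Series (dtype given explicitly)
E = pd.Series([], dtype="float64")
print(E.empty)       # True, because it has no elements
print(E.size)        # 0
```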

Methods of Series:

In this section, we are going to discuss methods available for Pandas Series. Let us consider
the following Series.

import pandas as pd
S = pd.Series([1, 2, 3, 4, 5, 6, 7], index=["One", "Two", "Three", "Four", "Five", "Six",
"Seven"])
print(S)

1. head(n): This method returns the first n members of the series. If the value for n is not
passed, then by default the first five members are displayed, for example:

import pandas as pd
S = pd.Series([1, 2, 3, 4, 5, 6, 7], index=["One", "Two", "Three", "Four", "Five", "Six",
"Seven"])
print(S.head(3)) #Display First three members of Series
print("------------------------------------------------")
print(S.head( ))#Display First five members of Series as no argument is passed
OUTPUT:

One 1
Two 2
Three 3
dtype: int64
------------------------------------------------
One 1
Two 2
Three 3
Four 4
Five 5
dtype: int64

2. tail(n): This method returns the last n members of the series. If the value for n is not
passed, then by default the last five members are displayed, for example:

import pandas as pd
S = pd.Series([1, 2, 3, 4, 5, 6, 7], index=["One", "Two", "Three", "Four", "Five", "Six",
"Seven"])
print(S.tail(4)) #Display last four members of Series
print("------------------------------------------------")
print(S.tail( ))#Display last five members of Series as no argument is passed
OUTPUT:

Four 4
Five 5
Six 6
Seven 7
dtype: int64
------------------------------------------------
Three 3
Four 4
Five 5
Six 6
Seven 7
dtype: int64
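Both head( ) and tail( ) also accept a negative value for n. A short sketch, using the same Series as above, is given here as an additional illustration.

```python
import pandas as pd

S = pd.Series([1, 2, 3, 4, 5, 6, 7],
              index=["One", "Two", "Three", "Four", "Five", "Six", "Seven"])

print(S.head(-2))   # all members except the last two
print("----------------")
print(S.tail(-2))   # all members except the first two
```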

3. count( ): This method returns the number of non-NaN values in the Series, for example:

import pandas as pd
S = pd.Series([1, 2, 3, 4, 5, 6, 7], index=["One", "Two", "Three", "Four", "Five", "Six",
"Seven"])
print(S.count())
OUTPUT:
7
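When a Series has missing values, count( ) and the size attribute give different results. The sketch below uses a made-up Series named S7 to show the difference.

```python
import numpy as np
import pandas as pd

S7 = pd.Series([10, np.nan, 30, np.nan])

print(S7.size)      # 4, size counts every element including NaN
print(S7.count())   # 2, count() ignores the NaN values
```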

Accessing values of Series using conditions:

We can display particular values of a Series using conditions, for example:

import pandas as pd
S = pd.Series([1, 2, 3, 4, 5, 6, 7], index=["One", "Two", "Three", "Four", "Five", "Six",
"Seven"])
print([S>5]) #will return True for those values of Series which satisfy the condition
print("----------------------------------------")
print(S[S>5]) #will return those values of Series which satisfy the condition
OUTPUT:

[One False
Two False
Three False
Four False
Five False
Six True
Seven True
dtype: bool]
----------------------------------------
Six 6
Seven 7
dtype: int64
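More than one condition can be combined using & (and) and | (or). Each condition must be enclosed in parentheses. The following is a minimal sketch using the same Series; the particular conditions are chosen only for illustration.

```python
import pandas as pd

S = pd.Series([1, 2, 3, 4, 5, 6, 7],
              index=["One", "Two", "Three", "Four", "Five", "Six", "Seven"])

print(S[(S > 2) & (S < 6)])   # values greater than 2 and less than 6
print("----------------")
print(S[(S < 2) | (S > 6)])   # values less than 2 or greater than 6
```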

Deleting elements from Series :

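We can delete elements from a Series using the drop( ) method. To delete a particular element,
we have to pass the index label of the element to be deleted. By default, drop( ) returns a new
Series and leaves the original unchanged; with inplace = True, the original Series itself is
modified.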

Syntax of drop( ) method:


<Series name>.drop(index, inplace = True/False)

Example 1: To delete a particular element from Series

import pandas as pd
S = pd.Series([1, 2, 3, 4, 5, 6, 7], index=["One", "Two", "Three", "Four", "Five", "Six",
"Seven"])
print(S)
print("-------------------------------------------")
print(S.drop("Four"))#This statement will delete only one element.
OUTPUT:

One 1
Two 2
Three 3
Four 4
Five 5
Six 6
Seven 7
dtype: int64
-------------------------------------------
One 1
Two 2
Three 3
Five 5
Six 6
Seven 7
dtype: int64

Example 2: To delete more than one element from Series.

import pandas as pd
S = pd.Series([1, 2, 3, 4, 5, 6, 7], index=["One", "Two", "Three", "Four", "Five", "Six",
"Seven"])
print(S)
print("-------------------------------------------")
print(S.drop(["Four", "Five"]))#This statement will delete two elements.
OUTPUT:

One 1
Two 2
Three 3
Four 4
Five 5
Six 6
Seven 7
dtype: int64
-------------------------------------------
One 1
Two 2
Three 3
Six 6
Seven 7
dtype: int64
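The effect of the inplace argument can be checked with a short sketch. The smaller Series and the name T are assumptions used only for this illustration.

```python
import pandas as pd

S = pd.Series([1, 2, 3, 4], index=["One", "Two", "Three", "Four"])

T = S.drop("Two")               # returns a new Series; S is not changed
print(S.size, T.size)           # 4 3

S.drop("Four", inplace=True)    # modifies S itself and returns None
print(S.index.tolist())         # ['One', 'Two', 'Three']
```

Without inplace = True, the result of drop( ) has to be stored in a variable (or printed directly), otherwise the deletion is lost.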
