Unit I: Data Handling Using Pandas and Data Visualization: Marks:30
• NumPy, Pandas and Matplotlib are three well-established Python libraries. These
libraries allow us to manipulate, transform and visualize data easily and efficiently.
Output:
Series([], dtype: float64)
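The output above is what printing an empty series looks like. Here is a minimal sketch that reproduces it, assuming only that pandas is installed. The dtype is passed explicitly because pandas 2.0 changed the default dtype of an empty Series from float64 to object, so `pd.Series()` with no arguments prints `dtype: object` on newer versions.

```python
import pandas as pd

# An empty Series: no data values and no index labels.
# dtype is given explicitly so the printed output is the same on any pandas version.
s_empty = pd.Series(dtype="float64")
print(s_empty)          # Series([], dtype: float64)
print(s_empty.empty)    # True
print(len(s_empty))     # 0
```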
Creating a series using the Series() method with arguments
A series is created by calling the Series() method and passing the data elements and,
optionally, the index as arguments.
Syntax:
<Series object> = pandas.Series(data, index=idx)
* The printed series has 2 columns: the index on the left and the data values on the right. If
no index is specified, a default index from 0 to N-1 is used (N is the number of elements).
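Because each data value is paired with exactly one index label, an explicit index must have the same length as the data. A short sketch of the default index, an explicit index, and a length mismatch (the labels "x", "y", "z" are arbitrary examples):

```python
import pandas as pd

# Default index: labels 0 .. N-1 are generated automatically.
s_default = pd.Series([10, 20, 30])
print(s_default.index.tolist())   # [0, 1, 2]

# Explicit index: one label per data value.
s_labelled = pd.Series([10, 20, 30], index=["x", "y", "z"])
print(s_labelled["y"])            # 20

# Three values but only two labels: pandas rejects this with a ValueError.
try:
    pd.Series([10, 20, 30], index=["x", "y"])
except ValueError as err:
    print("ValueError:", err)
```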
Create a Series using List:
# Example 2: creating a series using Series() with List as an argument
>>> import pandas as pd
>>> s1 = pd.Series([10,20,30,40])
>>> s1
( or )
>>> print(s1)
Output:
0 10
1 20
2 30
3 40
dtype: int64
Creating a series using the range() function
>>> import pandas as pd
>>> s1 = pd.Series(range(5))
>>> print(s1)
0 0
1 1
2 2
3 3
4 4
dtype: int64
Creating a series with explicit index values:
>>> import pandas as pd
>>> s1 = pd.Series([10, 20, 30, 40, 50], index=['a', 'b', 'c', 'd', 'e'])
>>> print(s1)
a 10
b 20
c 30
d 40
e 50
dtype: int64
Creating a Series from ndarray
Without index Argument
>>> import pandas as pd
>>> import numpy as np
>>> data = np.array(['a', 'b', 'c', 'd'])
>>> s1 = pd.Series(data)
>>> print(s1)
Output:
0 a
1 b
2 c
3 d
dtype: object
Creating a Series from ndarray
With index Argument
>>> import pandas as pd
>>> import numpy as np
>>> data = np.array(['a', 'b', 'c', 'd'])
>>> s1 = pd.Series(data, index=[100,101,102,103])
>>> print(s1)
Output:
100 a
101 b
102 c
103 d
dtype: object
Create a Series from dict
Eg.1(without index)
>>> import pandas as pd
>>> data = {'a':0,'b':1,'c':2}
>>> s1 = pd.Series ( data)
>>> print(s1)
Output:
a 0
b 1
c 2
dtype: int64
Eg.2 (with index)
>>> import pandas as pd
>>> data = {'a':0,'b':1,'c':2}
>>> s1 =pd.Series( data, index= ['b' ,'c', 'd' ,'a'])
>>> print(s1)
Output:
b 1.0
c 2.0
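d NaN
a 0.0
dtype: float64
Note: NaN (Not a Number) marks a missing value. 'd' is not a key in the dict, so it gets
NaN, and the whole series becomes float64 because NaN is a floating-point value.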
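A short sketch, on the same dict, that finds which requested labels had no matching key by using isna():

```python
import pandas as pd

data = {"a": 0, "b": 1, "c": 2}
s1 = pd.Series(data, index=["b", "c", "d", "a"])

# isna() is True exactly where the label had no matching key in the dict.
missing = s1[s1.isna()].index.tolist()
print(missing)        # ['d']
print(s1.dtype)       # float64
```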
Create a Series from Scalar
>>> import pandas as pd
>>> s1 =pd.Series(5, index=[1,2,3,4])
>>> print(s1)
Output:
1 5
2 5
3 5
4 5
dtype: int64
Note: here the scalar 5 is repeated 4 times, once for each index label.
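Creating a series using the arange() method of NumPy
>>> import pandas as pd
>>> import numpy as np
>>> s1 = pd.Series(np.arange(10,16,1), index=['a','b','c','d','e','f'])
>>> print(s1)
a 10
b 11
c 12
d 13
e 14
f 15
dtype: int32
Note: the dtype shown depends on the platform. Older NumPy on Windows gives int32, while
Linux/macOS (and NumPy 2.x on Windows) give int64.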
Accessing elements of a series
* There are 2 ways to access elements: indexing and slicing.
A) Indexing
Two types of indexes are used: positional index and labelled index. The positional index
is the default index starting from 0, whereas a labelled index is defined by the user.
Example 1:
>>> import pandas as pd
>>>s1 = pd.Series([ 10, 20,30, 40,50])
>>>print(s1[2] )
30
Example 2:
>>> import pandas as pd
>>>s1 = pd.Series([ 10, 20,30, 40,50],index = ['a','b','c','d','e'])
>>> print(s1['d'] )
40
>>> s1[['a','c','e']]
(or)
>>> print(s1[['a','c','e']])
Output:
a 10
c 30
e 50
dtype: int64
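Selecting a label that is not in the index raises a KeyError. Series.get() returns a default value instead, which can be handy when a label may be absent. A small sketch on the same data ("z" is a deliberately missing label):

```python
import pandas as pd

s1 = pd.Series([10, 20, 30, 40, 50], index=["a", "b", "c", "d", "e"])

try:
    s1["z"]                       # 'z' is not an index label
except KeyError:
    print("KeyError for label 'z'")

# get() returns the given default when the label is absent.
print(s1.get("z", -1))            # -1
print(s1.get("d", -1))            # 40
```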
Example 3:
>>> import pandas as pd
>>> sercap = pd.Series(['NewDelhi', 'London', 'Paris'],
index=['India', 'UK', 'France'])
>>> print(sercap[['UK', 'France']])
Output:
UK London
France Paris
dtype: object
>>> sercap['India']
(or)
>>> print(sercap['India'])
NewDelhi
How to assign new index values to series
>>>sercap.index=[10,20,30]
>>>print(sercap)
10 NewDelhi
20 London
30 Paris
dtype: object
B) Slicing
• Similar to slicing with NumPy arrays
• Slicing can be done by specifying the starting and ending parameters.
• In positional index the value at the end index position is excluded.
Example:
>>>import pandas as pd
>>> sercap=pd.Series(['NewDelhi', 'WashingtonDC', 'London', 'Paris'], index=['India',
'USA', 'UK', 'France'])
>>> print(sercap[1:3])
Output:
USA WashingtonDC
UK London
dtype: object
Example using labelled index
>>>import pandas as pd
>>> sercap=pd.Series(['NewDelhi', 'WashingtonDC', 'London', 'Paris'],
index=['India', 'USA', 'UK', 'France'])
>>> print(sercap['USA':'France'])
USA WashingtonDC
UK London
France Paris
dtype: object
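The two slicing examples differ in one detail: a positional slice stops before the end position, while a label slice includes the end label. Using .iloc (positions) and .loc (labels) makes the choice explicit. A sketch on the same sercap data:

```python
import pandas as pd

sercap = pd.Series(["NewDelhi", "WashingtonDC", "London", "Paris"],
                   index=["India", "USA", "UK", "France"])

by_position = sercap.iloc[1:3]        # positions 1 and 2; position 3 is excluded
by_label = sercap.loc["USA":"UK"]     # labels USA through UK; the end label is included

print(by_position.tolist())                  # ['WashingtonDC', 'London']
print(by_label.tolist())                     # ['WashingtonDC', 'London']
print(sercap.loc["USA":"France"].tolist())   # ['WashingtonDC', 'London', 'Paris']
```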
Series in reverse order slicing
>>> import pandas as pd
>>> sercap=pd.Series(['NewDelhi','WashingtonDC','London','Paris'],
index=['India','USA','UK','France'])
>>>print(sercap[: : -1])
France Paris
UK London
USA WashingtonDC
India NewDelhi
dtype: object
How to modify the values of series using slicing
>>> import pandas as pd
>>> import numpy as np
>>> s1=pd.Series(np.arange(10,16,1), index=['a','b','c','d','e','f'])
>>> s1[1:3]=50
>>> print(s1)
a 10
b 50
c 50
d 13
e 14
f 15
dtype: int32
Example 2: using index label
>>> import pandas as pd
>>> import numpy as np
>>> s1=pd.Series(np.arange(10,16,1), index=['a', 'b', 'c', 'd', 'e', 'f'])
>>> s1['c' :'e']=500
>>> print(s1)
a 10
b 11
c 500
d 500
e 500
f 15
dtype: int32
Accessing data from a series with indexing and slicing (using position)
e.g.
>>> import pandas as pd
>>> s1 = pd.Series([11, 12, 13, 14, 15], index=['a', 'b', 'c', 'd', 'e'])
>>> print(s1[0])
(or)
>>> print(s1['a'])
11
>>> print(s1[:3])
a 11
b 12
c 13
dtype: int64
>>> print(s1[-3:])
c 13
d 14
e 15
dtype: int64
In the first statement the element at position 0 is displayed (label 'a' selects the same
element). pandas 2.1 and later warn that s1[0] on a labelled index is treated as a position;
s1.iloc[0] states this explicitly.
In the second statement the first 3 elements of the series are displayed.
In the third statement the last 3 elements are displayed because of negative indexing.
Retrieve Data from selection :
• loc is used for indexing or selecting based on name, i.e., by row label (index name), and
• iloc is used for indexing or selecting based on position, i.e., by row number.
e.g.1
>>> import pandas as pd
>>> import numpy as np
>>> s1 = pd.Series(np.nan, index=[49, 48, 47, 46, 45, 1, 2, 3, 4, 5])
>>> print(s1.loc[49:47])
Output:
49 NaN
48 NaN
47 NaN
dtype: float64
e.g.2
>>> print(s1.loc[49:1]) # selects the data according to the index labels; end label 1 is included
>>> print(s1.iloc[:6]) # selects the data according to position; positions 0 to 5
Output (same for both statements):
49 NaN
48 NaN
47 NaN
46 NaN
45 NaN
1 NaN
dtype: float64
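The difference between loc and iloc is easiest to see when the index labels are integers that do not match the positions, as in the series above. A sketch with a smaller, made-up series:

```python
import pandas as pd

s = pd.Series([100, 200, 300], index=[2, 1, 0])

print(s.loc[0])               # 300: the element whose label is 0
print(s.iloc[0])              # 100: the element at position 0
print(s.loc[2:1].tolist())    # [100, 200]: label slice, end label included
print(s.iloc[0:1].tolist())   # [100]: positional slice, end position excluded
```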
Conditional Filtering Entries:
>>> import pandas as pd
>>> s1 = pd.Series([1.000000, 1.414214, 1.730751, 2.000000])
>>> print(s1)
Output:
0 1.000000
1 1.414214
2 1.730751
3 2.000000
dtype: float64
>>> print(s1 < 2)
Output:
0 True
1 True
2 True
3 False
dtype: bool
>>> print(s1 >= 2)
Output:
0 False
1 False
2 False
3 True
dtype: bool
>>> print(s1[s1 >= 2])
Output:
3 2.0
dtype: float64
>>> print(s1[s1 < 2])
Output:
0 1.000000
1 1.414214
2 1.730751
dtype: float64
Note:
• The statement s1 < 2 performs a vectorized operation, which checks every element in the series.
• The statement s1[s1 >= 2] performs a filtering operation and returns only the elements
for which the expression is True.
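Several conditions can be combined with & (and), | (or) and ~ (not). Each condition needs its own parentheses, because these operators bind more tightly than comparisons in Python. A sketch on the same s1 values:

```python
import pandas as pd

s1 = pd.Series([1.000000, 1.414214, 1.730751, 2.000000])

between = s1[(s1 > 1) & (s1 < 2)]    # strictly between 1 and 2
ends = s1[(s1 == 1) | (s1 == 2)]     # either end value
not_small = s1[~(s1 < 1.5)]          # negation of a condition

print(between.index.tolist())    # [1, 2]
print(ends.index.tolist())       # [0, 3]
print(not_small.index.tolist())  # [2, 3]
```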
Conditional Filtering Entries
Filtering entries from a series object can be done using expressions that are of
Boolean type.
<Series object> [ <Boolean expression on series object>]
Example:
Series object s11 stores the charity contribution made by each section
A 6700
B 5600
C 5000
D 5200
Write a program to display the sections that contributed more than Rs. 5500.
Output:
Contribution >5500 are:
A 6700
B 5600
dtype: int64
Program:
>>> import pandas as pd
>>> s11= pd.Series([6700,5600,5000,5200],index=['A','B','C','D'])
>>> print("Contribution >5500 are:")
>>> print(s11[s11>5500])
Output:
Contribution >5500 are:
A 6700
B 5600
dtype: int64
Sorting Series values:
Series objects can be sorted based on values (sort_values()) and on index labels (sort_index()).
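A short sketch of sorting, using the s11 contribution data from the previous example. sort_values() and sort_index() return a new sorted series and leave the original unchanged:

```python
import pandas as pd

s11 = pd.Series([6700, 5600, 5000, 5200], index=["A", "B", "C", "D"])

by_value = s11.sort_values()                      # ascending by value
by_value_desc = s11.sort_values(ascending=False)  # largest contribution first
by_index_desc = s11.sort_index(ascending=False)   # by label, D down to A

print(by_value.index.tolist())       # ['C', 'D', 'B', 'A']
print(by_value_desc.index.tolist())  # ['A', 'B', 'D', 'C']
print(by_index_desc.index.tolist())  # ['D', 'C', 'B', 'A']
```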
head(), tail() and count() functions:
• head()
• tail()
• count()
• Series.head(n) is a series function that fetches the first n elements from a pandas object.
• By default it gives the top 5 rows of the series.
• Series.tail(n) is a series function that displays the last n elements, by default the last five.
Example 1:
>>> import pandas as pd
>>> s1 = pd.Series([1,2,3,4,5], index=['a','b','c','d','e'])
>>> print(s1.head(3))
Output:
a 1
b 2
c 3
dtype: int64
Example 2:
>>> import pandas as pd
>>> s1 = pd.Series([1,2,3,4,5], index=['a','b','c','d','e'])
>>> print(s1.head())
Output:
a 1
b 2
c 3
d 4
e 5
dtype: int64
Pandas tail() function:
Example 1:
>>> import pandas as pd
>>> s1 = pd.Series([1,2,3,4,5], index=['a','b','c','d','e'])
>>> print(s1.tail(2))
Output:
d 4
e 5
dtype: int64
Example 2:
>>> import pandas as pd
>>> s1 = pd.Series([1,2,3,4,5], index=['a','b','c','d','e'])
>>> print(s1.tail())
Output:
a 1
b 2
c 3
d 4
e 5
dtype: int64
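head() and tail() also accept a negative n: head(-k) returns everything except the last k elements, and tail(-k) everything except the first k. A sketch on the same five-element series:

```python
import pandas as pd

s1 = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])

print(s1.head(-2).tolist())   # [1, 2, 3]: all but the last 2
print(s1.tail(-2).tolist())   # [3, 4, 5]: all but the first 2
print(s1.head(10).tolist())   # [1, 2, 3, 4, 5]: n larger than the length returns everything
```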
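Pandas count() function:
count() returns the number of non-null (non-NaN) values in a series, so a series that
contains missing values gives a count smaller than its length.
Example 1:
>>> print(s1.count())
Output:
5
Example 2:
>>> print(s1.count())
Output:
4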
Homework
Consider the following code:
>>> import pandas as pd
>>> import numpy as np
>>> s1=pd.Series([12,np.nan,10])
>>> print(s1)
Find the output and write a python statement to count and display only non null
values in the above series.
i) Output:
0 12.0
1 NaN
2 10.0
dtype: float64
ii) >>> s1.count()
2
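For the homework series, count() skips the NaN while size and len() include it. A sketch that contrasts them and also extracts only the non-null values with dropna():

```python
import numpy as np
import pandas as pd

s1 = pd.Series([12, np.nan, 10])

print(s1.count())             # 2: non-null values only
print(s1.size, len(s1))       # 3 3: all positions, NaN included
print(s1.isna().sum())        # 1: number of NaN values
print(s1.dropna().tolist())   # [12.0, 10.0]: the non-null values
print(s1.hasnans)             # True
```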
Series Object Attributes:
The properties of a series can be accessed through its associated attributes:
1) Series.index returns the index of the series.
2) Series.values returns the data as an ndarray.
3) Series.dtype returns the dtype object of the underlying data.
4) Series.shape returns a tuple of the shape of the underlying data.
5) Series.nbytes returns the number of bytes of the underlying data.
6) Series.ndim returns the number of dimensions (always 1 for a series).
7) Series.size returns the number of elements.
8) Series.hasnans returns True if there is any NaN.
9) Series.empty returns True if the series object is empty.
Naming the Series and the index column
>>> import pandas as pd
>>> s1 = pd.Series({'Jan':31, 'Feb':28, 'Mar':31, 'Apr':30})
>>> s1.name="Days"
>>> s1.index.name="Months"
>>> print(s1)
Output:
Months
Jan 31
Feb 28
Mar 31
Apr 30
Name: Days, dtype: int64
>>> import pandas as pd
>>> s1 = pd.Series( range(1, 15, 3), index= [x for x in 'abcde'])
>>> s1.index
Index(['a', 'b', 'c', 'd', 'e'], dtype='object')
>>> s1.values
array([ 1, 4, 7, 10, 13], dtype=int64)
>>> s1.dtype
dtype('int64')
>>> s1.shape
(5,)
>>> s1.nbytes
40
>>> s1.ndim
1
>>> s1.size
5
>>> s1.hasnans
False
>>> s1.empty
False
Reference: Sumita Arora, Class 11, page 297
Bytes per element for integer dtypes:
• int8: 1 byte
• int16: 2 bytes
• int32: 4 bytes
• int64: 8 bytes
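Series.nbytes is the number of elements multiplied by the bytes per element of the dtype, which is why the five-element int64 series above reports 40. A sketch that converts the same data with astype() and checks this for each integer dtype:

```python
import pandas as pd

s1 = pd.Series(range(1, 15, 3), index=list("abcde"))

for dtype in ["int8", "int16", "int32", "int64"]:
    converted = s1.astype(dtype)
    # dtype.itemsize is the number of bytes per element for that dtype
    print(dtype, converted.nbytes, converted.size * converted.dtype.itemsize)
# int8 5 5
# int16 10 10
# int32 20 20
# int64 40 40
```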
Mathematical operations with Series
e.g.1:
>>> import pandas as pd
>>> s1 = pd.Series([1,2,3])
>>> s2 = pd.Series([1,2,4])
>>> s3 = s1 + s2
>>> print(s3)
Output:
0 2
1 4
2 7
dtype: int64
e.g.2:
>>> import pandas as pd
>>> s1 = pd.Series([1,2,3])
>>> s2 = pd.Series([1,2,4])
>>> s3 = s1 * s2
>>> print(s3)
Output:
0 1
1 4
2 12
dtype: int64
e.g.3:
>>> import pandas as pd
>>> import numpy as np
>>> s1 = np.arange(10,15)
>>> s2 = pd.Series(index=s1, data=s1**4)
>>> print(s2)
Output:
10 10000
11 14641
12 20736
13 28561
14 38416
dtype: int32
e.g.4:
>>> import pandas as pd
>>> import numpy as np
>>> s1 = np.arange(10,15)
>>> s2 = pd.Series(index=s1, data=s1*4)
>>> print(s2)
Output:
10 40
11 44
12 48
13 52
14 56
dtype: int32
e.g.5:
>>> import pandas as pd
>>> data = ['I','n','f','o','r']
>>> s1 = pd.Series(data + ['m','a','t','i','c','s'])
>>> s1
Output:
0 I
1 n
2 f
3 o
4 r
5 m
6 a
7 t
8 i
9 c
10 s
dtype: object
Exercise: concat your first name with your last name in the same way.
e.g.6:
>>> import pandas as pd
>>> s1 = ['a', 'b', 'c']
>>> s2 = pd.Series(data=s1 * 2)
>>> print(s2)
Output:
0 a
1 b
2 c
3 a
4 b
5 c
dtype: object
Note:
• Arithmetic operations are performed between elements that have the same index label;
a label present in only one of the series gives NaN in the result.
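The rule above can be checked directly: pandas aligns the two series on their index labels before operating, and a label found in only one of them gives NaN. The add() method accepts fill_value to treat a missing entry as a given number instead. A sketch with made-up labels:

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
s2 = pd.Series([10, 20, 30], index=["b", "c", "d"])

total = s1 + s2                       # 'a' and 'd' appear in only one series
print(total)                          # a NaN, b 12.0, c 23.0, d NaN

filled = s1.add(s2, fill_value=0)     # the missing side is treated as 0
print(filled.tolist())                # [1.0, 12.0, 23.0, 30.0]
```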
Homework: