
Python Pandas - I

Outline
1.1 Introduction
1.2 Using Pandas
1.3 Pandas Data Structures
1.4 Series Data Structure
1.5 Accessing a Series Object and its Elements
1.6 Operations on Series Object
1.7 Series Objects vs. 1D Data Structures and 1D Numpy Array
1.8 DataFrame Data Structure
1.9 Creating and Displaying a DataFrame
1.10 DataFrame Attributes
1.11 Dataframe vs. Series and 2D Numpy Array
1.12 Selecting or Accessing Data
1.13 Adding/Modifying Rows'/Columns' Values in DataFrames
1.14 Deleting/Renaming Columns/Rows
1.15 More on DataFrame Indexing - Boolean Indexing

1.1 Introduction

Pandas or Python Pandas is Python's library for data analysis. Pandas has derived its name from "panel data system", which is an econometrics term for multi-dimensional, structured data sets. Today, Pandas has become a popular choice for data analysis. As you may be aware, data analysis refers to the process of evaluating big data sets using analytical and statistical tools so as to discover useful information and conclusions to support business decision-making. Pandas makes available various tools for data analysis and makes it a simple and easy process as compared to other available tools. The main author of Pandas is Wes McKinney.

This chapter will introduce Python's Pandas library, its data structures Series and DataFrame, and some useful functions with these data structures.

IMPORTANT
Please note that although Pandas is a separate library, it uses NumPy as its support library, and hence many datatypes, constants and functions of NumPy are frequently used with Pandas. Since NumPy was removed from last year's class 11th syllabus because of the COVID situation, students could not study NumPy. Thus, we are giving the full NumPy chapter of class XI as SUPPORT MATERIAL in the SIPO App. To get this chapter, open the SIPO App and then open the Support tab to find it.

1.2 Using Pandas

Pandas is an open source, BSD-licensed library built for the Python programming language. Pandas offers high-performance, easy-to-use data structures and data analysis tools.
In order to work with pandas in Python, you need to import the pandas library in your Python environment. You can do this either on the shell prompt or in your script file (.py) by writing :

    import pandas as pd

For example, on the Python shell (Python 3.6.4, Anaconda) :

    >>> import pandas as pd

Please note that you can write code for data analysis using the pandas library of Python in a .py file too, as you have been writing programs earlier.

Why Pandas ?
Pandas is the most popular library in the scientific Python ecosystem for doing data analysis.
Pandas is capable of many tasks including :
◊ It can read or write in many different data formats (integer, float, double, etc.).
◊ It can calculate in all the possible ways data is organized, i.e., across rows and down columns.
◊ It can easily select subsets of data from bulky data sets and even combine multiple datasets together. It has functionality to find and fill missing data.
◊ It allows you to apply operations to independent groups within the data.
◊ It supports reshaping of data into different forms.
◊ It supports advanced time-series functionality. (Time series forecasting is the use of a model to predict future values based on previously observed values.)
◊ It supports visualization by integrating libraries such as matplotlib and seaborn.
In other words, Pandas is best at handling huge tabular data sets comprising different data
formats. Now that you know how to use Pandas on technical platform of your choice, let us
learn the basics of Python Pandas library.

1.3 Pandas Data Structures

In order to work with pandas, you need to learn the basic data structures of Pandas. But hey, do you know what the term data structure means ? What ? My mistake ? I did not tell you ? Hmm, ohh (Hope you will forgive me ;) ) So here I am, correcting my mistake.

DATA STRUCTURES
Data Structures refer to specialized way of storing data so as to apply a specific type of functionality on them.

A data structure is a particular way of storing and organizing data in a computer to suit a specific purpose so that it can be accessed and worked with in appropriate ways. Let us make it more clear. For instance, consider following cases :
◊ You want to store similar type of data items together and process them in identical way, e.g., you want to store marks scored by students - arrays are the solution for this. Or if you want to fix one end for insertions or deletions in an array, it will be termed as a stack data structure.
◊ There are many more types of data structures suited for different types of functionality.
Depending upon the requirements of the situation, a data structure is decided for that situation.
Pandas offers many data structures to handle variety of data. At the very basic level, Pandas data structures can be thought of somewhat as enhanced versions of NumPy arrays in which the rows and columns can be identified and accessed with labels rather than simple integer indices. You will learn about it in this chapter. Out of many data structures (see footnote 1) of Pandas, two basic data structures - Series and DataFrame - are universally popular for their dependability.

Basic Data Structures : Series and DataFrame
Series is a 1-dimensional data structure of Python Pandas and DataFrame is a 2-dimensional data structure of Python Pandas. Pandas also supports another data structure called Panel, but that is beyond the scope of this book. Thus, our discussion will remain limited to Series and DataFrame data structures only. Following Fig. 1.1 shows a Series and a DataFrame object of Python Pandas.

(a) Series object

    Index   Data
    1       'A'
    2       'B'
    3       'C'
    4       'D'
    5       'E'

(b) DataFrame object

            Columns
    Index   A           B            C
    0       'Hello'     'Column B'   NaN
    1       'NO INFO'   'NO INFO'    'NO INFO'
    2       'A'         'Column B'   NaN
    3       'A'         'Column B'   NaN
    4       'A'         'Column B'   NaN

Figure 1.1 Pandas' two basic data structures

Before we proceed to the detailed discussion of Pandas Series and DataFrame, let us talk about the basic difference between the two. Following table lists the key differences between a Series and a DataFrame object.

Table 1.1 Series vs. DataFrame objects

    Property       Series                                     DataFrame
    Dimensions     1-dimensional                              2-dimensional
    Type of data   Homogeneous, i.e., all the elements        Heterogeneous, i.e., a DataFrame object
                   must be of same data type in a             can have elements of different data
                   Series object.                             types.
    Mutability     • Value-mutable, i.e., their elements'     • Value-mutable, i.e., their elements'
                     value can change.                          value can change.
                   • Size-immutable, i.e., size of a          • Size-mutable, i.e., size of a
                     Series object, once created, cannot        DataFrame object, once created, can
                     change. If you want to add/drop an         change in place. That is, you can
                     element, internally a new Series           add/drop elements in an existing
                     object will be created.                    DataFrame object.
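
The first two rows of Table 1.1 can be checked with a small sketch. The names marks and students and the sample values are made up for illustration; they do not come from the chapter.

```python
import pandas as pd

# A Series: one-dimensional, a single dtype for all of its elements
marks = pd.Series([78, 85, 91])

# A DataFrame: two-dimensional, each column may have its own dtype
students = pd.DataFrame({"name": ["Asha", "Ravi", "Meera"],
                         "marks": [78, 85, 91]})

print(marks.ndim, students.ndim)   # number of dimensions of each object
print(marks.dtype)                 # one dtype for the whole Series
print(students.dtypes)             # one dtype per column of the DataFrame
```

The DataFrame reports one dtype per column, which is what "heterogeneous" means in the table.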

For working in Pandas, we generally import both pandas and numpy modules/libraries by giving following import statements on the Python shell or in the script or program code :

    import pandas as pd
    import numpy as np

You can use any identifier name in place of pd and np, but pd and np have generally been the preferred choices.
After this basic introduction of Series and DataFrame objects, let us talk about these individually in detail.
1. There are other data structures too such as panel but covering panels here is beyond the scope of the book.
4 INFORMATICS PRACTICES

1.4 Series Data Structure

A Series is an important data structure of pandas. It represents a one-dimensional array of indexed data. A Series type object has two main components :
◊ an array of actual data
◊ an associated array of indexes (Numeric index) or data labels (Labelled index).

SERIES
A Series is a Pandas data structure that represents a one-dimensional array-like object containing an array of data (of any NumPy data type) and an associated array of data labels, called its index.

Both components are one-dimensional arrays with the same length. The index is used to access individual data values, e.g., following figure 1.2 shows some Series objects :

[Figure 1.2 shows three Series objects, each drawn as an Index column beside a Data column : one with numeric indexes 1, 2, 3 and data 23, 18, 25 ; one with month-name indexes Jan, Feb, Mar, Apr ; and one with labelled indexes 'A', 'B', 'C', 'D'.]

Figure 1.2 Some Series type objects

NOTE
When numeric indexes can depict positions of data in the object (e.g., when indexes go as 0, 1, 2, ...), they are called positional indexes. Labelled indexes can take any user-defined labels.

1.4.1 Creating Series Objects

A Series type object can be created in many ways using pandas library's Series( ). Make sure that you have imported pandas and numpy modules with import statements.

I. Create empty Series Object by using just the Series( ) with no parameter
To create an empty object, i.e., having no values, you can just use the Series( ) as :

    <Series Object> = pandas.Series( )        # S is in uppercase

The above statement will create an empty Series type object with no value, having the default datatype, which is float64. Consider following statement :

    >>> obj3 = pd.Series()        # just Series() will create an empty Series type object
    >>> obj3
    Series([], dtype: float64)
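
A minimal sketch of the same idea follows. One caveat: the dtype reported for a bare Series() depends on the pandas version (float64 in older releases such as the one shown above, object from pandas 2.0 onwards), so this sketch passes dtype explicitly. The name empty_obj is only illustrative.

```python
import pandas as pd

# Passing dtype explicitly keeps the result the same across pandas versions
empty_obj = pd.Series(dtype="float64")

print(empty_obj)          # an empty Series of dtype float64
print(len(empty_obj))     # number of elements in it
print(empty_obj.empty)    # whether the Series is empty
```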

II. Creating non-empty Series objects


To create non-empty Series objects, you need to specify arguments for data and indexes as per following syntax :

    <Series object> = pd.Series(data, index = idx)

where idx is a valid NumPy datatype and data is the data part of the Series object ; it can be one of the following :
◊ A Python sequence
◊ An ndarray
◊ A Python dictionary
◊ A scalar value

Following subsections talk about the ways to create Series objects as per above syntax:

(i) Specify data as Python Sequence

The simplest way to create a Series type object is to give a sequence of values as argument to Series( ), i.e., as :

    <Series Object> = Series(<any Python sequence>)

It will return an object of Series type. For instance, consider following example statements that create two Series type objects using some Python sequences :

    >>> obj1 = pd.Series(range(5))              # range(5) generates the sequence [0, 1, 2, 3, 4]
    >>> obj1
    0    0
    1    1
    2    2
    3    3
    4    4
    dtype: int64

    >>> obj2 = pd.Series([3.5, 5., 6.5, 8.])    # the given list is a sequence with 4 values
    >>> obj2
    0    3.5
    1    5.0
    2    6.5
    3    8.0
    dtype: float64

In the string representation of a Series object, the left column displays the index and the right column displays the values.


As you see above, if you just specify the sequence or just the values with Series( ) and no index, then by default the index array consists of the integers 0 through N - 1 (where N is the length of data).
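
The default index can be inspected directly, as in the short sketch below (the names values and obj are illustrative only).

```python
import pandas as pd

values = [3.5, 5.0, 6.5, 8.0]
obj = pd.Series(values)          # no index given

# The default index is the positions 0 .. N-1, where N = len(values)
print(list(obj.index))
print(list(obj.index) == list(range(len(values))))
```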

EXAMPLE 1 Write code to create a Series object using the Python sequence [4, 6, 8, 10]. Assume that Pandas is imported as alias name pd.
SOLUTION
    import pandas as pd
    s1 = pd.Series([4, 6, 8, 10])
    print("Series object 1: ")
    print(s1)
Output
    Series object 1:
    0     4
    1     6
    2     8
    3    10
    dtype: int64

EXAMPLE 2 Write code to create a Series object using the Python sequence (11, 21, 31, 41). Assume that Pandas is imported as alias name pd.
SOLUTION
    import pandas as pd
    s2 = pd.Series((11, 21, 31, 41))
    print("Series object 2: ")
    print(s2)
Output
    Series object 2:
    0    11
    1    21
    2    31
    3    41
    dtype: int64

EXAMPLE 3 Write a program to create a Series object using the individual characters 'o', 'h' and 'o'. Assume that Pandas is imported as alias name pd.
SOLUTION
    import pandas as pd
    s3 = pd.Series(['o', 'h', 'o'])
    print("Series object:")
    print(s3)
Output
    Series object:
    0    o
    1    h
    2    o
    dtype: object

EXAMPLE 4 Write a program to create a Series object using a string : "So funny". Assume that Pandas is imported as alias name pd.
SOLUTION
    import pandas as pd
    print("Series object:")
    s4 = pd.Series("So funny")
    print(s4)
Output
    Series object:
    0    So funny
    dtype: object

EXAMPLE 5 Write a program to create a Series object using three different words : "I", "am", "laughing". Assume that Pandas is imported as alias name pd.
SOLUTION
    import pandas as pd
    s5 = pd.Series(["I", "am", "laughing"])
    print("Series object:")
    print(s5)
Output
    Series object:
    0           I
    1          am
    2    laughing
    dtype: object

(ii) Specify data as an ndarray

The data attribute can be an ndarray also. Consider following code :

    nda1 = np.arange(3, 13, 3.5)
    print(nda1)
    ser1 = pd.Series(nda1)
    print(ser1)

On the Python shell, the same statements work as follows :

    >>> nda1 = np.arange(3, 13, 3.5)
    >>> print(nda1)
    [ 3.   6.5 10. ]
    >>> ser1 = pd.Series(nda1)
    >>> ser1
    0     3.0
    1     6.5
    2    10.0
    dtype: float64

You can also combine the code lines of above code. Carefully read the following :

    ser1 = pd.Series(np.arange(3, 13, 3.5))     # this was stored as nda1 in earlier code, but you can also create the ndarray directly here

You can create a Series object from any ndarray created from any function. Consider following
example that does the same.
EXAMPLE 6 Write a program to create a Series object using an ndarray that has 5 elements in the range 24 to 64.
SOLUTION
    import pandas as pd
    import numpy as np
    s6 = pd.Series(np.linspace(24, 64, 5))
    print(s6)
Output
    0    24.0
    1    34.0
    2    44.0
    3    54.0
    4    64.0
    dtype: float64

EXAMPLE 7 Write a program to create a Series object using an ndarray that is created by tiling a list [3, 5], twice.
SOLUTION
    import pandas as pd
    import numpy as np
    s7 = pd.Series(np.tile([3, 5], 2))
    print(s7)
Output
    0    3
    1    5
    2    3
    3    5
    dtype: int32

(The integer dtype shown depends on the platform : it is int32 on the 32-bit Windows installation used here, and int64 on most 64-bit platforms.)

(iii) Specify data as a Python Dictionary


The data that you provide with Series() can also be a dictionary. Let us see how you can create a Series object by specifying indexes and values through a dictionary (see below).

    >>> obj5 = pd.Series({'Jan': 31, 'Feb': 28, 'Mar': 31})     # notice the curly brackets this time : the data passed to Series() is a dictionary
    >>> obj5
    Feb    28
    Jan    31
    Mar    31
    dtype: int64

See, the left column took the keys of the passed dictionary as indexes and the values form the data values. (Also, notice that the indexes are not in the same order as given in the dictionary above.)

Here, one thing is noteworthy : if you are creating a Series object from a dictionary object, then the keys of the dictionary become the index of the Series and the values of the dictionary become the data of the Series object. Also, the indexes, which are created from keys, may not be in the same order as you have typed them. (Older versions of pandas sort the keys, as seen above ; recent versions running on Python 3.6 or later keep the order in which the keys were typed.)
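
A minimal sketch of the key-to-index mapping, written so that it does not depend on the order in which pandas arranges the keys (the names days and obj are illustrative only) :

```python
import pandas as pd

days = {"Jan": 31, "Feb": 28, "Mar": 31}
obj = pd.Series(days)

# Keys of the dictionary become the index labels ...
print(set(obj.index) == set(days.keys()))
# ... and each value stays attached to its own key, whatever the display order
print(obj["Feb"])
print(all(obj[key] == value for key, value in days.items()))
```

Looking a value up by its label, as in obj["Feb"], works the same way whether or not the keys have been reordered.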

EXAMPLE 8 Write a program to create a Series object using a dictionary that stores the number of students in each section of class 12 in your school.
SOLUTION
    import pandas as pd
    stu = {'A': 39, 'B': 41, 'C': 42, 'D': 44}
    s8 = pd.Series(stu)
    print(s8)
Output
    A    39
    B    41
    C    42
    D    44
    dtype: int64

(iv) Specify data as a Scalar Value

The data can be in the form of a single value or a scalar value. BUT if data is a scalar value, then the index argument to Series( ) function must be provided. The scalar value (given as data) will be repeated to match the length of index. (The index argument has to be a sequence of numbers or labels of any type), e.g., consider this :

    medalsWon = pd.Series(10, index = range(0, 1))      # the index argument specifies the indexes for the Series object's elements
    medals2 = pd.Series(15, index = range(1, 6, 2))
    ser2 = pd.Series('Yet to start', index = ['Indore', 'Delhi', 'Shimla'])

NOTE
If data is a scalar value, the index argument to Series( ) function must be provided.

See below the Series objects created as per above statements.

    >>> medalsWon
    0    10
    dtype: int64

    >>> medals2
    1    15
    3    15
    5    15
    dtype: int64

    >>> ser2
    Indore    Yet to start
    Delhi     Yet to start
    Shimla    Yet to start
    dtype: object
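
The repetition of a scalar over the index can be verified with a short sketch (the names cities and status are illustrative) :

```python
import pandas as pd

cities = ["Indore", "Delhi", "Shimla"]

# One scalar, three index labels : the scalar is repeated once per label
status = pd.Series("Yet to start", index=cities)

print(len(status))                        # same as len(cities)
print((status == "Yet to start").all())   # every element holds the same value
```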

EXAMPLE 9 Write a program to create a Series object that stores the initial budget allocated (50000/- each) for the four quarters of the year : Qtr1, Qtr2, Qtr3 and Qtr4.
SOLUTION
    import pandas as pd
    s9 = pd.Series(50000, index = ['Qtr1', 'Qtr2', 'Qtr3', 'Qtr4'])
    print(s9)
Output
    Qtr1    50000
    Qtr2    50000
    Qtr3    50000
    Qtr4    50000
    dtype: int64

EXAMPLE 10 Total number of medals to be won is 200 in the Inter University games held every alternate year. Write code to create a Series object that stores these medals for games to be held in the decade 2020-2029.
SOLUTION
    import pandas as pd
    s10 = pd.Series(200, index = range(2020, 2029, 2))
    print(s10)
Output
    2020    200
    2022    200
    2024    200
    2026    200
    2028    200
    dtype: int64

1.4.2 Creating Series Objects - Additional Functionality

Now that you have a fair idea about how to create Series type objects, let us talk about additional functionality of Series() that you can use to create Pandas Series objects.

(i) Specifying/Adding NaN values in a Series Object

Sometimes you need to create a Series object of a certain size but you do not have complete data available at that time. In such cases, you can fill missing data with a NaN (Not a Number) value. The legal empty value NaN is defined in the NumPy module and hence you can use np.NaN to specify a missing value, or use None (from NumPy 2.0 onwards the np.NaN alias is no longer available and the value is written np.nan), e.g., see following code :

    >>> obj3 = pd.Series([6.5, np.NaN, 2.34])     # you can specify a missing or empty value using np.NaN
    >>> obj3
    0    6.50
    1     NaN
    2    2.34
    dtype: float64

NOTE
Use np.NaN or None to add missing data.

(ii) Specify index(es) as well as data with Series( )

While creating a Series type object, along with values, you can also provide indexes. Both values and indexes are sequences. The syntax for this is as follows :

    <Series Object> = pandas.Series(data = None, index = None)     # None is the default value, if you skip these parameters

Both data and index have to be sequences ; None is taken by default, if you skip these parameters.

NOTE
In place of pandas.Series( ), you may use pd.Series( ) also if you have imported pandas as pd [i.e., import pandas as pd] or with the name that you have used with import statement, e.g., if you have given statement as import pandas as pnd then you can use pnd.Series() in place of pandas.Series().

Consider following example statements :

    >>> arr = [31, 28, 31, 30]                        # a sequence of numbers
    >>> mon = ['Jan', 'Feb', 'Mar', 'Apr']            # a sequence of strings
    >>> obj3 = pd.Series(data = arr, index = mon)     # Series object created with both values and indexes given as sequences
    >>> obj3
    Jan    31
    Feb    28
    Mar    31
    Apr    30
    dtype: int64

    >>> obj4 = pd.Series(data = [32, 34, 35], index = ['A', 'B', 'C'])     # another Series object created with both values and indexes given as sequences
    >>> obj4
    A    32
    B    34
    C    35
    dtype: int64

The datatype (int64 here) is chosen by default.

You could skip keyword data also, i.e., following statement will also do the same as above :

    obj3 = pd.Series(arr, index = mon)

You may use loop for defining index sequence also, e.g.,

    s1 = pd.Series(range(1, 15, 3), index = [x for x in 'abcde'])

The above code will create a Series object as shown below :

    >>> s1 = pd.Series(range(1, 15, 3), index = [x for x in 'abcde'])     # loop used for specifying indexes
    >>> s1
    a     1
    b     4
    c     7
    d    10
    e    13
    dtype: int64

Caution !
If specifying indexes explicitly using an index sequence, you must provide indexes equal to the number of values in data array ; providing too few or too many indices will lead to an error - the ValueError.
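
The caution above can be reproduced with a small sketch in which four values are paired with only three labels (the names data, too_few and error_raised are illustrative) :

```python
import pandas as pd

data = [31, 28, 31, 30]            # four values
too_few = ["Jan", "Feb", "Mar"]    # only three index labels

try:
    pd.Series(data, index=too_few)
    error_raised = False
except ValueError as err:
    # pandas refuses to build the Series when the lengths differ
    error_raised = True
    print("ValueError:", err)
```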

EXAMPLE 11 A Python list namely section stores the section names ('A', 'B', 'C', 'D') of class 12 in your school. Another list contri stores the contribution made by these students to a charity fund endorsed by the school. Write code to create a Series object that stores the contribution amount as the values and the section names as the indexes.
SOLUTION
    import pandas as pd
    section = ['A', 'B', 'C', 'D']
    contri = [6700, 5600, 5000, 5200]
    s11 = pd.Series(data = contri, index = section)
    print(s11)
Output
    A    6700
    B    5600
    C    5000
    D    5200
    dtype: int64

(iii) Specify Data Type along with data and index


You can also specify data type along with data and index with Series() as per following syntax :

    <Series Object> = pandas.Series(data = None, index = None, dtype = None)

None is the default value taken for a parameter in case no value is provided for it. If you do not specify datatype, the nearest datatype to store the given values will be taken. But you can specify your own datatype by specifying a NumPy datatype with the dtype attribute.
Consider following statements :

    >>> obj5 = pd.Series(data = arr, index = mon, dtype = np.float64)     # this time dtype is provided
    >>> obj5
    Jan    31.0
    Feb    28.0
    Mar    31.0
    Apr    30.0
    dtype: float64

Compare it with the obj3 object that is created above without specifying dtype. Also notice that a Series object's indexes are not necessarily 0 to n - 1 always.
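
To see the effect of the dtype argument side by side, here is a small sketch that builds the same data twice, once without and once with dtype (the names default_obj and float_obj are illustrative) :

```python
import numpy as np
import pandas as pd

arr = [31, 28, 31, 30]
mon = ["Jan", "Feb", "Mar", "Apr"]

default_obj = pd.Series(arr, index=mon)                  # dtype inferred from the values
float_obj = pd.Series(arr, index=mon, dtype=np.float64)  # dtype requested explicitly

print(default_obj.dtype)   # an integer dtype
print(float_obj.dtype)     # float64
print(float_obj["Feb"])    # the same value, now stored as a float
```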

(iv) Using a Mathematical Function/Expression to Create Data Array in Series( )

The Series( ) allows you to define a function or expression that can calculate values for data sequence. It is done in the following form :

    <Series Object> = pandas.Series(index = None, data = <function | expression>)

To understand its functioning, consider following example statements.

    >>> a = np.arange(9, 13)        # NumPy array a created as it supports vectorized operations
    >>> print(a)
    [ 9 10 11 12]

    >>> obj7 = pd.Series(index = a, data = a * 2)      # data sequence given as a * 2, i.e., every value of a doubled
    >>> obj7
    9     18
    10    20
    11    22
    12    24
    dtype: int32

See, the index values are taken from NumPy array a.

    >>> obj8 = pd.Series(index = a, data = a ** 2)     # data sequence given as a ** 2, i.e., every value of a squared
    >>> obj8
    9      81
    10    100
    11    121
    12    144
    dtype: int32

The vectorized operations on a NumPy array (e.g., a * 2 or a ** 2) will be applied on every element of the NumPy array and stored as data part of Series object. Now consider another code :

    >>> Lst = [9, 10, 11, 12]
    >>> obj8 = pd.Series(data = (2 * Lst))     # notice this time the expression for data array involves a list, and see how it has impacted the final data array
    >>> obj8
    0     9
    1    10
    2    11
    3    12
    4     9
    5    10
    6    11
    7    12
    dtype: int64

The above code tries to create data array for Series object obj8 by giving expression 2 * Lst where Lst is a Python list and, as you know, a number multiplied with a list replicates the list that many times, and hence the data array has the same list values replicated twice (because of 2 * Lst).
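
The contrast between the two cases can be put in one sketch : the same four numbers, once as a NumPy array and once as a Python list, each multiplied by 2 (the names lst, arr, from_array and from_list are illustrative).

```python
import numpy as np
import pandas as pd

lst = [9, 10, 11, 12]
arr = np.array(lst)

from_array = pd.Series(arr * 2)   # vectorized : each element is doubled
from_list = pd.Series(lst * 2)    # list replication : the list is repeated twice

print(from_array.tolist())   # 4 values
print(from_list.tolist())    # 8 values
```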

Carefully read the following example. It is similar yet different from the previous example.

EXAMPLE 12 Sequences section and contri1 store the section names ('A', 'B', 'C', 'D', 'E') and contribution made by them respectively (6700, 5600, 5000, 5200, nil) for a charity. Your school has decided to donate as much contribution as made by each section, i.e., the donation will be doubled.
Write code to create a Series object that stores the contribution amount as the values and the section names as the indexes with datatype as float32.
SOLUTION
    import pandas as pd
    import numpy as np
    section = ['A', 'B', 'C', 'D', 'E']
    # This time we have stored the contribution amount in an ndarray contri1 so that
    # we can double its value through vectorized operation to add school's contribution
    contri1 = np.array([6700, 5600, 5000, 5200, np.NaN])
    # contri1 * 2 will double each value of the contri1 ndarray. This is done to
    # reflect the school's contribution to charity.
    s12 = pd.Series(data = contri1 * 2, index = section, dtype = np.float32)
    print(s12)
Output
    A    13400.0
    B    11200.0
    C    10000.0
    D    10400.0
    E        NaN
    dtype: float32

Compare the data type of this Series object with the previous object (s11). It is now different from the default data type.

NOTE
When you store a NaN value in a Series object, Pandas requires the data type to be of floating point type. Even if all the other values are integers, Pandas will promote the data to a floating point type (automatically) because NaN is not supported by integer types.
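
The promotion described in the note can be observed with a minimal sketch, assuming no dtype is specified so that pandas chooses it (the names ints and with_missing are illustrative ; np.nan is the spelling that works in every NumPy version).

```python
import numpy as np
import pandas as pd

ints = pd.Series([6700, 5600, 5000])             # only integers
with_missing = pd.Series([6700, 5600, np.nan])   # integers plus one NaN

print(ints.dtype)           # an integer dtype
print(with_missing.dtype)   # a floating point dtype, because of the NaN
```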
IMPORTANT
While creating a Series object, when you give index array as a sequence then there is no compulsion for the uniqueness of indexes. That is, you can have duplicate entries in the index array and Pandas won't raise any error, e.g., see below :

    >>> val = np.arange(2.75, 50, 9.75)
    >>> val
    array([ 2.75, 12.5 , 22.25, 32.  , 41.75])

    >>> ob3 = pd.Series(val, index = ['a', 'b', 'a', 'a', 'b'])     # see, index array has duplicate entries
    >>> ob3
    a     2.75
    b    12.50
    a    22.25
    a    32.00
    b    41.75
    dtype: float64

No error ! The Series object is successfully created.

NOTE
Indices need not be unique in Pandas Series Object. This will only cause an error if/when you perform an operation that requires unique indices.
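
A small sketch of what duplicate labels mean in practice. The choice of reindex( ) as an operation that needs unique labels is this sketch's own illustration, not something the text above names ; the names ob and reindex_failed are also illustrative.

```python
import pandas as pd

ob = pd.Series([2.75, 12.5, 22.25], index=["a", "b", "a"])

print(ob.index.is_unique)   # False : label 'a' occurs twice
print(ob["a"].tolist())     # selecting 'a' returns both of its values

# reindex() is one operation that requires unique index labels
try:
    ob.reindex(["a", "b"])
    reindex_failed = False
except ValueError as err:
    reindex_failed = True
    print("ValueError:", err)
```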

Series stores values of homogeneous types and still we can store different types of values. How ?

You have read till now that Series can store homogeneous elements, i.e., elements of the same type. Let us explore this statement a little more. Have a look at the following examples :

1. Creating a Series object that stores three integer values

    >>> s1 = pd.Series([11, 12, 13])
    >>> s1
    0    11
    1    12
    2    13
    dtype: int64

See, this Series object is storing all integer values and thus, its datatype is one of the integer datatypes.
Since all the values being stored are integers, an integer datatype is chosen by Pandas (int64 here), so that all values can be efficiently stored and processed.
All values are of int64 type - hence homogeneous datatype.

2. Creating a Series object that stores three floating type values

    >>> s2 = pd.Series([11.1, 12.2, 13.3])
    >>> s2
    0    11.1
    1    12.2
    2    13.3
    dtype: float64

See, this Series object is storing all floating-point values and thus, its datatype is one of the float datatypes.
Since all the values being stored are floating point numbers, a floating-point datatype is chosen by Pandas (float64 here), so that all values can be efficiently stored and processed.
All values are of float64 type - hence homogeneous datatype.

3. Creating a Series object that stores a mix of integer and floating type values

    >>> s3 = pd.Series([11, 12.2, 13.3])
    >>> s3
    0    11.0
    1    12.2
    2    13.3
    dtype: float64

See, this Series object is storing all numbers - integers and floating-point values - and thus, its datatype is one of the float datatypes to accommodate all types of numbers.
Since all the values being stored are numbers - some integers and some floating point numbers - a floating-point datatype is chosen by Pandas (float64 here), so that all values can be efficiently stored and processed.
All values are of float64 type - hence homogeneous datatype.

4. Creating a Series object that stores a mix of different type values

    >>> s4 = pd.Series([11, 12.2, 13.3, "Hi"])
    >>> s4
    0      11
    1    12.2
    2    13.3
    3      Hi
    dtype: object

See, this Series object is created from values whose datatypes are different. Thus, Pandas will choose a datatype (object) which can accommodate all the given values.
Since the values being stored have different datatypes (integers, floating point numbers, string etc.), Pandas will select a datatype which is capable of holding all these values. Hence, Pandas will choose its datatype as object, which is capable of holding any type of value.
All values are of object type - hence homogeneous datatype.
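
The four cases can be checked together by collecting the dtype that pandas picks for each (the dictionary name kinds and its keys are illustrative).

```python
import pandas as pd

# dtype chosen by pandas for each kind of input
kinds = {
    "integers": pd.Series([11, 12, 13]).dtype,
    "floats": pd.Series([11.1, 12.2, 13.3]).dtype,
    "integers and floats": pd.Series([11, 12.2, 13.3]).dtype,
    "numbers and a string": pd.Series([11, 12.2, 13.3, "Hi"]).dtype,
}

for label, dtype in kinds.items():
    print(label, "->", dtype)
```

In every case the Series ends up with exactly one dtype ; what changes is how general that dtype has to be.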

Series Object Attributes

When you create a Series type object, all information related to it is available through attributes. You can use these attributes in the following format to get information about the Series object.

    <Series object>.<attribute name>

Some common attributes of Series objects are listed in the table below. The code examples for the usage of these attributes follow the Table 1.2.

Table 1.2 Common attributes of Series objects

    Attribute                       Description
    <Series object>.index           The index (axis labels) of the Series.
    <Series object>.index.name      Name of the index ; can be used to assign new name to index
    <Series object>.values          Return Series as ndarray or ndarray-like depending on the dtype
    <Series object>.dtype           return the dtype object of the underlying data
    <Series object>.shape           return a tuple of the shape of the underlying data
    <Series object>.nbytes          return the number of bytes in the underlying data
    <Series object>.ndim            return the number of dimensions of the underlying data
    <Series object>.size            return the number of elements in the underlying data
    <Series object>.itemsize        return the size of the dtype of the item of the underlying data
                                    (removed in pandas 1.0 ; use <Series object>.dtype.itemsize instead)
    <Series object>.hasnans         return True if there are any NaN values ; otherwise return False
    <Series object>.empty           return True if the Series object is empty, False otherwise
    <Series object>.name            return or assign name to Series object

Following examples show how to view various attributes of Series objects.

(a) Retrieving Index Array (index attribute) & Data Array (values attribute) of a Series Object
You can access the index array and data values' array of an existing Series object obj5 as shown below :

    >>> obj5.index
    Index(['Feb', 'Jan', 'Mar'], dtype='object')
    >>> obj5.values
    array([28, 31, 31], dtype=int64)

See, <object>.index and <object>.values returned the index array and data values' array respectively.

    >>> obj6.index          # obj6 was created as : obj6 = pd.Series(data = np.arange(11, 25, 3))
    RangeIndex(start=0, stop=5, step=1)
    >>> obj7.index          # obj7 was created as : obj7 = pd.Series(index = a, data = a * 2), where a is a numpy array ([9, 10, 11, 12])
    Int64Index([9, 10, 11, 12], dtype='int64')

For the rest of the attributes (covered in points below), we shall be using following two objects (given as Reference 1.2) :

Reference 1.2

    >>> obj2 = pd.Series([3.5, 5., 6.5, 8.])
    >>> obj2
    0    3.5
    1    5.0
    2    6.5
    3    8.0
    dtype: float64

    >>> obj3 = pd.Series([6.5, np.NaN, 2.34])
    >>> obj3
    0    6.50
    1     NaN
    2    2.34
    dtype: float64

(b) Setting the Index Name
By default, a Series has no name for its indexes, but you can set its index name by assigning a string to its <Series object>.index.name attribute, e.g.,

    >>> ser1 = pd.Series([1, 2, 3], index = ['a', 'b', 'c'])
    >>> ser1
    a    1
    b    2
    c    3
    dtype: int64
    >>> ser1.index.name = "newind"
    >>> ser1
    newind
    a    1
    b    2
    c    3
    dtype: int64

See, the new name for the indexes is assigned.

(c) Setting Series Name

The <Series>.name attribute can be used to get or set the name of a Series object, e.g.,

    >>> ser1
    a    1
    b    2
    c    3
    dtype: int64
    >>> ser1.name = "Mys"
    >>> ser1
    a    1
    b    2
    c    3
    Name: Mys, dtype: int64
    >>> ser1.name
    'Mys'
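
Both names can be set and read back on one object, as in the sketch below ; the variable ser is illustrative, while the strings "newind" and "Mys" are the ones used above.

```python
import pandas as pd

ser = pd.Series([1, 2, 3], index=["a", "b", "c"])

ser.index.name = "newind"   # name of the index
ser.name = "Mys"            # name of the Series itself

print(ser.index.name)
print(ser.name)
print(ser)                  # the printed form shows both names
```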

(d) Retrieving Data Type (dtype) and Size of Type (itemsize)

To retrieve the data type of individual elements of a Series object, use <objectname>.dtype.
To know about the type of the Series object itself, you can use type() of Python. You can use the itemsize attribute to know the number of bytes allocated to each data item (in pandas 1.0 and later, where Series.itemsize has been removed, use <objectname>.dtype.itemsize instead), e.g.,

    >>> obj2.dtype          # the dtype attribute returns the type of data values stored in the Series object
    dtype('float64')

    >>> obj2.itemsize       # the itemsize returns the size in bytes for each item
    8

    >>> type(obj2)          # the type(), however, tells the type of the object passed to it
    pandas.core.series.Series
(e) Retrieving Shape
The shape of a Series object (shape attribute) tells how big it is, i.e., how many elements it contains, including missing or empty values (NaNs). Since there is only one axis in a Series object, it is shown as (<n>, ) where n is the number of elements in the object, e.g.,

    >>> print(obj2.shape, obj3.shape)
    (4,) (3,)

See, the number of elements in objects obj2 and obj3 are shown.

NOTE
The shape of a Series object tells how big it is, i.e., how many elements it contains, including missing or empty values (NaNs).

(f) Retrieving Dimension (number of axes, ndim attribute), Size (size attribute) and Number of Bytes (nbytes attribute)

To know about the dimension (number of axes), use <objectname>.ndim.
To know about the number of elements in the Series object, use <objectname>.size.
To know total number of bytes taken by Series object data, use <objectname>.nbytes (nbytes is
equal to size * itemsize).

>>> obj2.ndim                            A Series object is a 1-dimensional object.
1

>>> print(obj2.size, obj3.size)          obj2 has 4 elements and obj3 has 3 elements.
4 3

>>> print(obj2.nbytes, obj3.nbytes)      obj2 has 4 elements, hence 4 * 8 = 32 bytes, and obj3
32 24                                    has 3 elements, hence 3 * 8 = 24 bytes.
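
The relationship between these attributes can be checked with a small script. This is a sketch that rebuilds obj2 and obj3 of Reference 1.2; the per-item size is read through the dtype, which is an assumption made here so that the code also runs on recent pandas versions.

```python
import numpy as np
import pandas as pd

# Objects of Reference 1.2
obj2 = pd.Series([3.5, 5.0, 6.5, 8.0])
obj3 = pd.Series([6.5, np.nan, 2.34])

item_bytes = obj2.dtype.itemsize      # bytes per element, read via the dtype

print(obj2.dtype, item_bytes)         # float64 8
print(obj2.shape, obj3.shape)         # (4,) (3,)
print(obj2.ndim, obj2.size)           # 1 4
print(obj2.nbytes, obj3.nbytes)       # 32 24
print(obj2.nbytes == obj2.size * item_bytes)   # True
```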

(g) Checking Emptiness (empty attribute) and Presence of NaNs (hasnans attribute)
See, we have also created an empty Series object, namely obj1, and then checked its emptiness
along with the emptiness of objects obj2 and obj3 of Reference 1.2 (given on previous page).

>>> obj1 = pd.Series()
>>> obj1.empty
True
>>> obj2.empty
False
>>> obj3.empty
False

The object obj1 is empty, but obj2 and obj3 (Reference 1.2) are not empty as they contain some data.

• Similarly, to check if a Series object contains some NaN value or not, you can use the hasnans
  attribute as shown below (using objects of Reference 1.2).
• You can use len() to get the total number of elements and the <series>.count() method to
  get the count of non-NaN values in a Series object, e.g.,

>>> obj2.hasnans
False
>>> obj3.hasnans
True
>>> obj2.count()
4
>>> obj3.count()
2
>>> len(obj3)
3

The hasnans attribute tells the presence of NaN values, count() returns the count of non-NaN
values and len() gives the total number of elements.

NOTE

If you use len() on a Series object, then it returns total elements in it including NaNs,
but <series>.count() returns only the count of non-NaN values in a Series object.
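
The checks described above can be combined in one short script. This is a minimal sketch; the empty Series is created with an explicit dtype, which is an assumption made here to avoid a warning that some pandas versions raise for a bare pd.Series().

```python
import numpy as np
import pandas as pd

obj1 = pd.Series(dtype="float64")        # empty Series (explicit dtype)
obj3 = pd.Series([6.5, np.nan, 2.34])    # obj3 of Reference 1.2

print(obj1.empty, obj3.empty)    # True False
print(obj3.hasnans)              # True
print(len(obj3))                 # 3  (NaN is counted)
print(obj3.count())              # 2  (NaN is not counted)
```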

EXAMPLE Consider the two Series objects s11 and s12 that you created in examples 11 and 12 respectively.
Print the attributes of both these objects in a report form as shown below.

Attribute name        Object s11     Object s12

Data type
Shape
No. of bytes
No. of dimensions
Item size
Has NaNs?
Empty?

SOLUTION
import pandas as pd
# statements here to create objects s11 and s12 from previous examples
print("Attribute name \t\t Object s11 \t Object s12")
print("-------------- \t\t ---------- \t ----------")
print("Data type (.dtype) :\t", s11.dtype, '\t\t', s12.dtype)
print("Shape (.shape) :\t", s11.shape, '\t\t', s12.shape)
print("No. of bytes (.nbytes) :\t", s11.nbytes, '\t\t', s12.nbytes)
print("No. of dimensions (.ndim) :\t", s11.ndim, '\t\t', s12.ndim)
# Series.itemsize was removed in pandas 1.0; dtype.itemsize gives the same value
print("Item size (.itemsize) :\t", s11.dtype.itemsize, '\t\t', s12.dtype.itemsize)
print("Has NaNs? (.hasnans) :\t", s11.hasnans, '\t\t', s12.hasnans)
print("Empty? (.empty) :\t", s11.empty, '\t\t', s12.empty)

Output

Attribute name                 Object s11     Object s12
Data type (.dtype) :           int64          float32
Shape (.shape) :               (4,)           (5,)
No. of bytes (.nbytes) :       32             20
No. of dimensions (.ndim) :    1              1
Item size (.itemsize) :        8              4
Has NaNs? (.hasnans) :         False          True
Empty? (.empty) :              False          False
You can access a Series object in many ways: you can access its individual elements and
also slices of it. Consider the following Series objects (obj5, obj6, obj7 and obj8) (Fig. 1.3).

    obj7               obj8
9     18           9      81
10    20           10    100
11    22           11    121
12    24           12    144
dtype: int32       dtype: int32

Figure 1.3 Some sample pandas Series objects.


For the following operations, we shall be using the sample objects shown in Fig. 1.3.

1.5.1 Accessing Individual Elements from a Series Object

To access individual elements of a Series object, you can give its index in square brackets along
with its name, i.e., as:

<Series Object name>[<valid index>]

Following figure shows you elements accessed from the objects obj5, obj6, obj7 and obj8.

>>> obj5['Feb']
28

>>> obj8[11]
121

See, all these objects' legal indexes are used to access individual elements of these objects.

As you see in above figure, we have used only valid or legal indexes (i.e., which exist in the
Series object) to access an element.
If the Series object has duplicate indexes, then giving an index with the Series object will return
all the entries with that index, e.g., see below:

[Figure: a Series object of dtype float64 in which the indexes 'a' and 'b' each appear more than
once; accessing it with index 'b' returns every entry whose index is 'b'.]

With duplicate indexes in a Series object, all entries with the same index are returned.

BUT if you try to give an index which is not a legal index for a Series object, it will give you
an error. See below:

>>> obj7[2]
Traceback (most recent call last):
  ...
  File "pandas/_libs/hashtable_class_helper.pxi", line 817, in pandas._libs.hashtable.Int64HashTable.get_item
KeyError: 2

2 is not a legal index for obj7 (its legal indexes are 9 to 12 - refer above), hence the error.
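
The access rules above can be sketched as follows. The objects obj5 and obj7 are rebuilt from the figures; the Series named dup and its values are hypothetical, introduced only to illustrate duplicate indexes.

```python
import pandas as pd

obj5 = pd.Series([28, 31, 31], index=['Feb', 'Jan', 'Mar'])
obj7 = pd.Series([18, 20, 22, 24], index=[9, 10, 11, 12])
# Hypothetical Series with duplicate index labels
dup = pd.Series([12.5, 22.25, 32.0, 41.75], index=['b', 'a', 'a', 'b'])

print(obj5['Feb'])      # 28
print(obj7[9])          # 18 (9 is an index label here, not a position)
print(dup['b'])         # a Series holding both entries labelled 'b'

try:
    obj7[2]             # 2 is not among the labels 9 to 12
except KeyError as err:
    print("KeyError:", err)
```

Catching the KeyError, as done here, is one way to keep a program running when an index may be missing.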
1.5.2 Extracting Slices from Series Object
Like other sequences, you can extract slices too from a Series object to retrieve subsets. Let us see
how you can extract slices from Series objects. Here, you need to understand an important
thing about slicing, which is that:

Slicing takes place position wise and not index wise in a Series object.

To understand this, let us consider the same Series objects as given in Fig. 1.3. Internally, there is a
position associated with each element: the first element gets the position 0, the second element gets
the position 1, and so on. Irrespective of their indexes, positions always start with 0 and go on
like 1, 2, 3 and so on (see Fig. 1.4).

         obj5                        obj6                        obj7
Position  Index  Data       Position  Index  Data       Position  Index  Data
0         Feb    28         0         0      11         0         9      18
1         Jan    31         1         1      14         1         10     20
2         Mar    31         2         2      17         2         11     22
                            3         3      20         3         12     24
                            4         4      23

(All individual elements have position numbers starting from 0 onwards, i.e., 0 for first element, 1 for 2nd element and so on.)
Figure 1.4 Position number associated with each element of Series object

When you have to extract slices, then you need to specify slices as [start : end : step] like you do
for other sequences, but the start and stop signify the positions of elements, not the indexes.
Consider following examples:

>>> obj5[1:]
Jan    31
Mar    31
dtype: int64

>>> obj6[2:5]
2    17
3    20
4    23
dtype: int32

>>> obj7[0::2]
9     18
11    22
dtype: int32

Irrespective of the indexes, the slices have been extracted position wise.

All other rules of slices apply, i.e., you can specify steps, you can reverse the elements, the
range of slice can be outside the range of positions etc.

NOTE

A slice object is created from Series object using a syntax of <Object>[start : end : step], but
the start and stop signify the positions of elements not the indexes. The slice object of a Series
object is also a pandas Series type object.

[Figure: two slices of obj7. In the first, even though obj7 has indexes 10, 11, 12, the slice object
is empty because the slice is extracted position wise, not index wise. The second is a slice object
with values reversed.]
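
The position-wise behaviour of slices can be verified with a short sketch. It rebuilds obj7 from Fig. 1.4; the particular slices chosen are illustrative and are not necessarily the ones shown in the original figure.

```python
import pandas as pd

obj7 = pd.Series([18, 20, 22, 24], index=[9, 10, 11, 12])

print(obj7[0::2])            # positions 0 and 2, i.e., index labels 9 and 11
print(obj7[::-1])            # all values in reversed order
print(obj7[10:12].empty)     # True: positions 10 and 11 do not exist
print(type(obj7[1:3]))       # the slice is itself a Series
```

Here obj7[10:12] is empty even though 10 and 11 are valid index labels, because the numbers in a slice are read as positions.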
INFORMATICS PRACTICES_ XII

EXAMPLE Consider a Series object s8 that stores the number of students in each section of class 12 (as shown
below).

A    39
B    41
C    42
D    44

First two sections have been given a task of selling tickets @ 100/- per ticket as
a part of a social experiment. Write code to display how much they have collected.

SOLUTION
import pandas as pd
# s8 is created as shown above
print("Tickets amount :")
print(s8[:2] * 100)

Output

Tickets amount :
A    3900
B    4100
dtype: int64

1.6 Operations on Series Object

A Series is a one-dimensional structure which offers flexibility with storage as well as operations
on it. Let us now talk about how you can perform various types of operations on Pandas Series
objects.

1.6.1 Modifying Elements of Series Object


The data values of a Series object can be easily modified through item assignment, i.e.,

<SeriesObject>[<index>] = <new_data_value>


Above assignment will change the data value of the given index in the Series object.

<SeriesObject>[start : stop] = <new_data_value>

Above assignment will replace all the values falling in the given slice. Consider following screenshots:

[Figure: Series objects ob1 and ob2 before and after modification]

>>> ob1                  (original object)
0     1.50
1    12.75
2    24.00
3    35.25
4    46.50
dtype: float64

>>> ob1                  (after the element at index 0 is changed to 1.85)
0     1.85
1    12.75
2    24.00
3    35.25
4    46.50
dtype: float64

>>> ob1                  (after the values in a slice are replaced with -15.75)
0     1.85
1    12.75
2   -15.75
3   -15.75
4    46.50
dtype: float64

>>> ob2                  (original object)
a     1.50
b    12.75
c    24.00
d    35.25
e    46.50
dtype: float64

>>> ob2                  (after the values at indexes 'b' and 'd' are changed to 380)
a      1.5
b    380.0
c     24.0
d    380.0
e     46.5
dtype: float64
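
The two assignment forms can be tried as below. This is a sketch that rebuilds ob1 and ob2 with their original values; the exact statements used in the original screenshots are assumed, not known.

```python
import pandas as pd

ob1 = pd.Series([1.5, 12.75, 24.0, 35.25, 46.5])
ob1[0] = 1.85            # single element, index 0
ob1[2:4] = -15.75        # slice: elements at positions 2 and 3
print(ob1.tolist())      # [1.85, 12.75, -15.75, -15.75, 46.5]

ob2 = pd.Series([1.5, 12.75, 24.0, 35.25, 46.5],
                index=['a', 'b', 'c', 'd', 'e'])
ob2['b'] = 380           # single element, index 'b'
ob2['d'] = 380           # single element, index 'd'
print(ob2.tolist())      # [1.5, 380.0, 24.0, 380.0, 46.5]
```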
1.6.2 Renaming Indexes

You can rename indexes of a Series object by assigning a new index array to its index attribute:

<Object>.index = <new index array>

See below:

>>> ob2.index = ['v', 'w', 'x', 'y', 'z']
>>> ob2
v      1.5
w    380.0
x     24.0
y    380.0
z     46.5
dtype: float64

See how the new indexes get assigned in the order as given in the new index array.

The size of new index array must match with existing index array's size. In other words, you
cannot change the size of a Series object by assigning more or less number of indexes
(see below):

NOTE

Please note that Series object's values can be modified but size cannot. So you can say that
Series objects are value-mutable but size-immutable objects.

>>> ob2
v      1.5
w    380.0
x     24.0
y    380.0
z     46.5
dtype: float64

>>> ob2.index = [ ... ]          (a list of 7 index labels is assigned here)
Traceback (most recent call last):
  ...
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\internals.py", line 3074, in set_axis
    (old_len, new_len))
ValueError: Length mismatch: Expected axis has 5 elements, new values have 7 elements
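
Both cases, a matching and a non-matching index array, can be sketched as follows. The seven labels in the second assignment are hypothetical, chosen only to trigger the length mismatch.

```python
import pandas as pd

ob2 = pd.Series([1.5, 380.0, 24.0, 380.0, 46.5],
                index=['a', 'b', 'c', 'd', 'e'])

ob2.index = ['v', 'w', 'x', 'y', 'z']      # 5 labels for 5 values: allowed
print(ob2.index.tolist())                  # ['v', 'w', 'x', 'y', 'z']

try:
    # Hypothetical list of 7 labels for a Series of 5 values
    ob2.index = ['p', 'q', 'r', 's', 't', 'u', 'v']
except ValueError as err:
    print("ValueError:", err)

print(ob2.index.tolist())                  # index is unchanged after the error
```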

EXAMPLE Consider the Series object s13 that stores the contribution of each section, as shown below:

A    6700
B    5600
C    5000
D    5200

Write code to modify the amount of section 'A' as 7600 and for sections 'C' and 'D' as 7000. Print the
changed object.

SOLUTION

import pandas as pd
# s13 is created as shown above
s13['A'] = 7600        # to modify section 'A' 's amount
s13[2:] = 7000         # to modify sections 'C' and 'D' 's amount
print("Series object after modifying amounts:")
print(s13)

Output

Series object after modifying amounts:
A    7600
B    5600
C    7000
D    7000
1.6.3 The head() and tail() Functions


The head() function is used to fetch first n rows from a Pandas object and tail() function
returns last n rows from a Pandas object. The syntax to use these functions is:

<pandas object>.head([n])
Or <pandas object>.tail([n])

If you do not provide any value for n, then head() and tail() will return first 5 and last 5
rows respectively of a Pandas object. Consider below given screenshots:

>>> ob7
0      2.75
1      7.60
2     12.45
3     17.30
4     22.15
5     27.00
6     31.85
7     36.70
8     41.55
9     46.40
10    51.25
dtype: float64

>>> ob7.head(7)
0     2.75
1     7.60
2    12.45
3    17.30
4    22.15
5    27.00
6    31.85
dtype: float64

>>> ob7.tail()
6     31.85
7     36.70
8     41.55
9     46.40
10    51.25
dtype: float64

>>> ob7.tail(7)
4     22.15
5     27.00
6     31.85
7     36.70
8     41.55
9     46.40
10    51.25
dtype: float64

EXAMPLE A Series object trdata consists of around 2500 rows of data. Write a program to print the following
details:
(i) First 100 rows of data (ii) Last 5 rows of data

SOLUTION
import pandas as pd
# trdata object's creation or loading happens here
print(trdata.head(100))
print(trdata.tail())
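
The default and explicit forms of head() and tail() can be compared with a small sketch. The Series below is a stand-in for ob7 of the screenshots, generated from its apparent pattern (11 values starting at 2.75 in steps of 4.85), which is an assumption.

```python
import pandas as pd

# Stand-in for ob7: 11 values, 2.75 to 51.25
ob7 = pd.Series([round(2.75 + 4.85 * i, 2) for i in range(11)])

print(len(ob7.head()), len(ob7.tail()))    # 5 5  (default n is 5)
print(ob7.head(7).index.tolist())          # [0, 1, 2, 3, 4, 5, 6]
print(ob7.tail(7).index.tolist())          # [4, 5, 6, 7, 8, 9, 10]
```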

1.6.4 Vector Operations on Series Objects


Vector operations mean that if you apply a function or expression then it is individually applied
on each item of the object. Since Series objects are built upon NumPy arrays (ndarrays), they also
support vectorized operations, just like ndarrays.
Following examples will make it clear to you.
Suppose we have a Pandas Series object ob2 as shown here:
>>> ob2
a     1.50
b    12.75
c    24.00
d    35.25
e    46.50
dtype: float64

A sample Series object

If you apply expressions such as ob2 * 3 or ob2 ** 2, the given operation will be carried out in
a vectorized way, i.e., it will be applied to each item of the Series object. See following examples:

>>> ob2 * 3
a      4.50
b     38.25
c     72.00
d    105.75
e    139.50
dtype: float64

>>> ob2 ** 2
a       2.2500
b     162.5625
c     576.0000
d    1242.5625
e    2162.2500
dtype: float64

See, each of the expressions, though applied on the Series type object, is carried out on
each individual item of the Series object - Vector Operations.

Figure 1.5 Vector operations on Series objects
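
A minimal sketch of vector operations on the sample object ob2 follows; the comparison in the last line is an additional illustration, assumed here to show that conditions are also applied item by item.

```python
import pandas as pd

ob2 = pd.Series([1.5, 12.75, 24.0, 35.25, 46.5],
                index=['a', 'b', 'c', 'd', 'e'])

print((ob2 * 3).tolist())     # [4.5, 38.25, 72.0, 105.75, 139.5]
print((ob2 ** 2).tolist())    # [2.25, 162.5625, 576.0, 1242.5625, 2162.25]
print((ob2 > 15).tolist())    # [False, False, True, True, True]
```

In each case the result is a new Series with the same index; ob2 itself is not changed.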

1.6.5 Arithmetic on Series Objects


You can perform arithmetic like addition, subtraction, division etc. with two Series objects and it
will calculate result on two corresponding items of the two objects given in expression BUT it
has a caveat - the operation is performed only on the matching indexes, e.g., if first object has
indexes 0, 1, 2 then it will perform arithmetic only with objects having 0, 1, 2 indexes; for all other
indexes, it will produce NaN (Not a Number).
Also, if the data items of the two matching indexes are not compatible for the operation, it will
return NaN (Not a Number) as the result of those operations.
To understand this, consider below given (Fig. 1.6) five Series objects ob1, ob2, ob3, ob4 and ob5
(ob1 and ob3 have matching indexes; ob2 and ob5 have matching indexes; ob4 has some
indexes matching with ob1 and ob3):

>>> ob1            >>> ob2            >>> ob3            >>> ob4            >>> ob5
0     1.85         a     1.50         0     2.75         0     1.255        a     1.255
1    12.75         b    12.75         1    12.50         1     5.530        b     5.530
2   -15.75         c    24.00         2    22.25         2     9.805        c     9.805
3   -15.75         d    35.25         3    32.00         3    14.080        d    14.080
4    46.50         e    46.50         4    41.75         4    18.355        e    18.355
dtype: float64     dtype: float64     dtype: float64     5    22.630        dtype: float64
                                                         6    26.905
                                                         7    31.180
                                                         dtype: float64

Figure 1.6 (a) Some sample Series objects with matching and non-matching indexes

Now have a look at the statements carrying out arithmetic operations on objects with
matching indexes (see below):

>>> ob1 * ob3            >>> ob1 / ob3            >>> ob2 + ob5
0       5.0875           0    0.672727            a     2.755
1     159.3750           1    1.020000            b    18.280
2    -350.4375           2   -0.707865            c    33.805
3    -504.0000           3   -0.492188            d    49.330
4    1941.3750           4    1.113772            e    64.855
dtype: float64           dtype: float64           dtype: float64

Since objects ob1 and ob3 have matching indexes (both have indexes in the range 0 to 4), it successfully carries out
the given arithmetic operation on corresponding items of matching indexes, i.e., the given operation is performed on
the items with index 0 of both the objects and the result is given for index 0; similarly, the given operation is
performed on the corresponding items having index 1 of both the objects and the result is given for index 1, and so on.
The same thing applies to the expression ob2 + ob5, i.e., corresponding values of index 'a' are added, similarly
corresponding values of index 'b' are added, and so on.

But if you try to perform an operation on objects that have some or all non-matching indexes, then
it will compute the result for the matching indexes, if any, and for non-matching indexes of both
the objects, it will return the result as Not a Number, i.e., NaN. NaN represents missing data (see below).
>>> ob1 + ob4
0     3.105
1    18.280
2    -5.945
3    -1.670
4    64.855
5       NaN
6       NaN
7       NaN
dtype: float64

Computed the given operation for matching indexes (0, 1, 2, 3, 4 in both objects) and NaN for
non-matching indexes (5, 6, 7 of ob4).

>>> ob1 + ob2
0   NaN
1   NaN
2   NaN
3   NaN
4   NaN
a   NaN
b   NaN
c   NaN
d   NaN
e   NaN
dtype: float64

None of the indexes matched, because indexes (0, 1, 2, 3, 4) of ob1 do not match with indexes
('a', 'b', 'c', 'd', 'e') of ob2, hence NaN for all the indexes of both the objects.

Figure 1.6 (b)

NOTE

When you perform arithmetic operations on two Series type objects, the data is
aligned on the basis of matching indexes (this is called Data Alignment in Pandas
objects) and then the arithmetic is performed; for non-overlapping indexes, the
arithmetic operations result in NaN (Not a Number).

You can store the result of object arithmetic in another object, which will also be a Series object,
i.e., if you give:
>>> ob6 = ob1 + ob3
Then ob6 will also be a Series object (if ob1 and ob3 are Pandas Series objects).
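
Data alignment can be observed directly with the sample objects of Fig. 1.6 (a). This is a minimal sketch; the variable names partial and none_match are illustrative.

```python
import pandas as pd

ob1 = pd.Series([1.85, 12.75, -15.75, -15.75, 46.5])
ob3 = pd.Series([2.75, 12.5, 22.25, 32.0, 41.75])
ob4 = pd.Series([1.255, 5.53, 9.805, 14.08, 18.355, 22.63, 26.905, 31.18])
ob2 = pd.Series([1.5, 12.75, 24.0, 35.25, 46.5],
                index=['a', 'b', 'c', 'd', 'e'])

ob6 = ob1 + ob3                  # all indexes match: no NaN in the result
print(type(ob6), ob6.hasnans)

partial = ob1 + ob4              # indexes 5, 6, 7 exist only in ob4
print(partial.isna().tolist())   # NaN only at the last three positions

none_match = ob1 + ob2           # no index in common
print(none_match.size, none_match.isna().all())
```

The result always carries the union of the two indexes, which is why ob1 + ob2 has 10 entries although each operand has only 5.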
EXAMPLE Number of students in classes 11 and 12 in three streams ('Science', 'Commerce' and 'Humanities')
are stored in two Series objects c11 and c12. Write code to find total number of students in classes 11 and
12, stream wise.

SOLUTION
import pandas as pd
# creating two Series objects
c11 = pd.Series(data=[30, 40, 50], index=['Science', 'Commerce', 'Humanities'])
c12 = pd.Series(data=[37, 44, 45], index=['Science', 'Commerce', 'Humanities'])
# adding two objects to get total no. of students
print("Total no. of students")
print(c11 + c12)          # series objects arithmetic

Output

Total no. of students
Science       67
Commerce      84
Humanities    95
dtype: int64
EXAMPLE Object1 Population stores the details of population in four metro cities of India and Object2
AvgIncome stores the total average income reported in previous year in each of these metros. Calculate income per
capita for each of these metro cities.

SOLUTION

import pandas as pd
Population = pd.Series([10927986, 12691836, 4631392, 4328063], \
                       index=['Delhi', 'Mumbai', 'Kolkata', 'Chennai'])
AvgIncome = pd.Series([72167810927986, 85087812691836, 4226784631392, 5261784328063], \
                      index=['Delhi', 'Mumbai', 'Kolkata', 'Chennai'])
perCapita = AvgIncome / Population      # expression to calculate per capita income
print("Population in four metro cities")
print(Population)
print("Avg. Income in four metro cities")
print(AvgIncome)
print("Per Capita Income in four metro cities")
print(perCapita)

(The \ at the end of a line is the statement continuation mark. Do not type it while typing code
in a .py file; rather, type the whole statement in a single line.)

The output produced by above file is as shown below.

>>> runfile('perCapita.py')
Population in four metro cities
Delhi      10927986
Mumbai     12691836
Kolkata     4631392
Chennai     4328063
dtype: int64
Avg. Income in four metro cities
Delhi      72167810927986
Mumbai     85087812691836
Kolkata     4226784631392
Chennai     5261784328063
dtype: int64
Per Capita Income in four metro cities
Delhi      6.603944e+06
Mumbai     6.704137e+06
Kolkata    9.126381e+05
Chennai    1.215737e+06
dtype: float64

See how easy it has become to calculate per capita income if we have huge data stored
about all the cities and average income etc.
