Outline
1.1 Introduction
1.2 Using Pandas
1.3 Pandas Data Structures
1.4 Series Data Structure
1.5 Accessing a Series Object and its Elements
1.6 Operations on Series Object
1.7 Series Objects vs. 1D Data Structures and 1D Numpy Array
1.8 DataFrame Data Structure
1.9 Creating and Displaying a DataFrame
1.10 DataFrame Attributes
1.11 DataFrame vs. Series and 2D Numpy Array
1.12 Selecting or Accessing Data
1.13 Adding/Modifying Rows'/Columns' Values in DataFrames
1.14 Deleting/Renaming Columns/Rows
1.15 More on DataFrame Indexing - Boolean Indexing

1.1 Introduction
Pandas, or Python Pandas, is Python's library for data analysis. Pandas derives its name from "panel data system", an econometrics term for multi-dimensional, structured data sets. Today, Pandas has become a popular choice for data analysis. As you may be aware, data analysis refers to the process of evaluating big data sets using analytical and statistical tools so as to discover useful information and conclusions to support business decision-making. Pandas makes available various tools for data analysis and makes it a simple and easy process as compared to other available tools. The main author of Pandas is Wes McKinney.
This chapter will introduce Python's Pandas library, its data structures Series and DataFrame, and some useful functions with these data structures.
IMPORTANT : Please note that although Pandas is a separate library, it uses NumPy as its support library, and hence many datatypes, constants and functions of NumPy are frequently used with Pandas. Since NumPy was removed from last year's class 11th syllabus because of the COVID situation, students could not study NumPy. Thus, we are giving the full NumPy chapter of class XI as SUPPORT MATERIAL in the SIPO App. To get this chapter, open the SIPO App and then open the Support tab to find it.
1.2 Using Pandas
Pandas is an open source, BSD-licensed library built for the Python programming language. Pandas offers high-performance, easy-to-use data structures and data analysis tools.
In order to work with pandas in Python, you need to import the pandas library in your Python environment. You can do this either on the shell prompt or in your script file (.py) by writing :
    import pandas as pd
[Figure: the statement import pandas as pd typed at the Python shell prompt]
[Figure: a DataFrame object, showing its Index, its Columns (A, B, C) and its Data]
For working in Pandas, we generally import both pandas and numpy modules/libraries by giving the following import statements on the Python shell or in the script or program code :
    import pandas as pd
    import numpy as np
You can use any identifier name in place of pd and np, but pd and np have generally been the preferred choices.
After this basic introduction of Series and DataFrame objects, let us talk about these individually in detail.
1. There are other data structures too such as panel but covering panels here is beyond the scope of the book.
4 INFORMATICS PRACTICES
1.4 Series Data Structure
SERIES
A Series is a Pandas data structure that represents a one-dimensional array-like object containing an array of data (of any NumPy data type) and an associated array of data labels, called its index.
Series is an important data structure of pandas. It represents a one-dimensional array of indexed data. A Series type object has two main components :
◊ an array of actual data
◊ an associated array of indexes (numeric index) or data labels (labelled index).
Both components are one-dimensional arrays with the same length. The index is used to access individual data values, e.g., following figure 1.2 shows some Series objects :
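[Figure 1.2: Three example Series objects, each drawn as an Index column and a Data column: one with integer indexes (1, 2, 3), one with month-name indexes (Jan, Feb, Mar, Apr) and one with character indexes ('A', 'B', 'C', 'D')]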
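The two components described above can be inspected directly. The following is a minimal sketch; the name days and its values are illustrative assumptions, not taken from the figure.

```python
import pandas as pd

# Illustrative data (an assumption): days in the first four months of a non-leap year.
days = pd.Series([31, 28, 31, 30], index=["Jan", "Feb", "Mar", "Apr"])

print(days.index.tolist())    # the index (labels) component
print(days.values.tolist())   # the data component
print(days["Feb"])            # an index label is used to reach its data value
```

Both components always have the same length, which is why every label can be paired with exactly one position in the data array.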
I. Create empty Series object by using just the Series( ) with no parameter
To create an empty object, i.e., having no values, you can just use the Series( ) as :
    <Series Object> = pandas.Series()        # S is in uppercase
The above statement will create an empty Series type object with no values. Its default datatype is float64 in older versions of Pandas; from Pandas 2.0 onwards, an empty Series gets the datatype object instead.
Consider following statement :
    <Series Object> = pandas.Series(data, index = idx)
where idx is a valid NumPy datatype and data is the data part of the Series object; it can be one of the following :
◊ A Python sequence
◊ An ndarray
◊ A Python dictionary
◊ A scalar value
J PYTHON PANDAS - I
Following subsections talk about the ways to create Series objects as per the above syntax.
Giving a Python sequence to Series( ) will return an object of Series type. For instance, consider following example statements that create two Series type objects using some Python sequences :
    >>> obj1 = pd.Series(range(5))               # range(5) generates a sequence [0, 1, 2, 3, 4]
    >>> obj2 = pd.Series([3.5, 5., 6.5, 8.])     # given list is a sequence with 4 values
    >>> obj1
    0    0
    1    1
    2    2
    3    3
    4    4
    dtype: int64
    >>> obj2
    0    3.5
    1    5.0
    2    6.5
    3    8.0
    dtype: float64
In the string representation of a Series object, the left column displays the index and the right column displays the values.
As you see above, if you just specify the sequence or just the values with Series( ) and no index, then by default the index array consists of the integers 0 through N - 1 (where N is the length of data).
EXAMPLE Write code to create a Series object using the Python sequence [4, 6, 8, 10]. Assume that Pandas is imported as alias name pd.
SOLUTION
    import pandas as pd
    s1 = pd.Series([4, 6, 8, 10])
    print("Series object 1: ")
    print(s1)
Output
    Series object 1:
    0     4
    1     6
    2     8
    3    10
    dtype: int64
EXAMPLE Write code to create a Series object using the Python sequence (11, 21, 31, 41). Assume that Pandas is imported as alias name pd.
SOLUTION
    import pandas as pd
    s2 = pd.Series((11, 21, 31, 41))
    print("Series object 2: ")
    print(s2)
Output
    Series object 2:
    0    11
    1    21
    2    31
    3    41
    dtype: int64
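EXAMPLE Write a program to create a Series object using the individual characters 'o', 'h' and 'o'. Assume that Pandas is imported as alias name pd.
SOLUTION
    import pandas as pd
    s3 = pd.Series(['o', 'h', 'o'])
    print("Series object:")
    print(s3)
Output
    Series object:
    0    o
    1    h
    2    o
    dtype: object
EXAMPLE Write a program to create a Series object using the string "So funny". Assume that Pandas is imported as alias name pd.
SOLUTION
    import pandas as pd
    print("Series object:")
    s4 = pd.Series("So funny")
    print(s4)
Output
    Series object:
    0    So funny
    dtype: object
EXAMPLE Write a program to create a Series object using three different words : "I", "am", "laughing". Assume that Pandas is imported as alias name pd.
SOLUTION
    import pandas as pd
    s5 = pd.Series(["I", "am", "laughing"])
    print("Series object:")
    print(s5)
Output
    Series object:
    0           I
    1          am
    2    laughing
    dtype: object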
You can create a Series object from any ndarray created from any function. Consider following
example that does the same.
EXAMPLE Write a program to create a Series object using an ndarray that has 5 elements in the range 24 to 64.
SOLUTION
    import pandas as pd
    import numpy as np
    s6 = pd.Series(np.linspace(24, 64, 5))    # 5 evenly spaced values from 24 to 64
    print(s6)
Output
    0    24.0
    1    34.0
    2    44.0
    3    54.0
    4    64.0
    dtype: float64
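EXAMPLE Write a program to create a Series object using an ndarray that is created by tiling a list [3, 5], twice.
SOLUTION
    import pandas as pd
    import numpy as np
    s7 = pd.Series(np.tile([3, 5], 2))    # [3, 5] repeated twice
    print(s7)
Output
    0    3
    1    5
    2    3
    3    5
    dtype: int32
The integer datatype shown here (int32) comes from a 32-bit Python installation; on many systems the same code reports int64.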
[Figure: a Series object created from a Python dictionary of month names and day counts, with entries such as 'Feb' : 28 and 'Mar' : 31]
Here, it is noteworthy that if you are creating a Series object from a dictionary object, then the keys of the dictionary become the index of the Series and the values of the dictionary become the data of the Series object. In older versions of Python and Pandas, the indexes created from the keys may not be in the same order as you typed them; with Python 3.7 onwards and recent Pandas versions, the order in which the keys were inserted is preserved.
EXAMPLE Write a program to create a Series object using a dictionary that stores the number of students in each section of class 12 in your school.
SOLUTION
    import pandas as pd
    stu = {'A' : 39, 'B' : 41, 'C' : 42, 'D' : 44}
    s8 = pd.Series(stu)
    print(s8)
Output
    A    39
    B    41
    C    42
    D    44
    dtype: int64
    >>> medalsWon
    0    10
    dtype: int64
    >>> medals2
    1    15
    3    15
    5    15
    dtype: int64
    >>> ser2
    Indore    Yet to start
    Delhi     Yet to start
    Shimla    Yet to start
    dtype: object
EXAMPLE Write a program to create a Series object that stores the initial budget allocated (50000/- each) for the four quarters of the year : Qtr1, Qtr2, Qtr3 and Qtr4.
SOLUTION
    import pandas as pd
    s9 = pd.Series(50000, index = ['Qtr1', 'Qtr2', 'Qtr3', 'Qtr4'])
    print(s9)
Output
    Qtr1    50000
    Qtr2    50000
    Qtr3    50000
    Qtr4    50000
    dtype: int64
EXAMPLE Total number of medals to be won is 200 in the Inter University games held every alternate year. Write code to create a Series object that stores these medals for games to be held in the decade 2020-2029.
SOLUTION
    import pandas as pd
    s10 = pd.Series(200, index = range(2020, 2029, 2))
    print(s10)
Output
    2020    200
    2022    200
    2024    200
    2026    200
    2028    200
    dtype: int64
1.4.2 Creating Series Objects - Additional Functionality
Now that you have a fair idea about how to create Series type objects, let us talk about additional functionality of Series() that you can use to create Pandas Series objects.
Both data and index have to be sequences ; None is taken by default, if you skip these parameters. Consider following example statements :
NOTE
In place of pandas.Series( ), you may use pd.Series( ) also if you have imported pandas as pd [i.e., import pandas as pd], or use the name that you have given in the import statement, e.g., if you have given the statement as import pandas as pnd then you can use pnd.Series() in place of pandas.Series().
    >>> arr = [31, 28, 31, 30]                     # a sequence of numbers
    >>> mon = ['Jan', 'Feb', 'Mar', 'Apr']          # a sequence of strings
    >>> obj3 = pd.Series(data = arr, index = mon)
    >>> obj3                                       # Series object created with above statement
    Jan    31
    Feb    28
    Mar    31
    Apr    30
    dtype: int64
    >>> obj4 = pd.Series(data = [32, 34, 35], index = ['A', 'B', 'C'])
    >>> obj4                                       # another Series object created with both values and indexes given as sequences
    A    32
    B    34
    C    35
    dtype: int64                                   # datatype chosen by default
You may use a loop for defining the index sequence also, e.g.,
    s11 = pd.Series(range(1, 15, 3), index = [x for x in 'abcde'])
The above code will create a Series object as shown below :
    a     1
    b     4
    c     7
    d    10
    e    13
    dtype: int64
Caution !
If specifying indexes explicitly using an index sequence, you must provide as many indexes as there are values in the data array; providing too few or too many indexes will lead to an error - the ValueError.
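The caution above can be checked with a small experiment. This is a minimal sketch; the names values, mismatch_error and ok are illustrative assumptions.

```python
import pandas as pd

values = [10, 20, 30]

# Three values but only two index labels: pandas refuses to build the Series.
try:
    pd.Series(values, index=["a", "b"])
    mismatch_error = None
except ValueError as err:
    mismatch_error = err

print("ValueError raised:", mismatch_error is not None)

# With one label per value, the Series is created normally.
ok = pd.Series(values, index=["a", "b", "c"])
print(ok)
```

The check happens when the Series is created, so a mismatch is reported immediately rather than at the time of first use.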
EXAMPLE A Python list namely section stores the section names ('A', 'B', 'C', 'D') of class 12 in your school. Another list contri stores the contribution made by these students to a charity fund endorsed by the school. Write code to create a Series object that stores the contribution amount as the values and the section names as the indexes.
SOLUTION
    import pandas as pd
    section = ['A', 'B', 'C', 'D']
    contri = [6700, 5600, 5000, 5200]
    s11 = pd.Series(data = contri, index = section)
    print(s11)
Output
    A    6700
    B    5600
    C    5000
    D    5200
    dtype: int64
    <Series Object> = pandas.Series(data = None, index = None, dtype = None)
None is the default value taken for a parameter in case no value is provided for it. If you do not specify a datatype, the nearest datatype that can store the given values will be taken. But you can specify your own datatype by giving a NumPy datatype with the dtype parameter.
Consider following statements :
    >>> obj5 = pd.Series(data = arr, index = mon, dtype = np.float64)
    >>> obj5
    Jan    31.0
    Feb    28.0
    Mar    31.0
    Apr    30.0
    dtype: float64
[Figure: two Series objects whose index values (9, 10, 11, 12) are taken from a NumPy array a; the first stores a * 2 as data (18, 20, 22, 24) and the second stores a ** 2 as data (81, 100, 121, 144), both with dtype int32. A Series object's indexes are not necessarily 0 to n - 1 always.]
The vectorized operations on a NumPy array (e.g., a * 2 or a ** 2) are applied on every element of the NumPy array and the result is stored as the data part of the Series object. Now consider another code :
    >>> obj8 = pd.Series(data = (2 * Lst))      # notice this time the expression for data array involves a list
    >>> obj8                                    # and see how it has impacted the final data array
    0     9
    1    10
    2    11
    3    12
    4     9
    5    10
    6    11
    7    12
    dtype: int64
The above code creates the data array for Series object obj8 from the expression 2 * Lst, where Lst is a Python list. As you know, a number multiplied with a list replicates the list that many times, and hence the data array has the same list values replicated twice (∵ 2 * Lst).
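The difference between the two expressions can be seen side by side. This is a minimal sketch; the names lst, from_array and from_list are illustrative assumptions.

```python
import numpy as np
import pandas as pd

lst = [9, 10, 11, 12]
a = np.array(lst)

from_array = pd.Series(data=a * 2)     # vectorized: every element is doubled
from_list = pd.Series(data=2 * lst)    # list replication: the list is repeated twice

print(from_array.tolist())   # [18, 20, 22, 24]
print(from_list.tolist())    # [9, 10, 11, 12, 9, 10, 11, 12]
```

Note that the two results also differ in length: four elements for the ndarray expression and eight for the list expression.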
Carefully read the following example. It is similar yet different from the previous example.
EXAMPLE Sequences section and contri1 store the section names ('A', 'B', 'C', 'D', 'E') and contribution made by them respectively (6700, 5600, 5000, 5200, nil) for a charity. Your school has decided to donate as much contribution as made by each section, i.e., the donation will be doubled.
Write code to create a Series object that stores the contribution amount as the values and the section names as the indexes with datatype as float32.
SOLUTION
    import pandas as pd
    import numpy as np
    section = ['A', 'B', 'C', 'D', 'E']
This time we have stored the contribution amount in an ndarray contri1 so that we can double its value through a vectorized operation to add the school's contribution.
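One way this example might be completed is sketched below. It is not the book's own solution: representing the missing contribution of section 'E' as NaN and naming the result s12 are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

section = ["A", "B", "C", "D", "E"]

# Assumption: section 'E' has no recorded contribution, stored here as NaN.
contri1 = np.array([6700, 5600, 5000, 5200, np.nan])

# contri1 * 2 doubles each element (vectorized), unlike 2 * <list>,
# which would repeat the list instead.
s12 = pd.Series(data=contri1 * 2, index=section, dtype=np.float32)
print(s12)
```

Because contri1 is an ndarray, the doubling applies to each value, and the dtype parameter stores the result as float32 rather than the default float64.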
1. Creating a Series object that stores integer values
Since all the values being stored are integers, an integer datatype is chosen by Pandas (int64 here), so that all values can be efficiently stored and processed.
All values are of int64 type - hence homogeneous datatype.
2. Creating a Series object that stores three floating type values
Since all the values being stored are floating point numbers, a floating-point datatype is chosen by Pandas (float64 here), so that all values can be efficiently stored and processed.
All values are of float64 type - hence homogeneous datatype.
3. Creating a Series object that stores a mix of integer and floating type values
Since all the values being stored are numbers - some integers and some floating point numbers - a floating-point datatype is chosen by Pandas (float64 here), so that all values can be efficiently stored and processed.
All values are of float64 type - hence homogeneous datatype.
4. Creating a Series object that stores a mix of different type values
Since the values being stored have different datatypes (integers, floating point numbers, strings etc.), Pandas will select a datatype which is capable of holding all these values. Hence, Pandas will choose its datatype as object, which is capable of holding any type of value.
All values are of object type - hence homogeneous datatype.
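The four cases above can be reproduced with a short check. This is a minimal sketch; the variable names and sample values are illustrative assumptions.

```python
import pandas as pd

ints = pd.Series([4, 6, 8])                   # case 1: only integers
floats = pd.Series([3.5, 5.0, 6.5])           # case 2: only floating point numbers
mixed_numbers = pd.Series([4, 5.5, 8])        # case 3: integers and floats together
mixed_types = pd.Series([4, 5.5, "eight"])    # case 4: numbers and a string

for name, ser in [("ints", ints), ("floats", floats),
                  ("mixed_numbers", mixed_numbers), ("mixed_types", mixed_types)]:
    print(name, ser.dtype)
```

In each case the whole Series gets a single datatype, which is the sense in which a Series is homogeneous.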
For the rest of the attributes (covered in points below), we shall be using the following two objects (given as Reference 1.2) :
    >>> obj2 = pd.Series([3.5, 5., 6.5, 8.])
    >>> obj2
    0    3.5
    1    5.0
    2    6.5
    3    8.0
    dtype: float64
    >>> obj3 = pd.Series([6.5, np.nan, 2.34])     # np.nan ; the older alias np.NaN is not available in NumPy 2.0 onwards
    >>> obj3
    0    6.50
    1     NaN
    2    2.34
    dtype: float64
Reference 1.2
(b) Setting the Index Name
By default, a Series has no name for its indexes, but you can set its index name by assigning a string to its <Series object>.index.name attribute, e.g.,
    >>> ser1 = pd.Series([1, 2, 3], index = ['a', 'b', 'c'])
    >>> ser1
    a    1
    b    2
    c    3
    >>> ser1.index.name = "newind"
    >>> ser1
    newind                      # see, new name for the indexes assigned
    a    1
    b    2
    c    3
    >>> obj2.ndim                                # Series object is a 1-dimensional object
    1
    >>> print(obj2.size, obj3.size)              # obj2 has 4 elements and obj3 has 3 elements
    4 3
    >>> print(obj2.nbytes, obj3.nbytes)          # obj2 has 4 elements, hence 4 * 8 = 32 bytes,
    32 24                                        # and obj3 has 3 elements, hence 3 * 8 = 24 bytes
(g) Checking Emptiness (empty attribute) and Presence of NaNs (hasnans attribute)
See, we have also created an empty Series object, namely obj1, and then checked the emptiness of obj1 along with the emptiness of objects obj2 and obj3 of Reference 1.2 (given on the previous page).
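The two attributes can be tried on the three objects as follows. This is a minimal sketch; giving an explicit dtype for the empty object obj1 is an assumption made so that the result does not depend on the Pandas version.

```python
import numpy as np
import pandas as pd

obj1 = pd.Series(dtype="float64")         # empty Series; dtype given explicitly
obj2 = pd.Series([3.5, 5.0, 6.5, 8.0])
obj3 = pd.Series([6.5, np.nan, 2.34])

print(obj1.empty, obj2.empty, obj3.empty)         # True False False
print(obj1.hasnans, obj2.hasnans, obj3.hasnans)   # False False True
```

An empty Series has no elements at all, whereas a Series with NaNs has elements whose values are missing; the two checks are therefore independent of each other.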
EXAMPLE Consider the two Series objects s11 and s12 that you created in examples 11 and 12 respectively. Print the attributes of both these objects in a report form as shown below.
SOLUTION
    import pandas as pd
    # statements here to create objects s11 and s12 from previous examples
    print("Attribute name \t\t Object s11 \t Object s12")
    print("-------------- \t\t ---------- \t ----------")
    print("Data type (.dtype) :\t", s11.dtype, '\t\t', s12.dtype)
    print("Shape (.shape) :\t", s11.shape, '\t\t', s12.shape)
    print("No. of bytes (.nbytes) :\t", s11.nbytes, '\t\t', s12.nbytes)
    print("No. of dimensions (.ndim) :\t", s11.ndim, '\t\t', s12.ndim)
    # Series.itemsize was removed in Pandas 1.0 ; the item size is read from the dtype instead
    print("Item size (.dtype.itemsize) :\t", s11.dtype.itemsize, '\t\t', s12.dtype.itemsize)
    print("Has NaNs? (.hasnans) :\t", s11.hasnans, '\t\t', s12.hasnans)
    print("Empty? (.empty) :\t", s11.empty, '\t\t', s12.empty)
[Figure 1.3: Series objects obj5, obj6 and obj7, with individual elements accessed by giving a valid index, e.g., obj7 has indexes 9 to 12 and data values 18, 20, 22 and 24]
As you see in the above figure, we have used only valid or legal indexes (i.e., which exist in the Series object) to access an element.
If the Series object has duplicate indexes, then giving an index with the Series object will return all the entries with that index, e.g., see below :
[Figure: a Series object ob3 of dtype float64 whose indexes 'a' and 'b' each occur twice; giving the index 'b' returns both entries stored under 'b'. With duplicate indexes in a Series object, all entries with the same index are returned.]
BUT if you try to give an index which is not a legal index for a Series object, it will give you an error :
    >>> obj7[2]
    Traceback (most recent call last):
      ...
    KeyError: 2
2 is not a legal index for obj7 (its legal indexes are 9 to 12 - refer above), hence the error.
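The three situations described above (a legal index, a duplicate index and an illegal index) are sketched below. The values stored in ob3 are illustrative assumptions, since the original figure is not fully legible.

```python
import numpy as np
import pandas as pd

a = np.arange(9, 13)                      # 9, 10, 11, 12
obj7 = pd.Series(data=a * 2, index=a)
print(obj7[10])                           # legal index 10 gives the value 20

# Illustrative Series with duplicate indexes (values are assumptions).
ob3 = pd.Series([12.5, 22.25, 32.0, 41.75], index=["b", "a", "a", "b"])
dup = ob3["b"]                            # a Series holding both 'b' entries
print(dup)

# 2 is not among the indexes 9..12, so the lookup fails with KeyError.
try:
    obj7[2]
    missing_raised = False
except KeyError:
    missing_raised = True
print("KeyError raised:", missing_raised)
```

Note that obj7[2] is treated as a request for the index 2, not for the element at position 2, because the indexes of obj7 are themselves integers.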
1.5.2 Extracting Slices from Series Object
Like other sequences, you can extract slices too from a Series object to retrieve subsets. Let us see how you can extract slices from Series objects. Here, you need to understand an important thing about slicing, which is that :
Slicing takes place position wise and not index wise in a Series object.
To understand this, let us consider the same Series objects as given in Fig. 1.3. Internally, there is a position associated with each element - the first element gets the position 0, the second element gets the position 1 and so on. Irrespective of their indexes, positions always start with 0 and go on like 1, 2, 3 and so on (see Fig. 1.4).
    obj5                        obj6                        obj7
    Position  Index  Data       Position  Index  Data       Position  Index  Data
    0         Feb    28         0         0      11         0         9      18
    1         Jan    31         1         1      14         1         10     20
    2         Mar    31         2         2      17         2         11     22
                                3         3      20         3         12     24
                                4         4      23
(All individual elements have position numbers starting from 0 onwards, i.e., 0 for the first element, 1 for the 2nd element and so on.)
Figure 1.4 Position number associated with each element of Series object
When you have to extract slices, you need to specify slices as [start : end : step] like you do for other sequences, but the start and end signify the positions of elements, not the indexes. Consider following examples :
[Figure: three slices extracted from the objects of Fig. 1.4 - elements Jan 31 and Mar 31 of obj5 (dtype int64), elements with indexes 2, 3 and 4 (values 17, 20, 23) of obj6 (dtype int32), and elements with indexes 9 and 11 (values 18, 22) of obj7 (dtype int32)]
Irrespective of the indexes, the slices have been extracted position wise.
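Position-wise slicing can be tried out as follows. This is a minimal sketch; the object days is an illustrative assumption, and .iloc is used for the integer-indexed object only to make the positional intent explicit.

```python
import numpy as np
import pandas as pd

days = pd.Series([31, 28, 31, 30], index=["Jan", "Feb", "Mar", "Apr"])
part = days[1:3]                 # positions 1 and 2, i.e., Feb and Mar
print(part)

a = np.arange(9, 13)
obj7 = pd.Series(data=a * 2, index=a)
alternate = obj7.iloc[0::2]      # positions 0 and 2, i.e., indexes 9 and 11
print(alternate)
```

In both slices the numbers in the brackets count elements from the start of the Series; the index labels play no role in deciding which elements are picked.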
EXAMPLE Consider a Series object s8 that stores the number of students in each section of class 12 (as shown below) :
    A    39
    B    41
    C    42
    D    44
First two sections have been given a task of selling tickets @ 100/- per ticket as a part of a social experiment. Write code to display how much they have collected.
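A possible solution is sketched below. It assumes that every student of the first two sections sells exactly one ticket; the name collected is illustrative and not from the original text.

```python
import pandas as pd

s8 = pd.Series({"A": 39, "B": 41, "C": 42, "D": 44})

# Assumption: one ticket of 100/- sold per student.
# s8[:2] takes the first two sections position wise; * 100 is applied to each value.
collected = s8[:2] * 100

print("Tickets amount :")
print(collected)
```

The slice selects sections 'A' and 'B' by position, and the multiplication is carried out on each selected value individually.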
A Series is a one-dimensional structure which offers flexibility with storage as well as operations
on it. Let us now talk about how you can perform various types of operations on Pandas Series
objects.
The above assignment will replace all the values falling in the given slice. Consider following screenshots :
[Figure: Series object ob1 (indexes 0 to 4, values 1.50, 12.75, 24.00, 35.25, 46.50, dtype float64) shown before modification, after its first element is changed to 1.85, and after a slice assignment sets the elements at indexes 2 and 3 to -15.75. Series object ob2 (indexes 'a' to 'e', same starting values) shown before and after the values for indexes 'b' and 'd' are changed to 380.0.]
Renaming Indexes
You can rename indexes of a Series object by assigning a new index array to its index attribute :
    <Object>.index = <new index array>
e.g., see below :
[Figure: after assigning a new five-element index array to ob2.index, ob2 is displayed as v 1.5, w 380.0, x 24.0, y 380.0, z 46.5, dtype float64. See how the new indexes get assigned in the order as given in the new index array.]
NOTE
Please note that a Series object's values can be modified but its size cannot. So you can say that Series objects are value-mutable but size-immutable objects.
The size of the new index array must match the existing index array's size. In other words, you cannot change the size of a Series object by assigning more or fewer indexes (see below) :
[Figure: assigning an index array of 7 labels to ob2.index, while ob2 has 5 elements, produces a traceback ending with the following error]
    ValueError: Length mismatch: Expected axis has 5 elements, new values have 7 elements
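Modifying values, renaming indexes and the size restriction can be tried together as follows. This is a minimal sketch; the starting values mirror ob2 from the text, while the replacement labels and the name size_error are illustrative assumptions.

```python
import pandas as pd

ob2 = pd.Series([1.5, 12.75, 24.0, 35.25, 46.5], index=list("abcde"))

ob2["b"] = 380.0             # modify a value through its index label
ob2.iloc[3] = 380.0          # modify a value through its position
ob2.index = list("vwxyz")    # rename all five indexes, in the given order
print(ob2)

# Seven labels for five values: the size of a Series cannot be changed this way.
try:
    ob2.index = list("tuvwxyz")
    size_error = False
except ValueError:
    size_error = True
print("ValueError raised:", size_error)
```

After the failed assignment the Series still carries its five earlier indexes, which illustrates that values are mutable while the size is not.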
EXAMPLE Consider the Series object s13 that stores the contribution of each section, as shown below :
    A    6700
    B    5600
    C    5000
    D    5200
Write code to modify the amount of section 'A' as 7600 and for sections 'C' and 'D' as 7000. Print the changed object.
SOLUTION
    import pandas as pd
    # s13 is assumed to have been created already, as shown above
    s13['A'] = 7600        # to modify section 'A' 's amount (label used, as integer keys on labelled indexes are deprecated)
    s13[2:] = 7000         # to modify sections 'C' and 'D' 's amount
    print("Series object after modifying amounts :")
    print(s13)
Output
    Series object after modifying amounts :
    A    7600
    B    5600
    C    7000
    D    7000
    <pandas object>.head([n])
    Or    <pandas object>.tail([n])
If you do not provide any value for n, then head() and tail() will return the first 5 and last 5 rows respectively of a Pandas object. Consider below given screenshots :
    >>> ob7
    0      2.75
    1      7.60
    2     12.45
    3     17.30
    4     22.15
    5     27.00
    6     31.85
    7     36.70
    8     41.55
    9     46.40
    10    51.25
    dtype: float64
    >>> ob7.head(7)
    0     2.75
    1     7.60
    2    12.45
    3    17.30
    4    22.15
    5    27.00
    6    31.85
    dtype: float64
    >>> ob7.tail()
    6     31.85
    7     36.70
    8     41.55
    9     46.40
    10    51.25
    dtype: float64
    >>> ob7.tail(7)
    4     22.15
    5     27.00
    6     31.85
    7     36.70
    8     41.55
    9     46.40
    10    51.25
    dtype: float64
EXAMPLE A Series object trdata consists of around 2500 rows of data. Write a program to print the following details :
(i) First 100 rows of data (ii) Last 5 rows of data
SOLUTION
    import pandas as pd
    # trdata object's creation or loading happens here
    print(trdata.head(100))
    print(trdata.tail())
[Figure: results of arithmetic expressions applied to Series objects, e.g., the squares 2.2500, 162.5625, 576.0000, 1242.5625 and 2162.2500 (dtype float64) obtained from the values 1.50, 12.75, 24.00, 35.25 and 46.50]
See, each of the expressions, though applied on the Series type object, is carried out on each individual item of the Series object - Vector Operations.
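Vector operations of this kind can be sketched as follows, using the values 1.50 to 46.50 mentioned in the text; the names of the result objects are illustrative assumptions.

```python
import pandas as pd

ob1 = pd.Series([1.5, 12.75, 24.0, 35.25, 46.5])

plus_two = ob1 + 2        # 2 is added to every element
squared = ob1 ** 2        # every element is squared
above_15 = ob1 > 15       # every element is compared, giving True/False values

print(plus_two.tolist())  # [3.5, 14.75, 26.0, 37.25, 48.5]
print(squared.tolist())   # [2.25, 162.5625, 576.0, 1242.5625, 2162.25]
print(above_15.tolist())  # [False, False, True, True, True]
```

No loop is written anywhere: a single expression on the Series object produces a new Series with one result per element.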
[Figure: Series objects ob1, ob2, ob3, ob4 and ob5 (all of dtype float64) and the results of arithmetic expressions between objects that have matching indexes]
Since objects ob1 and ob3 have matching indexes (both have indexes in the range 0 to 4), Pandas successfully carries out the given arithmetic operation on corresponding items of matching indexes, i.e., the items with index 0 of both the objects undergo the given operation and the result is given for index 0; similarly, the corresponding items having index 1 of both the objects undergo the given operation and the result is given for index 1, and so on.
The same thing applies to the expression ob2 + ob5, i.e., corresponding values of index 'a' are added, similarly corresponding values of index 'b' are added, and so on.
But if you try to perform an operation on objects that have some or all non-matching indexes, then it will add the values of matching indexes, if any, and for non-matching indexes of both the objects, it will return the result as Not a Number, i.e., NaN. NaN represents missing data (see below).
[Figure: results of >>> ob1 + ob4 and >>> ob1 + ob2, both of dtype float64.
ob1 + ob4 : computed the given operation for matching indexes (0, 1, 2, 3, 4 in both objects) and NaN for non-matching indexes (5, 6, 7 of ob4).
ob1 + ob2 : none of the indexes matched, because indexes (0, 1, 2, 3, 4) of ob1 do not match with indexes ('a', 'b', 'c', 'd', 'e') of ob2, hence NaN for all the indexes of both the objects.]
Figure 1.6 (b)
NOTE
When you perform arithmetic operations on two Series type objects, the data is aligned on the basis of matching indexes (this is called Data Alignment in Pandas objects) and then the arithmetic is performed; for non-overlapping indexes, the arithmetic operations result in NaN (Not a Number).
You can store the result of object arithmetic in another object, which will also be a Series object, i.e., if you give :
    >>> ob6 = ob1 + ob3
then ob6 will also be a Series object (if ob1 and ob3 are Pandas Series objects).
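Data alignment can be observed with two small objects whose indexes overlap only partly. This is a minimal sketch with illustrative names and values; the add() method with fill_value goes slightly beyond the text and is shown only as an optional alternative.

```python
import pandas as pd

left = pd.Series([1, 2, 3], index=["a", "b", "c"])
right = pd.Series([10, 20, 30], index=["b", "c", "d"])

total = left + right                     # aligned on indexes; 'a' and 'd' have no partner
print(total)

# Optional alternative: treat a missing partner as 0 instead of producing NaN.
filled = left.add(right, fill_value=0)
print(filled)
```

The result carries the union of both indexes, and its datatype becomes float64 because NaN cannot be stored in an integer Series.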
EXAMPLE Number of students in classes 11 and 12 in three streams ('Science', 'Commerce' and 'Humanities') are stored in two Series objects c11 and c12. Write code to find the total number of students in classes 11 and 12, stream wise.
SOLUTION
    import pandas as pd
    # creating Series objects
    c11 = pd.Series(data = [30, 40, 50], index = ['Science', 'Commerce', 'Humanities'])
    c12 = pd.Series(data = [37, 44, 45], index = ['Science', 'Commerce', 'Humanities'])
    # adding two objects to get total no. of students
    print("Total no. of students")
    print(c11 + c12)        # Series objects arithmetic
Output
    Total no. of students
    Science       67
    Commerce      84
    Humanities    95
    dtype: int64
EXAMPLE Object1 Population stores the details of population in four metro cities of India and Object2 AvgIncome stores the total average income reported in the previous year in each of these metros. Calculate income per capita for each of these metro cities.
SOLUTION
    import pandas as pd
    Population = pd.Series([10927986, 12691836, 4631392, 4328063], \
                 index = ['Delhi', 'Mumbai', 'Kolkata', 'Chennai'])
The \ at the end of the first line of the statement is the statement continuation mark. Do not type it while typing code in a .py file; rather, type the whole statement in a single line.