NumPy Arrays and Pandas Series object

SAMPLE CODES

Introducing Python Pandas

Pandas (or Python Pandas) is Python’s library for data analysis. Pandas derives its name from “panel data
system”, which refers to multidimensional, structured data sets. The main author of Pandas is Wes McKinney.

Data Analysis: It refers to the process of evaluating large data sets using analytical and statistical tools in order to
discover useful information and conclusions that support business decision-making.

NumPy Arrays: NumPy (Numerical Python or Numeric Python) is an open source module of Python that offers
functions and routines for fast mathematical computation on arrays and matrices. An array refers to a named
group of homogeneous (of same type) elements.
NumPy Arrays come in two forms:-
• 1-D arrays, known as Vectors, having a single row/column only.
• Multidimensional arrays, known as Matrices, having multiple rows and columns.

A simple code sample to illustrate how a NumPy array can be created:


import numpy as np
l1=[1,2,3]
arr1=np.array(l1)
print("A 1-D array is")
print(arr1)
arr2=np.array([[1,2,3],[4,5,6],[7,8,9]])
print("A 2-D array")
print(arr2)

Output:-
A 1-D array is
[1 2 3]
A 2-D array
[[1 2 3]
[4 5 6]
[7 8 9]]
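
The statement that an array holds elements of one type can be checked directly. The following is a minimal sketch, not part of the original samples; the variable names ints, mixed and matrix are illustrative only.

```python
import numpy as np

# Integers only: NumPy picks an integer dtype (the exact width depends on the platform).
ints = np.array([1, 2, 3])

# A single float in the input is enough to convert every element to float64.
mixed = np.array([1, 2.5, 3])

print(ints.dtype.kind)   # 'i' means a signed integer dtype
print(mixed.dtype)       # float64

# ndim and shape distinguish a 1-D vector from a 2-D matrix.
matrix = np.array([[1, 2, 3], [4, 5, 6]])
print(ints.ndim, ints.shape)      # 1 (3,)
print(matrix.ndim, matrix.shape)  # 2 (2, 3)
```

The dtype attribute reports the single element type shared by the whole array.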

Note:-The following section illustrates various topics related to NumPy arrays.

Notes on Python by Alap Mukherjee Page 1


Ways to Create NumPy Arrays

1. Creating empty arrays using empty()

import numpy as np
arr1=np.empty([3,2])
arr2=np.empty([3,2],dtype=np.int8)
print(arr1.dtype)
print(arr2.dtype)
print(arr1) #the elements are uninitialized, so the values printed differ from run to run

Output:-
float64
int8
[[2.41907520e-312 2.33419537e-312]
[6.79038654e-313 2.12199579e-312]
[8.70018275e-313 6.23040373e-307]]

2. Creating arrays filled with zeros using zeros()

import numpy as np
arr3=np.zeros([3,2])
print(arr3)
arr3=np.zeros([3,2], dtype=np.int8)
print(arr3)

Output:-
[[0. 0.]
[0. 0.]
[0. 0.]]
[[0 0]
[0 0]
[0 0]]
3. Creating arrays filled with ones using ones()

import numpy as np
arr4=np.ones([3,2], dtype=np.int8)
print(arr4)

Output:-
[[1 1]
[1 1]
[1 1]]
4. import numpy as np
arr1=np.zeros([3,3], dtype=np.int8)
print("This is arr1\n",arr1)
arr2=np.ones_like(arr1)
print("Converted array\n",arr2)

Output:-
This is arr1
[[0 0 0]
[0 0 0]
[0 0 0]]
Converted array
[[1 1 1]
[1 1 1]
[1 1 1]]

5. import numpy as np
arr1=np.arange(1,11,1, dtype=np.int8)
arr2=np.arange(1,11,2, dtype=np.int8)
print("The required array are:\n",arr1,arr2)

Output:-
The required array are:
[ 1 2 3 4 5 6 7 8 9 10] [1 3 5 7 9]
6. import numpy as np
arr1=np.linspace(2,3,11)
print(arr1)

Output:-
[2. 2.1 2.2 2.3 2.4 2.5 2.6 2.7 2.8 2.9 3. ]
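
Samples 5 and 6 use arange() and linspace(), which differ in how the end point is treated. The sketch below is an added illustration with made-up names (by_step, by_count); it assumes nothing beyond the two functions already shown.

```python
import numpy as np

# arange(start, stop, step): stop is excluded and the spacing is set by step.
by_step = np.arange(2, 3, 0.25)

# linspace(start, stop, num): stop is included and the spacing is derived from num.
by_count = np.linspace(2, 3, 5)

print(by_step.tolist())    # [2.0, 2.25, 2.5, 2.75]
print(by_count.tolist())   # [2.0, 2.25, 2.5, 2.75, 3.0]
```

Choose arange() when the step size is known and linspace() when the number of points is known.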
PANDAS BASICS
7. import numpy as np
import pandas as pd
a=pd.Series(range(5))
print(a)
b=pd.Series([1,2.5,5.6,3])
print(b)

Output:-
0 0
1 1
2 2
3 3
4 4
dtype: int64
0 1.0
1 2.5
2 5.6
3 3.0
dtype: float64

8. import numpy as np
import pandas as pd
a=np.arange(1,11,2)
print("An array=",a)
s=pd.Series(a)
print(s)

Output:-
An array= [1 3 5 7 9]
0 1
1 3
2 5
3 7
4 9
dtype: int32

9. import numpy as np
import pandas as pd
a=pd.Series({101:'anant',102:'Devendra',103:'Rishabh'})
print(a)

Output:-
101 anant
102 Devendra
103 Rishabh
dtype: object
10. a=pd.Series(5,index=range(4))
print(a)
b=pd.Series(5,index=range(1,2))
print(b)
c=pd.Series(5,index=range(1,10,2))
print(c)
d=pd.Series('Good Morning',index=['anant','devendra','Rishabh'])
print(d)

Output:-
0 5
1 5
2 5
3 5
dtype: int64
1 5
dtype: int64
1 5
3 5
5 5
7 5
9 5
dtype: int64
anant Good Morning
devendra Good Morning
Rishabh Good Morning
dtype: object

11. #Within pandas, a missing value is denoted by NaN.
a=pd.Series([6,5,np.nan])
print(a)

Output:-
0 6.0
1 5.0
2 NaN
dtype: float64
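
Once a Series contains NaN, it is usually necessary to detect or replace the missing entries. The following sketch is an added example that reuses the data of sample 11 under the hypothetical name marks; isna(), count(), fillna() and dropna() are standard Series methods.

```python
import numpy as np
import pandas as pd

marks = pd.Series([6, 5, np.nan])   # hypothetical name, same data as sample 11

print(marks.isna().tolist())     # [False, False, True]
print(marks.count())             # 2, because NaN is not counted
print(marks.fillna(0).tolist())  # [6.0, 5.0, 0.0]
print(marks.dropna().tolist())   # [6.0, 5.0]
```

Neither fillna() nor dropna() changes marks itself; each returns a new Series.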

12. rollno=[1,2,3]
name=['manish','harish','anurag']
a=pd.Series(data=name,index=rollno)
print(a)
b=pd.Series(data=['a','e','i','o','u'],index=[1,2,3,4,5])
print(b)
c=pd.Series(data=['a','b','c','d'],index=[i for i in '1234'])
print(c)

Output:-
1 manish
2 harish
3 anurag
dtype: object
1 a
2 e
3 i
4 o
5 u
dtype: object
1 a
2 b
3 c
4 d
dtype: object

13. b=pd.Series(data=[31,28,31,30],index=['Jan','Feb','Mar','Apr'])
print(b)
c=pd.Series(data=[31,28,31,30],index=['Jan','Feb','Mar','Apr'],dtype=np.float64)
print(c)

Output:-
Jan 31
Feb 28
Mar 31
Apr 30
dtype: int64
Jan 31.0
Feb 28.0
Mar 31.0
Apr 30.0
dtype: float64

14. a=np.arange(1,5)
print(a)
b=pd.Series(index=a, data=a*2)
print(b)
c=pd.Series(index=a, data=a**2)
print(c)

Output:-
[1 2 3 4]
1 2
2 4
3 6
4 8
dtype: int32
1 1
2 4
3 9
4 16
dtype: int32

15. a=[1,2,3,4]
b=pd.Series(data=a*2) #a is a Python list, so a*2 repeats the list instead of doubling each element
print(b)

Output:-
0 1
1 2
2 3
3 4
4 1
5 2
6 3
7 4
dtype: int64

SERIES OBJECT ATTRIBUTES


16. import pandas as pd
import numpy as np
obj3=pd.Series([1,2,3,4,5])
print(obj3)
print(type(obj3))
print(obj3.values)
print(obj3.index)
print(len(obj3))
print(obj3.dtype.itemsize) #bytes per element; Series.itemsize is not available in pandas 1.0 and later
print(obj3.shape)
print(obj3.ndim)
print(obj3.size)
print(obj3.nbytes) #obj3 has 5 elements, hence 5*8=40 bytes
print(obj3.empty)
print(obj3.hasnans) #to check whether obj3 has some NaN value or not
obj4=pd.Series([6,7,np.nan])
print(obj4)
print(len(obj4))
print(obj4.count()) # the result is the number of non NaN values
Output:-
0 1
1 2
2 3
3 4
4 5
dtype: int64
<class 'pandas.core.series.Series'>
[1 2 3 4 5]
RangeIndex(start=0, stop=5, step=1)
5
8
(5,)
1
5
40
False
False
0 6.0
1 7.0
2 NaN
dtype: float64
3

2

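17. #we have created empty series object
obj4=pd.Series(dtype=object) #an explicit dtype avoids a warning that some pandas versions raise for an empty Series
print(obj4.empty)

Output:-
True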

ACCESSING INDIVIDUAL ELEMENTS


18. obj1=pd.Series(data=[31,28,31,30],index=['Jan','Feb','Mar','Apr'])
print(obj1)
print(obj1['Jan'])
obj2=pd.Series([1,2,3,4,5,6])
print(obj2)
print(obj2[3])
obj3=pd.Series(index=[1,2,3,2,4],data=['a','e','i','o','u'])
print(obj3)
print(obj3[2])

Output:-

Jan 31
Feb 28
Mar 31
Apr 30
dtype: int64
31
0 1
1 2
2 3
3 4
4 5
5 6
dtype: int64
4
1 a
2 e
3 i
2 o
4 u
dtype: object
2 e

2 o
dtype: object
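
Plain square brackets can be ambiguous when the index itself holds integers. The sketch below is an added illustration, using the hypothetical name s; it shows the explicit accessors .loc (by label) and .iloc (by position) on the same data as obj3 above.

```python
import pandas as pd

s = pd.Series(index=[1, 2, 3, 2, 4], data=['a', 'e', 'i', 'o', 'u'])

# .loc selects by index label; a repeated label returns every matching entry.
print(s.loc[2].tolist())   # ['e', 'o']

# .iloc selects by integer position, counting from 0.
print(s.iloc[2])           # i

# A label that occurs only once gives back a single value.
print(s.loc[4])            # u
```

Using .loc and .iloc makes it explicit whether a number is meant as a label or as a position.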

EXTRACTING SLICES FROM SERIES OBJECT


19. import numpy as np
import pandas as pd
obj1=pd.Series(index=[1,2,3,4,5], data=[10,11,12,13,14])
print(obj1)
print(obj1[1:])
print(obj1[1:3])
print(obj1[::2])

Output:-
1 10
2 11
3 12
4 13
5 14
dtype: int64
2 11
3 12
4 13
5 14
dtype: int64
2 11
3 12
dtype: int64
1 10
3 12
5 14
dtype: int64
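
The output above shows that obj1[1:3] picks positions 1 and 2, not the labels 1 to 3. The following added sketch contrasts positional and label-based slicing on the same Series; the names by_position and by_label are illustrative.

```python
import pandas as pd

obj1 = pd.Series(index=[1, 2, 3, 4, 5], data=[10, 11, 12, 13, 14])

# Positional slice: positions 1 and 2, the end position is excluded.
by_position = obj1.iloc[1:3]
print(by_position.tolist())   # [11, 12]

# Label slice: labels 1 through 3, the end label is included.
by_label = obj1.loc[1:3]
print(by_label.tolist())      # [10, 11, 12]
```

Label slices with .loc include the end label, unlike ordinary Python slices.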

20 #Modifying elements of Series object

obj1=pd.Series(index=[1,2,3,4,5], data=[10,11,12,13,14])
print(obj1)
obj1[2]=5 #using index
print(obj1)
obj1[1:3]=99 #using slicing
print(obj1)

Output:
1 10
2 11

3 12
4 13
5 14
dtype: int64
1 10
2 5
3 12
4 13
5 14
dtype: int64
1 10
2 99
3 99
4 13
5 14
dtype: int64

21 #implementing head and tail functions

obj1=pd.Series(index=[1,2,3,4,5,6,7,8,9,10], data=[10,11,12,13,14,15,16,17,18,19])
print(obj1)
print(obj1.head()) #returns first five results by default
print(obj1.tail()) #returns last five results by default
print(obj1.head(4))
print(obj1.tail(3))

Output:
1 10
2 11
3 12
4 13
5 14
6 15
7 16
8 17
9 18
10 19
dtype: int64
1 10
2 11
3 12
4 13
5 14
dtype: int64
6 15
7 16

8 17
9 18
10 19
dtype: int64
1 10
2 11
3 12
4 13
dtype: int64
8 17
9 18
10 19
dtype: int64

22 #vector operations on Series object


obj1=pd.Series(index=['a','b','c','d'], data=[1,2,3,4])
print(obj1)
print(obj1+2)
print(obj1*2)
print(obj1**2)
print(obj1>3)

Output:
a 1
b 2
c 3
d 4
dtype: int64
a 3
b 4
c 5
d 6
dtype: int64
a 2
b 4
c 6
d 8
dtype: int64
a 1
b 4
c 9
d 16
dtype: int64
a False
b False
c False

d True
dtype: bool

23 #Arithmetic operations on Series Object

obj1=pd.Series(index=['a','b','c','d'], data=[1,2,3,4])
obj2=pd.Series(index=[1,2,3,4], data=[10,11,12,13])
obj3=pd.Series(index=['a','b','c','d'], data=[5,6,7,8])
obj4=pd.Series(index=[1,2,3,4], data=[1,3,5,7])
print(obj1+obj3) #matching indexes found
print(obj2+obj4) #matching indexes found
print(obj1+obj2) #no matching indexes found hence NaN result returned
obj5=pd.Series(index=[1,2,3,4,5,6,7,8], data=[2,4,6,8,11,12,13,14])
print(obj4+obj5)

Output:
a 6
b 8
c 10
d 12
dtype: int64
1 11
2 14
3 17
4 20
dtype: int64
a NaN
b NaN
c NaN
d NaN
1 NaN
2 NaN
3 NaN
4 NaN
dtype: float64
1 3.0
2 7.0
3 11.0
4 15.0
5 NaN
6 NaN
7 NaN
8 NaN
dtype: float64
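
When only some indexes match, the unmatched entries become NaN as shown above. If a missing entry should instead count as zero, the Series method add() accepts a fill_value argument. This is an added sketch reusing obj4 and obj5; the names plain and filled are illustrative.

```python
import pandas as pd

obj4 = pd.Series(index=[1, 2, 3, 4], data=[1, 3, 5, 7])
obj5 = pd.Series(index=[1, 2, 3, 4, 5, 6, 7, 8], data=[2, 4, 6, 8, 11, 12, 13, 14])

# The + operator leaves NaN wherever an index exists in only one Series.
plain = obj4 + obj5
print(int(plain.isna().sum()))   # 4

# add() with fill_value=0 treats the missing side as 0 before adding.
filled = obj4.add(obj5, fill_value=0)
print(filled.tolist())           # [3.0, 7.0, 11.0, 15.0, 11.0, 12.0, 13.0, 14.0]
```

The result is still float64 because the two Series had to be aligned before adding.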

24 #Filtering entries
obj1=pd.Series(index=[1,2,3,4,5], data=[6,7,8,9,10])

print(obj1)
print(obj1>8)
print(obj1[obj1>8]) #only data values greater than 8 are shown

Output:
1 6
2 7
3 8
4 9
5 10
dtype: int64
1 False
2 False
3 False
4 True
5 True
dtype: bool
4 9
5 10
dtype: int64
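
Filters can also combine several conditions. The sketch below is an added example on the same obj1; it uses the element-wise operators & (and), | (or) and ~ (not), each condition wrapped in parentheses. The result names are illustrative.

```python
import pandas as pd

obj1 = pd.Series(index=[1, 2, 3, 4, 5], data=[6, 7, 8, 9, 10])

# Both conditions must hold.
in_range = obj1[(obj1 > 6) & (obj1 < 10)]
print(in_range.tolist())    # [7, 8, 9]

# Either condition may hold.
extremes = obj1[(obj1 < 7) | (obj1 > 9)]
print(extremes.tolist())    # [6, 10]

# ~ inverts a condition.
not_eight = obj1[~(obj1 == 8)]
print(not_eight.tolist())   # [6, 7, 9, 10]
```

The parentheses are required because & and | bind more tightly than the comparison operators.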
25 #Reindexing Series object
obj1=pd.Series(index=[1,2,3,4],data=['a','b','c','d'])
print(obj1)
obj2=obj1.reindex([1,3,2,4])
print(obj2)

Output:
1 a
2 b
3 c
4 d
dtype: object
1 a
3 c
2 b
4 d
dtype: object
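
reindex() can also be given labels that do not exist in the original Series. The following added sketch assumes the same obj1 and uses the hypothetical label 5 and placeholder 'missing' to show what happens.

```python
import pandas as pd

obj1 = pd.Series(index=[1, 2, 3, 4], data=['a', 'b', 'c', 'd'])

# Label 5 is not in obj1, so its entry is filled with a missing value.
with_gap = obj1.reindex([1, 2, 5])
print(with_gap.isna().tolist())   # [False, False, True]

# fill_value supplies a replacement for labels that were not found.
filled = obj1.reindex([1, 2, 5], fill_value='missing')
print(filled.tolist())            # ['a', 'b', 'missing']
```

Labels left out of the new index (3 and 4 here) are dropped from the result.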
26 #Dropping entries from an Axis
obj1=pd.Series(index=[1,2,3,4],data=['a','b','c','d'])
print(obj1)
obj1=obj1.drop(2) #data value corresponding to index 2 is removed
print(obj1)

Output:
1 a
2 b

3 c
4 d
dtype: object
1 a
3 c
4 d
dtype: object

More on NumPy Arrays


27 #create ndarrays using fromiter()
#fromiter() will create an ndarray that will have the keys of dictionary dict1 as its elements
import numpy as np
import pandas as pd
dict1={1:'a',2:'b',3:'c',4:'d'}
print(dict1)
arr1=np.fromiter(dict1,dtype=np.int32)
print(arr1)
arr1=np.fromiter(dict1,dtype=np.float32)
print(arr1)

Output:
{1: 'a', 2: 'b', 3: 'c', 4: 'd'}
[1 2 3 4]
[1. 2. 3. 4.]

28 #create ndarray from individual letters of a string using fromiter()

#U2 means each element of ndarray can have length of 2 Unicode characters

a="JaipurIsCapitalOfRajasthan"
b=np.fromiter(a,dtype="U2")
print(b)

Output:

['J' 'a' 'i' 'p' 'u' 'r' 'I' 's' 'C' 'a' 'p' 'i' 't' 'a' 'l' 'O' 'f' 'R' 'a'
'j' 'a' 's' 't' 'h' 'a' 'n']

29 #using fromiter() in various iterable forms


l1=[1,2,3,4]
t1=(1.2,2.5,3.1)
l2=[a*2-3 for a in range (2,5)]
a1=np.fromiter(l1,dtype=np.int32)
print(a1)
a2=np.fromiter(t1,dtype=np.float32)

print(a2)
a3=np.fromiter(l2,dtype=np.int32)
print(a3)

Output:
[1 2 3 4]
[1.2 2.5 3.1]
[1 3 5]

30 #picking a smaller set of elements

#count=5 means that only the first 5 characters will be picked and the ndarray is formed using only those elements
a="JaipurIsCapitalOfRajasthan"
b=np.fromiter(a,dtype="U2",count=5)
print(b)

Output:
['J' 'a' 'i' 'p' 'u']
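
fromiter() is not limited to lists, tuples and strings; it also accepts a generator, which produces values one at a time. The sketch below is an added example with illustrative names (squares, first_three).

```python
import numpy as np

# A generator expression yields the values lazily; count tells NumPy how many to expect.
squares = np.fromiter((n * n for n in range(1, 6)), dtype=np.int32, count=5)
print(squares.tolist())       # [1, 4, 9, 16, 25]

# A count smaller than the iterable's length stops reading early.
first_three = np.fromiter((n * n for n in range(1, 6)), dtype=np.int32, count=3)
print(first_three.tolist())   # [1, 4, 9]
```

Passing count lets NumPy allocate the array once instead of growing it while reading.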

31 #creating 2D ndarray using arange and using reshape()

a=np.arange(10)
print(a)
b=a.reshape(2,5)
print(b)
c=a.reshape(5,2)
print(c)

Output:
[0 1 2 3 4 5 6 7 8 9]
[[0 1 2 3 4]
[5 6 7 8 9]]
[[0 1]
[2 3]
[4 5]
[6 7]
[8 9]]
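
reshape() only works when the new shape holds exactly as many elements as the original array. As an added illustration, one dimension can be given as -1 so that NumPy works out its size; the flag name fits is hypothetical.

```python
import numpy as np

a = np.arange(10)

# -1 asks NumPy to infer that dimension from the total number of elements.
b = a.reshape(2, -1)
c = a.reshape(-1, 2)
print(b.shape)   # (2, 5)
print(c.shape)   # (5, 2)

# 10 elements cannot be arranged in 3 equal rows, so reshape raises ValueError.
try:
    a.reshape(3, -1)
    fits = True
except ValueError:
    fits = False
print(fits)      # False
```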

32 #slices in 2d array
import numpy as np
import pandas as pd
a=np.array([[1,2,3,4,5],[2,5,6,1,3],[6,7,8,9,1],[9,7,5,2,4]])
print(a)
slc1=a[0:3,0:4]
print(slc1)
slc2=a[:3,3:]
print(slc2)

slc3=a[1::2,:3]
print(slc3)

Output:-

[[1 2 3 4 5]
[2 5 6 1 3]
[6 7 8 9 1]
[9 7 5 2 4]]
[[1 2 3 4]
[2 5 6 1]
[6 7 8 9]]
[[4 5]
[1 3]
[9 1]]
[[2 5 6]
[9 7 5]]
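
A slice of a NumPy array is a view on the original data, not an independent copy. The following added sketch demonstrates this on the same array a; the names view and independent are illustrative.

```python
import numpy as np

a = np.array([[1, 2, 3, 4, 5], [2, 5, 6, 1, 3], [6, 7, 8, 9, 1], [9, 7, 5, 2, 4]])

# A basic slice shares memory with a, so writing to it changes a as well.
view = a[:2, :2]
view[0, 0] = 100
print(a[0, 0])   # 100

# copy() produces an independent array; changing it leaves a untouched.
independent = a[:2, :2].copy()
independent[0, 0] = -1
print(a[0, 0])   # 100
```

Call copy() whenever a slice must be modified without affecting the source array.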

33 #joining or concatenating NumPy Arrays

#combining two 1D arrays horizontally or vertically
import numpy as np
a=np.array([1,2,3,4])
b=np.array([5,6,7,8])
c=np.hstack((a,b))
print(c)
d=np.vstack((a,b))
print(d)

Output:-
[1 2 3 4 5 6 7 8]
[[1 2 3 4]
[5 6 7 8]]

34 #joining 2D arrays using hstack and vstack


a=np.array([[1,2,3],[4,5,6]])
b=np.array([[7,8,9],[10,11,12]])
c=np.hstack((a,b))
print(c)
d=np.vstack((a,b))
print(d)

Output:-

[[ 1 2 3 7 8 9]
[ 4 5 6 10 11 12]]

[[ 1 2 3]
[ 4 5 6]
[ 7 8 9]
[10 11 12]]

35 #combining existing arrays using concatenate()

arr1=np.array([[1,2,3],[4,5,6],[7,8,9]])
print(arr1)
arr2=np.array([[10,11,12],[13,14,15]])
print(arr2)
arr3=np.concatenate((arr1,arr2))
print("Array after concatenation")
print(arr3)
arr4=np.array([[1,2],[3,4]])
arr5=np.array([[1,2,3],[4,5,6]])
print("Array after concatenation")
arr6=np.concatenate((arr4,arr5),axis=1) #shape of both arrays match on rows dimension
print(arr6)

Output:-
[[1 2 3]
[4 5 6]
[7 8 9]]
[[10 11 12]
[13 14 15]]
Array after concatenation
[[ 1 2 3]
[ 4 5 6]
[ 7 8 9]
[10 11 12]
[13 14 15]]
Array after concatenation
[[1 2 1 2 3]
[3 4 4 5 6]]

Combining arrays using concatenate()

1. If axis is 0, then the shapes of the arrays being joined must match on the column dimension.
2. If axis is 1, then the shapes of the arrays being joined must match on the row dimension.

In the code below, a is a 2x2 array and b is a 2x3 array. Both have two rows, so they can be
concatenated with axis=1. They cannot be concatenated with axis=0, because their column counts (2 and 3)
are unequal. Taking the transpose of b gives d, a 3x2 array whose column count matches that of a, so
a and d can be concatenated with axis=0.

import numpy as np
import pandas as pd
a=np.array([[1,2],[3,4]])

b=np.array([[3,4,5],[6,7,8]])
c=np.concatenate((a,b),axis=1)#arrays match row dimension as axis=1
print(c)
d=b.T # b was 2x3 now it has become 3x2 array
print(d)
e=np.concatenate((a,d),axis=0)#arrays match on column dimension as axis=0
print(e)

Output:-
[[1 2 3 4 5]
[3 4 6 7 8]]
[[3 6]
[4 7]
[5 8]]
[[1 2]
[3 4]
[3 6]
[4 7]
[5 8]]
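
The shape rule for concatenate() can be checked by attempting the join that does not match. In this added sketch the flag stacked and the name joined are illustrative; a and b are the arrays used above.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])          # shape (2, 2)
b = np.array([[3, 4, 5], [6, 7, 8]])    # shape (2, 3)

# axis=0 needs equal column counts; 2 and 3 differ, so NumPy raises ValueError.
try:
    np.concatenate((a, b), axis=0)
    stacked = True
except ValueError:
    stacked = False
print(stacked)        # False

# After transposing b to shape (3, 2) the column counts agree.
joined = np.concatenate((a, b.T), axis=0)
print(joined.shape)   # (5, 2)
```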

#Obtaining Subsets of Arrays


#Splitting NumPy Arrays to get Contiguous Subsets
import numpy as np
import pandas as pd
arr1=np.arange(24)
print(arr1,"\n")
arr2=arr1.reshape(4,6)
print(arr2,"\n")
arr3=np.hsplit(arr2,2) #here 2 means in two equal parts
print(arr3,"\n")
arr4=np.hsplit(arr2,3) #here 3 means in three equal parts
print(arr4,"\n")
arr5=np.vsplit(arr2,2)
print(arr5)

Output:-

[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23]

[[ 0 1 2 3 4 5]
[ 6 7 8 9 10 11]
[12 13 14 15 16 17]
[18 19 20 21 22 23]]

[array([[ 0, 1, 2],
[ 6, 7, 8],
[12, 13, 14],
[18, 19, 20]]), array([[ 3, 4, 5],
[ 9, 10, 11],
[15, 16, 17],
[21, 22, 23]])]

[array([[ 0, 1],
[ 6, 7],
[12, 13],
[18, 19]]), array([[ 2, 3],
[ 8, 9],
[14, 15],
[20, 21]]), array([[ 4, 5],
[10, 11],
[16, 17],
[22, 23]])]

[array([[ 0, 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10, 11]]), array([[12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23]])]
# Using split() function
arr1=np.arange(10)
print(arr1)
arr2=np.split(arr1,[2,6])
print(arr2)
arr3=np.split(arr1,[1,4])
print(arr3)

Output:-
[0 1 2 3 4 5 6 7 8 9]
[array([0, 1]), array([2, 3, 4, 5]), array([6, 7, 8, 9])]
[array([0]), array([1, 2, 3]), array([4, 5, 6, 7, 8, 9])]
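
When split() is given a number of parts rather than a list of positions, the array must divide evenly. As an added illustration, np.array_split() relaxes that requirement; the names equal_ok and parts are hypothetical.

```python
import numpy as np

arr = np.arange(10)

# 10 elements cannot be divided into 3 equal parts, so split() raises ValueError.
try:
    np.split(arr, 3)
    equal_ok = True
except ValueError:
    equal_ok = False
print(equal_ok)   # False

# array_split() allows unequal parts; the earlier parts receive the extra elements.
parts = np.array_split(arr, 3)
print([p.tolist() for p in parts])   # [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
```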

#Extracting condition based Non-contiguous Subsets


import numpy as np
arr1=np.arange(10)
arr2=arr1.reshape(2,5)
print(arr2)
cond1=np.mod(arr2,5)==0
print(cond1)
print(np.extract(cond1,arr2)) #print() is needed to show the result when run as a script

Output:-
[[0 1 2 3 4]
[5 6 7 8 9]]
[[ True False False False False]
[ True False False False False]]
[0 5]
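
The same elements can be obtained without extract() by using the condition array directly as an index. This added sketch also uses np.where() to report where the matches occur; the names picked, rows and cols are illustrative.

```python
import numpy as np

arr2 = np.arange(10).reshape(2, 5)
cond1 = np.mod(arr2, 5) == 0

# Boolean indexing returns the elements for which the condition is True.
picked = arr2[cond1]
print(picked.tolist())                # [0, 5]

# where() with only a condition returns the row and column positions of the matches.
rows, cols = np.where(cond1)
print(rows.tolist(), cols.tolist())   # [0, 1] [0, 0]
```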

#Arithmetic Operations on 2-D arrays


import numpy as np
arr1=np.arange(10)
arr2=arr1.reshape(2,5)
print(arr2)
arr3=arr2+0.3
print(arr3)
arr4=arr3+arr2
print(arr4)

Output:-
[[0 1 2 3 4]
[5 6 7 8 9]]
[[0.3 1.3 2.3 3.3 4.3]
[5.3 6.3 7.3 8.3 9.3]]
[[ 0.3 2.3 4.3 6.3 8.3]
[10.3 12.3 14.3 16.3 18.3]]
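
Adding 0.3 to every element above is a simple case of broadcasting. The added sketch below shows the same idea with a 1-D row and a column; row, col, plus_row and plus_col are made-up names.

```python
import numpy as np

arr2 = np.arange(10).reshape(2, 5)

# A 1-D array of length 5 is added to each of the two rows.
row = np.array([10, 20, 30, 40, 50])
plus_row = arr2 + row
print(plus_row.tolist())   # [[10, 21, 32, 43, 54], [15, 26, 37, 48, 59]]

# A 2x1 column is added to each of the five columns.
col = np.array([[100], [200]])
plus_col = arr2 + col
print(plus_col.tolist())   # [[100, 101, 102, 103, 104], [205, 206, 207, 208, 209]]
```

Broadcasting works here because each operand either matches arr2 along a dimension or has size 1 there.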
