SAMPLE CODES
Pandas, or Python Pandas, is Python's library for data analysis. Pandas derives its name from “panel data
system”, which refers to multidimensional, structured data sets. The main author of Pandas is Wes McKinney.
Data Analysis: It refers to the process of evaluating big data sets using analytical and statistical tools so as to
discover useful information and draw conclusions that support business decision-making.
NumPy Arrays: NumPy (Numerical Python or Numeric Python) is an open-source module of Python that offers
functions and routines for fast mathematical computation on arrays and matrices. An array refers to a named
group of homogeneous (same-type) elements.
NumPy Arrays come in two forms:-
1-D arrays known as Vectors having single row/column only.
Multidimensional arrays known as Matrices having multiple rows and columns.
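The listing for this example is not reproduced in these notes. A minimal sketch that builds one array of each form, assuming the values 1 to 9 and the illustrative names vec and mat, might look like this:

```python
import numpy as np

# 1-D array (vector): a single sequence of values
vec = np.array([1, 2, 3])
print("A 1-D array is")
print(vec)

# 2-D array (matrix): values arranged in rows and columns
mat = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print("A 2-D array")
print(mat)
```

The attributes vec.ndim and mat.ndim report 1 and 2 respectively, which is a quick way to check which form an array has.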
Output:-
A 1-D array is
[1 2 3]
A 2-D array
[[1 2 3]
[4 5 6]
[7 8 9]]
import numpy as np
import pandas as pd
a=np.arange(1,11,2)
print("An array=",a)
s=pd.Series(a)
print(s)
Output:-
An array= [1 3 5 7 9]
0 1
1 3
2 5
3 7
4 9
dtype: int32
b=pd.Series(data=[31,28,31,30],index=['Jan','Feb','Mar','Apr'])
print(b)
Output:-
Jan 31
Feb 28
Mar 31
Apr 30
dtype: int64
c=pd.Series(data=[31,28,31,30],index=['Jan','Feb','Mar','Apr'],dtype=np.float64)
print(c)
Output:-
Jan 31.0
Feb 28.0
Mar 31.0
Apr 30.0
dtype: float64
a=np.arange(1,5)
print(a)
b=pd.Series(index=a, data=a*2)
print(b)
c=pd.Series(index=a, data=a**2)
print(c)
Output:-
[1 2 3 4]
1 2
2 4
3 6
4 8
dtype: int32
1 1
2 4
3 9
4 16
dtype: int32
a=[1,2,3,4]
b=pd.Series(data=a*2)
print(b)
Output:-
0 1
1 2
2 3
3 4
4 1
5 2
6 3
7 4
dtype: int64
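The last two examples differ only in the type of a, yet a*2 gives very different results. A short sketch of the contrast, assuming the illustrative names values, repeated and doubled:

```python
import numpy as np
import pandas as pd

values = [1, 2, 3, 4]

# A Python list multiplied by 2 is repeated, giving eight elements
repeated = pd.Series(data=values * 2)

# A NumPy array multiplied by 2 is doubled element by element, giving four elements
doubled = pd.Series(data=np.array(values) * 2)

print(repeated.tolist())  # [1, 2, 3, 4, 1, 2, 3, 4]
print(doubled.tolist())   # [2, 4, 6, 8]
```

Converting the list to an array first is the usual way to get element-wise arithmetic.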
Output:-
True
Output:-
Jan 31
Feb 28
Mar 31
Apr 30
dtype: int64
31
0 1
1 2
2 3
3 4
4 5
5 6
dtype: int64
4
1 a
2 e
3 i
2 o
4 u
dtype: object
2 e
Output:-
1 10
2 11
3 12
4 13
5 14
dtype: int64
2 11
3 12
4 13
5 14
dtype: int64
2 11
3 12
dtype: int64
1 10
3 12
5 14
dtype: int64
obj1=pd.Series(index=[1,2,3,4,5], data=[10,11,12,13,14])
print(obj1)
obj1[2]=5 #using index
print(obj1)
obj1[1:3]=99 #using slicing
print(obj1)
Output:
1 10
2 11
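In the listing above the index labels are integers, so a label such as 2 is easy to mistake for a position. A minimal sketch, assuming the explicit .loc (label-based) and .iloc (position-based) accessors are used in place of plain brackets; the name obj is illustrative:

```python
import pandas as pd

obj = pd.Series(index=[1, 2, 3, 4, 5], data=[10, 11, 12, 13, 14])

# .loc selects by index label: the entry labelled 2 becomes 5
obj.loc[2] = 5

# .iloc selects by position: positions 1 and 2 (labels 2 and 3) become 99
obj.iloc[1:3] = 99

print(obj.tolist())  # [10, 99, 99, 13, 14]
```

Spelling out the accessor keeps the intent clear whenever labels and positions could be confused.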
obj1=pd.Series(index=[1,2,3,4,5,6,7,8,9,10], data=[10,11,12,13,14,15,16,17,18,19])
print(obj1)
print(obj1.head()) #returns first five results by default
print(obj1.tail()) #returns last five results by default
print(obj1.head(4))
print(obj1.tail(3))
Output:
1 10
2 11
3 12
4 13
5 14
6 15
7 16
8 17
9 18
10 19
dtype: int64
1 10
2 11
3 12
4 13
5 14
dtype: int64
6 15
7 16
Output:
a 1
b 2
c 3
d 4
dtype: int64
a 3
b 4
c 5
d 6
dtype: int64
a 2
b 4
c 6
d 8
dtype: int64
a 1
b 4
c 9
d 16
dtype: int64
a False
b False
c False
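The listing for the output above is not shown. The output is consistent with scalar arithmetic applied to a whole Series at once. A possible sketch, assuming the values 1 to 4 with labels a to d; the comparison threshold 15 is a guess, since only False results are visible:

```python
import pandas as pd

s = pd.Series(data=[1, 2, 3, 4], index=['a', 'b', 'c', 'd'])

print(s)
print(s + 2)    # adds 2 to every element
print(s * 2)    # doubles every element
print(s ** 2)   # squares every element
print(s > 15)   # hypothetical comparison; gives a boolean Series
```

Each operation returns a new Series with the same index; s itself is unchanged.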
Output:
a 6
b 8
c 10
d 12
dtype: int64
1 11
2 14
3 17
4 20
dtype: int64
a NaN
b NaN
c NaN
d NaN
1 NaN
2 NaN
3 NaN
4 NaN
dtype: float64
1 3.0
2 7.0
3 11.0
4 15.0
5 NaN
6 NaN
7 NaN
8 NaN
dtype: float64
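The NaN values above arise because arithmetic between two Series matches entries by index label, not by position. A small sketch of this alignment, assuming illustrative data chosen to be consistent with parts of the output (the original listing is not shown):

```python
import pandas as pd

# Same labels in both Series: values are added label by label
s1 = pd.Series(data=[1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
s2 = pd.Series(data=[5, 6, 7, 8], index=['a', 'b', 'c', 'd'])
same = s1 + s2

# Partly overlapping labels: a label present in only one Series gives NaN
s3 = pd.Series(data=[1, 3, 5, 7], index=[1, 2, 3, 4])
s4 = pd.Series(data=[2, 4, 6, 8, 10, 12, 14, 16], index=[1, 2, 3, 4, 5, 6, 7, 8])
partial = s3 + s4

print(same)
print(partial)
```

Because NaN is a floating-point value, the second result has dtype float64 even though both inputs hold integers.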
#Filtering entries
obj1=pd.Series(index=[1,2,3,4,5], data=[6,7,8,9,10])
Output:
1 6
2 7
3 8
4 9
5 10
dtype: int64
1 False
2 False
3 False
4 True
5 True
dtype: bool
4 9
5 10
dtype: int64
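The print statements for this example are missing from the listing. The three blocks of output are consistent with printing the Series, a comparison, and the Series filtered by that comparison. A sketch, assuming the condition is obj1>8 (inferred from the output, not stated in the source) and the illustrative name mask:

```python
import pandas as pd

obj1 = pd.Series(index=[1, 2, 3, 4, 5], data=[6, 7, 8, 9, 10])
print(obj1)

# The comparison gives a boolean Series with one True/False per entry
mask = obj1 > 8   # assumed threshold
print(mask)

# Indexing with the boolean Series keeps only the entries marked True
filtered = obj1[mask]
print(filtered)
```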
#Reindexing Series object
obj1=pd.Series(index=[1,2,3,4],data=['a','b','c','d'])
print(obj1)
obj2=obj1.reindex([1,3,2,4])
print(obj2)
Output:
1 a
2 b
3 c
4 d
dtype: object
1 a
3 c
2 b
4 d
dtype: object
#Dropping entries from an Axis
obj1=pd.Series(index=[1,2,3,4],data=['a','b','c','d'])
print(obj1)
obj1=obj1.drop(2) #data value corresponding to index 2 is removed
print(obj1)
Output:
1 a
2 b
Output:
{1: 'a', 2: 'b', 3: 'c', 4: 'd'}
[1 2 3 4]
[1. 2. 3. 4.]
a="JaipurIsCapitalOfRajasthan"
b=np.fromiter(a,dtype="U2")
print(b)
Output:
['J' 'a' 'i' 'p' 'u' 'r' 'I' 's' 'C' 'a' 'p' 'i' 't' 'a' 'l' 'O' 'f' 'R' 'a'
'j' 'a' 's' 't' 'h' 'a' 'n']
Output:
[1 2 3 4]
[1.2 2.5 3.1]
[1 3 5]
Output:
['J' 'a' 'i' 'p' 'u']
Output:
[0 1 2 3 4 5 6 7 8 9]
[[0 1 2 3 4]
[5 6 7 8 9]]
[[0 1]
[2 3]
[4 5]
[6 7]
[8 9]]
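The listing for this output is not shown. It is consistent with a ten-element array being reshaped into two different layouts. A possible sketch, assuming np.arange(10) as the starting array and the illustrative names wide and tall:

```python
import numpy as np

arr = np.arange(10)        # ten elements, 0 to 9
print(arr)

wide = arr.reshape(2, 5)   # 2 rows x 5 columns
print(wide)

tall = arr.reshape(5, 2)   # 5 rows x 2 columns
print(tall)
```

The new shape must account for exactly the same number of elements; here 2x5 and 5x2 both equal 10.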
#slices in 2d array
import numpy as np
import pandas as pd
a=np.array([[1,2,3,4,5],[2,5,6,1,3],[6,7,8,9,1],[9,7,5,2,4]])
print(a)
slc1=a[0:3,0:4]
print(slc1)
slc2=a[:3,3:]
print(slc2)
Output:-
[[1 2 3 4 5]
[2 5 6 1 3]
[6 7 8 9 1]
[9 7 5 2 4]]
[[1 2 3 4]
[2 5 6 1]
[6 7 8 9]]
[[4 5]
[1 3]
[9 1]]
[[2 5 6]
[9 7 5]]
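The last block of output has no matching statement in the listing. One slice that would produce it, offered as an assumption rather than the original code, takes every second row starting from row 1 and the first three columns (slc3 is an illustrative name):

```python
import numpy as np

a = np.array([[1, 2, 3, 4, 5], [2, 5, 6, 1, 3], [6, 7, 8, 9, 1], [9, 7, 5, 2, 4]])

# rows 1 and 3 (start at 1, step 2), columns 0 to 2
slc3 = a[1::2, :3]
print(slc3)
```

In a[rows, columns] each part accepts the usual start:stop:step form, so a step can be combined with a column range.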
Output:-
[1 2 3 4 5 6 7 8]
[[1 2 3 4]
[5 6 7 8]]
Output:-
[[ 1 2 3 7 8 9]
[ 4 5 6 10 11 12]]
arr1=np.array([[1,2,3],[4,5,6],[7,8,9]])
print(arr1)
arr2=np.array([[10,11,12],[13,14,15]])
print(arr2)
arr3=np.concatenate((arr1,arr2))
print("Array after concatenation")
print(arr3)
arr4=np.array([[1,2],[3,4]])
arr5=np.array([[1,2,3],[4,5,6]])
print("Array after concatenation")
arr6=np.concatenate((arr4,arr5),axis=1) #shape of both arrays match on rows dimension
print(arr6)
Output:-
[[1 2 3]
[4 5 6]
[7 8 9]]
[[10 11 12]
[13 14 15]]
Array after concatenation
[[ 1 2 3]
[ 4 5 6]
[ 7 8 9]
[10 11 12]
[13 14 15]]
Array after concatenation
[[1 2 1 2 3]
[3 4 4 5 6]]
Here a is a 2x2 array and b is a 2x3 array. Both have two rows, so they can be concatenated side by side
(axis=1). They cannot be concatenated one below the other (axis=0) because their column counts differ
(2 versus 3). Taking the transpose of b gives an array d with dimension 3x2. Arrays a and d now have the
same number of columns, so they can be concatenated along axis=0.
import numpy as np
import pandas as pd
a=np.array([[1,2],[3,4]])
Output:-
[[1 2 3 4 5]
[3 4 6 7 8]]
[[3 6]
[4 7]
[5 8]]
[[1 2]
[3 4]
[3 6]
[4 7]
[5 8]]
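Only the first statement of this listing survives. A sketch that reproduces the three blocks of output above, assuming b holds the values inferred from that output and using the illustrative names c and e for the results:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])          # 2x2
b = np.array([[3, 4, 5], [6, 7, 8]])    # 2x3, values inferred from the output

# Row counts match (2 and 2), so the arrays join side by side
c = np.concatenate((a, b), axis=1)
print(c)

# The transpose turns the 2x3 array into a 3x2 array
d = b.T
print(d)

# Column counts now match (2 and 2), so the arrays stack vertically
e = np.concatenate((a, d), axis=0)
print(e)
```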
Output:-
[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23]
[[ 0 1 2 3 4 5]
[ 6 7 8 9 10 11]
[12 13 14 15 16 17]
[18 19 20 21 22 23]]
[array([[ 0, 1],
[ 6, 7],
[12, 13],
[18, 19]]), array([[ 2, 3],
[ 8, 9],
[14, 15],
[20, 21]]), array([[ 4, 5],
[10, 11],
[16, 17],
[22, 23]])]
[array([[ 0, 1, 2, 3, 4, 5],
[ 6, 7, 8, 9, 10, 11]]), array([[12, 13, 14, 15, 16, 17],
[18, 19, 20, 21, 22, 23]])]
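The listing for this output is not shown. The two lists of arrays are consistent with splitting a 4x6 array by columns and then by rows. A possible sketch, assuming np.hsplit and np.vsplit were used; the names grid, col_parts and row_parts are illustrative:

```python
import numpy as np

arr = np.arange(24)
print(arr)

grid = arr.reshape(4, 6)
print(grid)

# hsplit cuts along columns: three pieces of shape 4x2
col_parts = np.hsplit(grid, 3)
print(col_parts)

# vsplit cuts along rows: two pieces of shape 2x6
row_parts = np.vsplit(grid, 2)
print(row_parts)
```

The number of pieces must divide the size of the axis evenly (6 columns into 3, 4 rows into 2); otherwise NumPy raises an error.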
# Using split() function
arr1=np.arange(10)
print(arr1)
arr2=np.split(arr1,[2,6])
print(arr2)
arr3=np.split(arr1,[1,4])
print(arr3)
Output:-
[0 1 2 3 4 5 6 7 8 9]
[array([0, 1]), array([2, 3, 4, 5]), array([6, 7, 8, 9])]
[array([0]), array([1, 2, 3]), array([4, 5, 6, 7, 8, 9])]
Output:-
[[0 1 2 3 4]
[5 6 7 8 9]]
[[0.3 1.3 2.3 3.3 4.3]
[5.3 6.3 7.3 8.3 9.3]]
[[ 0.3 2.3 4.3 6.3 8.3]
[10.3 12.3 14.3 16.3 18.3]]
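The listing for this output is not shown. The second array is the first with 0.3 added to every element, and the third equals the sum of the first two. A possible sketch, assuming that is how the arrays were produced; the names shifted and combined are illustrative:

```python
import numpy as np

arr = np.arange(10).reshape(2, 5)
print(arr)

# Adding a scalar applies it to every element; the result is a float array
shifted = arr + 0.3
print(shifted)

# Adding two arrays of the same shape works element by element
combined = arr + shifted   # assumed expression; arr * 2 + 0.3 gives the same values
print(combined)
```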