
Practical File (do not copy this page; start from Practical 1 on the next page)

Practical 1: Create a Series from a List


Date:
Aim : To create a Series in Python Pandas from a list and apply predefined statistical functions
Source Code:

import pandas as pd
lst = []
n = int(input("Enter number of elements : "))
for i in range(0, n):
    t = int(input("Enter "+str(i+1)+" Number "))
    lst.append(t) # adding the element
my_series = pd.Series(lst)
print("Sum of all the elements =",my_series.sum())
print("Largest Value =", my_series.max())
print("Smallest Value =",my_series.min())
print("Mean Value =",my_series.mean())
print("Median =",my_series.median())
print("Standard Deviation =",my_series.std())
print("Describe Series =",my_series.describe())
Output:
Enter number of elements : 5
Enter 1 Number 1
Enter 2 Number 2
Enter 3 Number 3
Enter 4 Number 4
Enter 5 Number 5
Sum of all the elements = 15
Largest Value = 5
Smallest Value = 1
Mean Value = 3.0
Median = 3.0
Standard Deviation = 1.5811388300841898
Describe Series = count 5.000000
mean 3.000000
std 1.581139
min 1.000000
25% 2.000000
50% 3.000000
75% 4.000000
max 5.000000
dtype: float64
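
The standard deviation printed above is the sample standard deviation, because Series.std() uses ddof=1 by default. A minimal sketch, assuming the same five values as the sample run, compares it with the population form (ddof=0). The variable names are illustrative only.

```python
import pandas as pd

# Same five values as in the sample run above.
values = pd.Series([1, 2, 3, 4, 5])

sample_std = values.std()            # default ddof=1, divides by n - 1
population_std = values.std(ddof=0)  # divides by n

print("Sample std     =", sample_std)
print("Population std =", population_std)
```

The describe() summary reports the same ddof=1 figure as std().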

Practical 2: Create a DataFrame from List


Date:

Aim : To create DataFrame in Python Pandas using a List

Source Code:
import pandas as pd
import numpy as np
lst = []
n = int(input("Enter number of elements : "))
for i in range(0, n):
    t = int(input("Enter "+str(i+1)+" Number "))
    lst.append(t) # adding the element

df = pd.DataFrame(lst,columns=['Values'])
display(df)

Output:
Enter number of elements : 5
Enter 1 Number 10
Enter 2 Number 11
Enter 3 Number 12
Enter 4 Number 13
Enter 5 Number 14

Values
0 10
1 11
2 12
3 13
4 14

Practical 3: Create a DataFrame and Extract Rows and Columns


Date:
Aim : To create DataFrame in Python Pandas and extract Rows and Columns

Source Code:
import pandas as pd
import numpy as np
data = [['tom', 10], ['nick', 15], ['juli', 14],['Suzan',28],['Sam',30],['tom',15]]
df = pd.DataFrame(data, columns = ['Name', 'Age'])
print("First three Elements :\n",df.head(3)) #head() and tail() show 5 rows by default
print("\nExtract Name Column\n",df.Name) #can also use df['Name'] or df.iloc[:,0]
print("\nExtract Age Column\n" ,df.Age) #df['Age'] or df.iloc[:,1]; for both columns use df.iloc[:,0:2]

#Extracting Rows using iloc[] (positions start at 0 and the end of a slice is excluded)
print("\nExtracting Third Row\n",(df.iloc[3])) #position 3 is the row with index label 3, i.e. the fourth row
print("\nExtracting Third to fifth Row and first and second column\n",df.iloc[3:6,0:2]) #[Rows,Cols]: positions 3 to 5
print("\nCount Occurrences of values in Name column\n",df.Name.value_counts())

Output:
First three Elements :
Name Age
0 tom 10
1 nick 15
2 juli 14

Extract Name Column


0 tom
1 nick
2 juli
3 Suzan
4 Sam
5 tom
Name: Name, dtype: object

Extract Age Column


0 10
1 15
2 14
3 28
4 30
5 15
Name: Age, dtype: int64

Extracting Third Row


Name Suzan
Age 28
Name: 3, dtype: object

Extracting Third to fifth Row and first and second column


Name Age
3 Suzan 28
4 Sam 30
5 tom 15
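
The row labelled "Third Row" above is the row at position 3, which is the fourth row, because iloc counts from 0. A small sketch, assuming the same data as the practical, checks this and also shows what value_counts() returns for the Name column. The names fourth_row and name_counts are illustrative.

```python
import pandas as pd

data = [['tom', 10], ['nick', 15], ['juli', 14],
        ['Suzan', 28], ['Sam', 30], ['tom', 15]]
df = pd.DataFrame(data, columns=['Name', 'Age'])

# iloc is positional and zero-based: position 3 is the fourth row.
fourth_row = df.iloc[3]

# value_counts() returns a Series indexed by the distinct values.
name_counts = df['Name'].value_counts()

print(fourth_row['Name'])
print(name_counts['tom'])
```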

Practical 4: Create an Indexed DataFrame from a Dictionary


Date:
Aim : To create an indexed DataFrame from a dictionary in Python Pandas and extract a row based on user input

Source Code:
import pandas as pd
import numpy as np
data = { 'Name':['Tom','Alex','Suzain','Rayan','Steve'],
'Age':[28,34,29,28,25],
'English':[87,67,54,89,73],
'Hindi':[54,65,34,65,76],
'Maths':[65,54,67,54,75],
'IP': [90,84,94,75,43]}
df = pd.DataFrame(data,index=[1,2,3,4,5])
display(df)
i=int(input("Enter rollno to see the marks : "))
print(df.iloc[i-1:i])

Output:

Name Age English Hindi Maths IP
1 Tom 28 87 54 65 90
2 Alex 34 67 65 54 84
3 Suzain 29 54 34 67 94
4 Rayan 28 89 65 54 75
5 Steve 25 73 76 75 43

Enter rollno to see the marks: 3


Name Age English Hindi Maths IP
3 Suzain 29 54 34 67 94

Practical 5: Creating DataFrames from list of Dictionaries


Date:
Aim : To create two DataFrames from a list of dictionaries in Python Pandas, based on the subjects taken

Source Code

#row indices, and column indices.


import pandas as pd
import numpy as np
data = [{'English':90, 'Hindi': 95,'IP':99},
{'English': 50, 'Hindi': 40, 'Maths': 92},
{'English': 55, 'Hindi': 70, 'PEd': 70}]

#With two common column same as dictionary keys


df1 = pd.DataFrame(data, index=['Suzan', 'Sam','Juli'], columns=['English',
'Hindi'])

#With different columns


df2 = pd.DataFrame(data, index=['Suzan',
'Sam','Juli'],columns=['IP','Maths','PEd'])
display('DataFrame of Common Subjects',df1)
display('DataFrame of Different Subjects',df2)

Output:

'DataFrame of Common Subjects'

English Hindi

Suzan 90 95

Sam 50 40

Juli 55 70

'DataFrame of Different Subjects'

IP Maths PEd
Suzan 99.0 NaN NaN
Sam NaN 92.0 NaN
Juli NaN NaN 70.0

Practical 6: Adding and Removing columns from DataFrame


Date:
Aim : To add and remove columns from a DataFrame

Source Code
import pandas as pd
d = {'one' : pd.Series([1, 2, 3], index=['a', 'b', 'c']),
'two' : pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
df = pd.DataFrame(d) # df DataFrame object
print(df)
print ("Adding a new column by passing as Series:")
df['three']=pd.Series([10,20,30],index=['a','b','c'])
print( df )

# using del function


print ("Deleting the first column using DEL function:")
del df['one']
print (df)

# using pop function


print ("Deleting another column using POP function:")
df.pop('two')
print (df)

Output:

one two
a 1.0 1
b 2.0 2
c 3.0 3
d NaN 4
Adding a new column by passing as Series:
one two three
a 1.0 1 10.0
b 2.0 2 20.0
c 3.0 3 30.0
d NaN 4 NaN
Deleting the first column using DEL function:
two three
a 1 10.0
b 2 20.0
c 3 30.0
d 4 NaN
Deleting another column using POP function:
three
a 10.0
b 20.0
c 30.0
d NaN

Practical 7: Merging two DataFrames


Date:
Aim : To merge two DataFrames together as a single Dataframe

Source Code
import pandas as pd
import numpy as np
data1 = [[1,'tom', 10],[2,'nick', 15], [3,'juli', 14]]
df1 = pd.DataFrame(data1, columns = ['RollNo','Name', 'Age'])
data2 = [[1,98, 100],[2,98, 15], [3,75, 50]]
df2 = pd.DataFrame(data2, columns = ['RollNo','Eng', 'Hin'])

#merging data into merged dataframe


merged = pd.merge(df1,df2, on='RollNo')
display(merged)
t=input("\nEnter the Subject Code For Example 'Eng' for English \t:")
print(t," column data \n",merged[t])

print("\nSum of ",t," column \n",merged[t].sum())

avg=merged[t].sum()/merged[t].count()
print("\nPercentage of ",t," column \n",avg)

Output:

RollNo Name Age Eng Hin
0 1 tom 10 98 100
1 2 nick 15 98 15
2 3 juli 14 75 50

Enter the Subject Code For Example 'Eng' for English :Eng
Eng column data
0 98
1 98
2 75
Name: Eng, dtype: int64

Sum of Eng column


271

Percentage of Eng column


90.33333333333333
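
The value labelled "Percentage" above is the column sum divided by the column count, which is the arithmetic mean. A short sketch, assuming the same two DataFrames, shows that Series.mean() gives the same figure. The names manual_avg and builtin_avg are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame([[1, 'tom', 10], [2, 'nick', 15], [3, 'juli', 14]],
                   columns=['RollNo', 'Name', 'Age'])
df2 = pd.DataFrame([[1, 98, 100], [2, 98, 15], [3, 75, 50]],
                   columns=['RollNo', 'Eng', 'Hin'])

# merge() performs an inner join on the shared key column by default.
merged = pd.merge(df1, df2, on='RollNo')

manual_avg = merged['Eng'].sum() / merged['Eng'].count()
builtin_avg = merged['Eng'].mean()

print(manual_avg, builtin_avg)
```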

Practical 8: Importing CSV file data and some important functions


Date:
Aim : To import CSV file data and working on it
Source Code:
import pandas as pd
import numpy as np
data=pd.read_csv(r"c:\emp.csv") #raw string so the backslash is not read as an escape
print("Columns in DataFrame :\n",data.columns)
print("\nRange Index :\n",data.index)
print("\nBoth Data and Indexes :\n", data.axes)
print("\nDataType of columns :\n",data.dtypes)
print("\nTotal no of elements:" ,data.size)
print("\nCount of Rows and Columns:",data.shape)
print("\nPrinting data values :\n",data.values)
print("\nChecking if DataFrame is empty :",data.empty)
print("\nChecking dimensions of DataFrame :",data.ndim,"D")
rows=data.sample(n=10) #sample data random rows
display("Random sample of data:",rows)
rows=data.sample(frac=.25) # 25 % of data .25
display("Random sample of 25% data:",rows)

output:
Columns in DataFrame :
Index(['EMPLOYEE_ID', 'FIRST_NAME', 'LAST_NAME', 'EMAIL', 'PHONE_NUMBER',
'HIRE_DATE', 'JOB_ID', 'SALARY', 'COMMISSION_PCT', 'MANAGER_ID',
'DEPARTMENT_ID'],
dtype='object')
Range Index :
RangeIndex(start=0, stop=107, step=1)

Both Data and Indexes :


[RangeIndex(start=0, stop=107, step=1), Index(['EMPLOYEE_ID', 'FIRST_NAME',
'LAST_NAME', 'EMAIL', 'PHONE_NUMBER',
'HIRE_DATE', 'JOB_ID', 'SALARY', 'COMMISSION_PCT', 'MANAGER_ID',
'DEPARTMENT_ID'],
dtype='object')]

DataType of columns :
EMPLOYEE_ID int64
FIRST_NAME object
LAST_NAME object
...
MANAGER_ID float64
DEPARTMENT_ID float64
dtype: object

Total no of elements: 1177


Count of Rows and Columns: (107, 11)

Printing data values :


[[100 'Steven' 'King' ... nan nan 90.0]
[101 'Neena' 'Kochhar' ... nan 100.0 90.0]
...
[205 'Shelley' 'Higgins' ... nan 101.0 110.0]
[206 'William' 'Gietz' ... nan 205.0 110.0]]

Checking if DataFrame is empty : False

Checking dimensions of DataFrame : 2 D


Practical 9: Importing CSV file Extracting Data
Date:
Aim : To import CSV file data and extract data from it using loc[] and iloc[].

Source code:
import pandas as pd
import numpy as np
data=pd.read_csv(r"c:\emp.csv") #raw string so the backslash is not read as an escape
print(data.axes)
print ("Extracting Columns by Column Names :\n",data[['EMPLOYEE_ID','FIRST_NAME','SALARY']])
print ("\nExtracting Columns by Column Numbers :\n",data[data.columns[1:6]])
print ("\nExtracting Rows (1-3) :\n",data.loc[1:3])
print ("\nExtracting 3 Rows and Columns by Column Names Using loc() :\n",
data.loc[1:3,['FIRST_NAME','SALARY','DEPARTMENT_ID']])
print ("\nExtracting 3 Rows and Columns numbers Using loc() :\n",data.loc[1:3,data.columns[1:4]])
print ("\nExtracting 3 Rows and Columns Range Using loc() :\n",data.loc[1:3, 'FIRST_NAME':'SALARY'])
print ("\nExtracting 3 Rows and Columns Range Using loc() :\n",data.loc[1:3,'JOB_ID':])
print ("\nExtracting 3 Rows and Columns Range Using iloc() :\n",data.iloc[1:3,0:2])
print ("\nExtracting 3 Rows and Columns Range Using iloc() :\n",data.iloc[1:3,1:5])

output:

Extracting Columns by Column Names :


EMPLOYEE_ID FIRST_NAME SALARY
0 100 Steven 30000
1 101 Neena 17000
.. ... ... ...
105 205 Shelley 12000
106 206 William 8300

[107 rows x 3 columns]

Extracting Columns by Column Numbers :


FIRST_NAME LAST_NAME EMAIL PHONE_NUMBER HIRE_DATE
0 Steven King SKING 515.123.4567 17-JUN-87
1 Neena Kochhar NKOCHHAR 515.123.4568 21-SEP-89
.. ... ... ... ... ...
105 Shelley Higgins SHIGGINS 515.123.8080 07-JUN-94
106 William Gietz WGIETZ 515.123.8181 07-JUN-94

[107 rows x 5 columns]


Extracting Rows (1-3) :
EMPLOYEE_ID FIRST_NAME LAST_NAME EMAIL PHONE_NUMBER HIRE_DATE \
1 101 Neena Kochhar NKOCHHAR 515.123.4568 21-SEP-89
2 102 Lex De Haan LDEHAAN 515.123.4569 13-JAN-93
3 103 Alexander Hunold AHUNOLD 590.423.4567 03-JAN-90

JOB_ID SALARY COMMISSION_PCT MANAGER_ID DEPARTMENT_ID


1 AD_VP 17000 NaN 100.0 90.0
2 AD_VP 17000 NaN 100.0 90.0
3 IT_PROG 9000 NaN 102.0 60.0

Extracting 3 Rows and Columns by Column Names Using loc() :


FIRST_NAME SALARY DEPARTMENT_ID
1 Neena 17000 90.0
2 Lex 17000 90.0
3 Alexander 9000 60.0

Extracting 3 Rows and Columns numbers Using loc() :


FIRST_NAME LAST_NAME EMAIL
1 Neena Kochhar NKOCHHAR
2 Lex De Haan LDEHAAN
3 Alexander Hunold AHUNOLD

Extracting 3 Rows and Columns Range Using loc() :


FIRST_NAME LAST_NAME EMAIL PHONE_NUMBER HIRE_DATE JOB_ID SALARY
1 Neena Kochhar NKOCHHAR 515.123.4568 21-SEP-89 AD_VP 17000
2 Lex De Haan LDEHAAN 515.123.4569 13-JAN-93 AD_VP 17000
3 Alexander Hunold AHUNOLD 590.423.4567 03-JAN-90 IT_PROG 9000

Extracting 3 Rows and Columns Range Using loc() :


JOB_ID SALARY COMMISSION_PCT MANAGER_ID DEPARTMENT_ID
1 AD_VP 17000 NaN 100.0 90.0
2 AD_VP 17000 NaN 100.0 90.0
3 IT_PROG 9000 NaN 102.0 60.0

Extracting 3 Rows and Columns Range Using iloc() :


EMPLOYEE_ID FIRST_NAME
1 101 Neena
2 102 Lex

Extracting 3 Rows and Columns Range Using iloc() :


FIRST_NAME LAST_NAME EMAIL PHONE_NUMBER
1 Neena Kochhar NKOCHHAR 515.123.4568
2 Lex De Haan LDEHAAN 515.123.4569
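
The outputs above show loc[1:3] returning three rows while iloc[1:3] returns two. A minimal sketch on a small hypothetical frame (not the real emp.csv) illustrates why: loc slices by label and includes the end label, while iloc slices by position and excludes the end position.

```python
import pandas as pd

# Hypothetical stand-in for emp.csv with the default 0..4 index.
emp = pd.DataFrame({
    'EMPLOYEE_ID': [100, 101, 102, 103, 104],
    'FIRST_NAME': ['Steven', 'Neena', 'Lex', 'Alexander', 'Bruce'],
})

by_label = emp.loc[1:3]      # labels 1, 2 and 3 (end label included)
by_position = emp.iloc[1:3]  # positions 1 and 2 (end position excluded)

print(len(by_label), len(by_position))
```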

Practical 10: Importing CSV file Modifying data and Saving to CSV file
Date:
Aim : To modify data read from a CSV file and write it back to disk
Source code:
import pandas as pd
import numpy as np
data=pd.read_csv(r"e:\emp.csv") #raw string so the backslash is not read as an escape
print ("\nExtracting 3 Rows and Columns Range Using loc() :\n",data.loc[1:3,'FIRST_NAME':'SALARY'])
#modifying dataframe values
#data.FIRST_NAME[1]='Amit' gives a chained-assignment warning, so .loc is used instead
data.loc[1,'FIRST_NAME']='Amit'
data.loc[1,'LAST_NAME']='Singh'
data.loc[1,'EMAIL']="s.amit18"
data.loc[1,'SALARY']=20000
data['HIRE_DATE']='27-12-1975' #updates every row of the column
data.loc[1,'PHONE_NUMBER']='955.95.83030'
print ("\nExtracting 3 Rows and Columns Range Using loc() :\n",data.loc[1:3,'FIRST_NAME':'SALARY'])

#replacing the row with index label 2
data.loc[2,:]=[102,'Punita','Singh','P.amit18','201.92.0102','21-10-89','AD_VP',30000,.5,100,20]
print ("\nExtracting 3 Rows and Columns Range Using loc() :\n",data.loc[1:3,'EMPLOYEE_ID':])

# saving changes to csv file (index=False keeps the row labels out of the file)
data.to_csv(r"e:\emp.csv",index=False)

output:
Extracting 3 Rows and Columns Range Using loc() :
FIRST_NAME LAST_NAME EMAIL PHONE_NUMBER HIRE_DATE JOB_ID SALARY
1 Neena Kochhar NKOCHHAR 515.123.4568 21-SEP-89 AD_VP 17000
2 Lex De Haan LDEHAAN 515.123.4569 13-JAN-93 AD_VP 17000
3 Alexander Hunold AHUNOLD 590.423.4567 03-JAN-90 IT_PROG 9000

Extracting 3 Rows and Columns Range Using loc() :


FIRST_NAME LAST_NAME EMAIL PHONE_NUMBER HIRE_DATE JOB_ID SALARY
1 Amit Singh s.amit18 955.95.83030 27-12-1975 AD_VP 20000
2 Lex De Haan LDEHAAN 515.123.4569 27-12-1975 AD_VP 17000
3 Alexander Hunold AHUNOLD 590.423.4567 27-12-1975 IT_PROG 9000

Extracting 3 Rows and Columns Range Using loc() :


EMPLOYEE_ID FIRST_NAME LAST_NAME EMAIL PHONE_NUMBER HIRE_DATE \
1 101 Amit Singh s.amit18 955.95.83030 27-12-1975
2 102 Punita Singh P.amit18 201.92.0102 21-10-89
3 103 Alexander Hunold AHUNOLD 590.423.4567 27-12-1975
JOB_ID SALARY COMMISSION_PCT MANAGER_ID DEPARTMENT_ID
1 AD_VP 20000 NaN 100.0 90.0
2 AD_VP 30000 0.5 100.0 20.0
3 IT_PROG 9000 NaN 102.0 60.0
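
The modify-and-save steps can be tried without touching a file on disk. The sketch below is an illustration under stated assumptions: it uses a hypothetical two-column frame in place of emp.csv, updates one cell per .loc call, and writes to an in-memory buffer (io.StringIO) instead of a path.

```python
import io
import pandas as pd

# Hypothetical two-column stand-in for emp.csv.
data = pd.DataFrame({'FIRST_NAME': ['Steven', 'Neena', 'Lex'],
                     'SALARY': [30000, 17000, 17000]})

# One .loc call selects row and column together, so the frame itself
# is updated rather than a temporary copy.
data.loc[1, 'FIRST_NAME'] = 'Amit'
data.loc[1, 'SALARY'] = 20000

# Write to an in-memory buffer; index=False leaves the row labels out,
# so reading the data back does not add an extra unnamed column.
buffer = io.StringIO()
data.to_csv(buffer, index=False)
buffer.seek(0)
reloaded = pd.read_csv(buffer)

print(reloaded)
```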

Practical 11: Iteration rows and columns using iterrows() and iteritems()

Date:
Aim : To do iteration on rows and columns using iterrows() and iteritems()

Source code:

import pandas as pd
import numpy as np
data1 = [[1,'tom', 10],[2,'nick', 15], [3,'juli', 14]]
df1 = pd.DataFrame(data1, columns = ['RollNo','Name', 'Age'])
for label, contents in df1.iterrows():
    print ("\nLabel ", label)
    print ("contents:", contents, sep='\n')

#iteritems() was removed in pandas 2.0; items() does the same job
for label, contents in df1.items():
    print ("\nLabel ", label)
    print ("contents:", contents, sep='\n')

output:
Label 0
contents:
RollNo 1
Name tom
Age 10
Name: 0, dtype: object

Label 1
contents:
RollNo 2
Name nick
Age 15
Name: 1, dtype: object

Label 2
contents:
RollNo 3
Name juli
Age 14
Name: 2, dtype: object
Label RollNo
contents:
0 1
1 2
2 3
Name: RollNo, dtype: int64

Label Name
contents:
0 tom
1 nick
2 juli
Name: Name, dtype: object

Label Age
contents:
0 10
1 15
2 14
Name: Age, dtype: int64
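
DataFrame.iteritems() was removed in pandas 2.0, while DataFrame.items() is available in both older and current releases. A brief sketch, assuming the same df1 as the practical, collects what each iterator yields. The list names are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame([[1, 'tom', 10], [2, 'nick', 15], [3, 'juli', 14]],
                   columns=['RollNo', 'Name', 'Age'])

# items() yields (column label, column Series) pairs.
column_labels = [label for label, contents in df1.items()]

# iterrows() yields (index label, row Series) pairs.
row_names = [row['Name'] for label, row in df1.iterrows()]

print(column_labels)
print(row_names)
```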

Practical 12: Descriptive statistics with pandas


Date:
Aim : To work with Statistics on pandas DataFrame

Source code:
import pandas as pd
import numpy as np
diSales={2016:{'Qtr1':34500,'Qtr2':56000,'Qtr3':47000,'Qtr4':49000},
2017:{'Qtr1':44900,'Qtr2':46100,'Qtr3':57000,'Qtr4':59000},
2018:{'Qtr1':54500,'Qtr2':51000,'Qtr3':57000,'Qtr4':58000},
2019:{'Qtr1':61000}}
sal_df=pd.DataFrame(diSales)
print ( "Data Frame :\n",sal_df )
print (" min() : \n",sal_df.min())
print (" max() : \n",sal_df.max())

##### default axis is axis 0 and for axis 1 :-


print (" min() axis 1 : \n",sal_df.min(axis=1))
print (" max() axis 1 : \n",sal_df.max(axis=1))

print (" mode() : \n",sal_df.mode(axis=0)) #try all for axis=1


print (" mean() : \n",sal_df.mean(axis=0))
print (" median() : \n",sal_df.median(axis=0))
print (" Count() : \n",sal_df.count(axis=0))
print ("Sum() axis=0: \n",sal_df.sum(axis=0))
print ("Quantile() axis=0: \n",sal_df.quantile(q=[.25,.5,.75,1]))
print ("Var() axis=0: \n",sal_df.var(axis=0))
# applying group functions on single column
print(" Min of 2016 : ",sal_df[2016].min())
print(" Max of 2016 : ",sal_df[2016].max())
print(" Sum of 2016 : ",sal_df[2016].sum())

# applying group functions on multiple columns


print(" Sum of 2016 and 2019 : \n",sal_df[[2016,2019]].sum())
print(" Min of 2016 and 2019: \n",sal_df[[2016,2019]].min())
print(" Max of 2016 and 2019: \n",sal_df[[2016,2019]].max())

# applying functions on Rows


print(" Sum of Qtr1 : \n",sal_df.loc['Qtr1'].sum())
print(" Min of Qtr1: \n",sal_df.loc['Qtr1'].min())
print(" Max Qtr1: \n",sal_df.loc['Qtr1'].max())

# multiple rows
print(" Sum of Qtr1 to Qtr3 : \n",sal_df.loc['Qtr1':'Qtr3'].sum())
print(" Min of Qtr1 to Qtr3: \n",sal_df.loc['Qtr1':'Qtr3'].min())
print(" Max Qtr1 to Qtr3: \n",sal_df.loc['Qtr1':'Qtr3'].max())

#applying functions to subset ( few rows and columns )


#the slice 'Qtr3':'Qtr4' selects Qtr3 and Qtr4, and 2018:2019 selects both year columns
print(" Sum of Qtr1to3 2018-19 : \n",sal_df.loc['Qtr3':'Qtr4',2018:2019].sum())
print(" Min of Qtr1to3 2018-19: \n",sal_df.loc['Qtr3':'Qtr4',2018:2019].min())
print(" Max Qtr1to3 2018-19: \n",sal_df.loc['Qtr3':'Qtr4',2018:2019].max())

output:
Data Frame :
2016 2017 2018 2019
Qtr1 34500 44900 54500 61000.0
Qtr2 56000 46100 51000 NaN
Qtr3 47000 57000 57000 NaN
Qtr4 49000 59000 58000 NaN

min() :
2016 34500.0
. . .
2019 61000.0
dtype: float64

max() :
2016 56000.0
. . .
2019 61000.0
dtype: float64

min() axis 1 :
Qtr1 34500.0
. . .
Qtr4 49000.0
dtype: float64

max() axis 1 :
Qtr1 61000.0
. . .
Qtr4 59000.0
dtype: float64

mode() :
2016 2017 2018 2019
0 34500 44900 51000 61000.0
. . .
3 56000 59000 58000 NaN

mean() :
2016 46625.0
. . .
2019 61000.0
dtype: float64

median() :
2016 48000.0
. . .
2019 61000.0
dtype: float64

Count() :
2016 4
. . .
2019 1
dtype: int64

Sum() axis=0:
2016 186500.0
. . .
2019 61000.0
dtype: float64

Quantile() axis=0:
2016 2017 2018 2019
0.25 43875.0 45800.0 53625.0 61000.0
. . .
1.00 56000.0 59000.0 58000.0 61000.0

Var() axis=0:
2016 8.022917e+07
. . .
2019 NaN
dtype: float64

Min of 2016 : 34500


Max of 2016 : 56000
Sum of 2016 : 186500
Sum of 2016 and 2019 :
2016 186500.0
2019 61000.0
dtype: float64

Min of 2016 and 2019:


2016 34500.0
2019 61000.0
dtype: float64

Max of 2016 and 2019:


2016 56000.0
2019 61000.0
dtype: float64

Sum of Qtr1 :
194900.0
Min of Qtr1:
34500.0

Max Qtr1:
61000.0

Sum of Qtr1 to Qtr3 :


2016 137500.0
2017 148000.0
2018 162500.0
2019 61000.0
dtype: float64

Min of Qtr1 to Qtr3:


2016 34500.0
2017 44900.0
2018 51000.0
2019 61000.0
dtype: float64

Max Qtr1 to Qtr3:


2016 56000.0
2017 57000.0
2018 57000.0
2019 61000.0
dtype: float64

Sum of Qtr1to3 2018-19 :


2018 115000.0
2019 0.0
dtype: float64

Min of Qtr1to3 2018-19:


2018 57000.0
2019 NaN
dtype: float64
Max Qtr1to3 2018-19:
2018 58000.0
2019 NaN
dtype: float64
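
In the last three results the 2019 column holds only NaN, yet sum() reports 0.0 while min() and max() report NaN. A short sketch, assuming a cut-down copy of the same subset, shows this default (NaN values are skipped, and an empty sum is 0) and the min_count parameter of sum(), which returns NaN when fewer than that many real values are present.

```python
import numpy as np
import pandas as pd

# Same subset as above: Qtr3 and Qtr4 for 2018 and 2019.
subset = pd.DataFrame({2018: {'Qtr3': 57000, 'Qtr4': 58000},
                       2019: {'Qtr3': np.nan, 'Qtr4': np.nan}})

default_sum = subset.sum()            # NaN skipped; an empty sum is 0.0
strict_sum = subset.sum(min_count=1)  # NaN unless at least one real value
column_min = subset.min()             # nothing to compare, so NaN

print(default_sum)
print(strict_sum)
print(column_min)
```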

Practical 13: PIVOTING


Date:
Aim : To do pivoting on a pandas DataFrame using pivot() and pivot_table()

import pandas as pd
import numpy as np
d={ 'Tutor':['Tahira','Gurjyot','Anusha','Jacob','Vankat'],
'Classes':[28,36,41,32,48],
'Country':['USA','UK','Japan','USA','Brazil']}
df=pd.DataFrame(d)
print(df)
df.pivot(index='Country', columns='Tutor',values='Classes')

test=df.pivot(index='Country', columns='Tutor',values='Classes')
print(test)

#pivot_table

import pandas as pd
import numpy as np
d={ 'Tutor':['Tahira','Gurjyot','Anusha','Jacob','Vankat',
'Tahira','Gurjyot','Anusha','Jacob','Vankat',
'Tahira','Gurjyot','Anusha','Jacob','Vankat',
'Tahira','Gurjyot','Anusha','Jacob','Vankat'],
'Classes':[28,36,41,32,40,36,40,36,40,46,24,30,44,40,32,36,32,36,24,38],
'Quarter':[1,1,1,1,1,2,2,2,2,2,3,3,3,3,3,4,4,4,4,4],
'Country':['USA','UK','Japan','USA','Brazil','USA','USA','Japan',
'Brazil','USA','Brazil','USA','UK','Brazil','USA','Japan',
'Japan','Brazil','UK','USA']}
df=pd.DataFrame(d)
print(df)
test=df.pivot_table(index='Tutor', columns='Country',values='Classes')
print(test)

#sorting: each call returns a new sorted DataFrame; df itself is not changed
df.sort_values('Country')
df.sort_values('Tutor')
df.sort_values(['Country','Tutor'])
df.sort_values(['Tutor','Country'])
df.sort_values(['Tutor','Country'], ascending=False)

output:
Tutor Classes Country
0 Tahira 28 USA
1 Gurjyot 36 UK
2 Anusha 41 Japan
3 Jacob 32 USA
4 Vankat 48 Brazil

Tutor Anusha Gurjyot Jacob Tahira Vankat


Country
Brazil NaN NaN NaN NaN 48.0
Japan 41.0 NaN NaN NaN NaN
UK NaN 36.0 NaN NaN NaN
USA NaN NaN 32.0 28.0 NaN

Tutor Classes Quarter Country


0 Tahira 28 1 USA
1 Gurjyot 36 1 UK
2 Anusha 41 1 Japan
3 Jacob 32 1 USA
4 Vankat 40 1 Brazil
5 Tahira 36 2 USA
6 Gurjyot 40 2 USA
7 Anusha 36 2 Japan
8 Jacob 40 2 Brazil
9 Vankat 46 2 USA
10 Tahira 24 3 Brazil
11 Gurjyot 30 3 USA
12 Anusha 44 3 UK
13 Jacob 40 3 Brazil
14 Vankat 32 3 USA
15 Tahira 36 4 Japan
16 Gurjyot 32 4 Japan
17 Anusha 36 4 Brazil
18 Jacob 24 4 UK
19 Vankat 38 4 USA
Country Brazil Japan UK USA
Tutor
Anusha 36.0 38.5 44.0 NaN
Gurjyot NaN 32.0 36.0 35.000000
Jacob 40.0 NaN 24.0 32.000000
Tahira 24.0 36.0 NaN 32.000000
Vankat 40.0 NaN NaN 38.666667
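
The second table above contains averages such as 38.5 because pivot_table() aggregates repeated Tutor/Country pairs, using the mean by default. A minimal sketch on a three-row extract of the data (an assumption made to keep the example small) shows that pivot() rejects such duplicates and that aggfunc changes the aggregation.

```python
import pandas as pd

# Three rows taken from the practical's data; Anusha/Japan appears twice.
df = pd.DataFrame({'Tutor': ['Anusha', 'Anusha', 'Gurjyot'],
                   'Country': ['Japan', 'Japan', 'UK'],
                   'Classes': [41, 36, 36]})

# pivot() needs exactly one value per (index, columns) pair and
# raises ValueError when a pair is repeated.
try:
    df.pivot(index='Tutor', columns='Country', values='Classes')
    pivot_failed = False
except ValueError:
    pivot_failed = True

# pivot_table() aggregates the repeats: mean by default, or aggfunc.
mean_table = df.pivot_table(index='Tutor', columns='Country', values='Classes')
sum_table = df.pivot_table(index='Tutor', columns='Country', values='Classes',
                           aggfunc='sum')

print(pivot_failed)
print(mean_table)
print(sum_table)
```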

Practical 14: Histogram


Date:
Aim : To create Histograms on pandas DataFrame

import pandas as pd
import numpy as np
d={ 'Age':[37,28,38,44,53,69,74,53,35,38,66,46,24,45,92,48,51,62,57]}
hage=pd.DataFrame(d)
hage.hist()
hage.hist(column='Age',grid=True,bins=20 )
Output:
array([[<matplotlib.axes._subplots.AxesSubplot object at
0x0000019A7508D888>]],
dtype=object)

Practical 15: User defined Functions


Date:
Aim : To create user defined functions and calling them

Source code:
#userdefined Function
def addnum(): #function definition
    a=int(input("Please enter a number"))
    b=int(input("Please enter a number"))
    return(a+b)

c=addnum() #function calling


print("sum = ",c)
print("Sum= ",addnum() )

Output:

Please enter a number5


Please enter a number20
sum = 25
Please enter a number25
Please enter a number22
Sum= 47

Practical 16: Table wise Function Application and lambda function


Date:

Aim : To use pipe(), apply(), applymap() and a lambda function

Source Code:

import pandas as pd
import numpy as np
import math
# User defined function
def adder(adder1,adder2):
    return adder1+adder2
#Create a Dictionary of series
d = {'Score_Math':pd.Series([66,57,75,44,31,67]),
'Score_Science':pd.Series([89,87,67,55,47,72])}
df = pd.DataFrame(d)
print ("DataFrame\n",df)
print ("PIPE() \n",df.pipe(adder,2))

print ("On Rows apply(np.mean,axis=1)\n",df.apply(np.mean,axis=1)) # row wise
print ("On Columns apply(np.mean,axis=0) \n",df.apply(np.mean,axis=0)) # column wise
# applymap() applies the function to every cell; pandas 2.1 and later prefer DataFrame.map()
print( " LAMBDA \n", df.applymap(lambda x:math.sqrt(x)))

Output:
DataFrame
Score_Math Score_Science
0 66 89
1 57 87
. . . .
5 67 72
PIPE()
Score_Math Score_Science
0 68 91
1 59 89
2 77 69
3 46 57
4 33 49
5 69 74
On Rows apply(np.mean,axis=1)
0 77.5
1 72.0
2 71.0
3 49.5
4 39.0
5 69.5
dtype: float64
On Columns apply(np.mean,axis=0)
Score_Math 56.666667
Score_Science 69.500000
dtype: float64
LAMBDA
Score_Math Score_Science
0 8.124038 9.433981
1 7.549834 9.327379
2 8.660254 8.185353
3 6.633250 7.416198
4 5.567764 6.855655
5 8.185353 8.485281
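
The PIPE() result above is what calling adder(df, 2) directly would give, since pipe() passes the DataFrame as the first argument. A short sketch, assuming the same scores, checks this and also shows np.sqrt applied to the whole frame as a vectorised alternative to the per-cell lambda. The parameter names frame and amount are illustrative.

```python
import numpy as np
import pandas as pd

def adder(frame, amount):
    # Adds the same amount to every cell of the frame.
    return frame + amount

df = pd.DataFrame({'Score_Math': [66, 57, 75, 44, 31, 67],
                   'Score_Science': [89, 87, 67, 55, 47, 72]})

piped = df.pipe(adder, 2)   # pipe() calls adder(df, 2)
direct = adder(df, 2)

# NumPy functions work element-wise on a whole DataFrame.
roots = np.sqrt(df)

print(piped.equals(direct))
print(roots.head(2))
```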

Practical 17: Data Visualization (Line Chart)


Date:

Aim : To use a line chart with plot() for data visualisation

Source Code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as pl
RollNo=[1,2,3,4,5]
Maths=[20,22,26,28,30]
IP=[21,24,29,16,25]
Science=[26,23,20,26,22]
pl.title("Grade 12 Preboard Exams")
names=['Rayan','Unnati','Khushi','Aryan','Yakshesh'] #a list keeps the tick labels in order; a set does not
pl.xlabel('Names')
pl.ylabel('Marks')
pl.xticks(RollNo,names)
pl.plot(RollNo,Maths,'r',marker='o',label='Maths')
pl.plot(RollNo,IP,'k',marker='s',label='IP')
pl.plot(RollNo,Science,'b',marker='*',label='Science')
pl.legend()
pl.grid(color='y')

Output:

Practical 18: Data Visualization Bar chart


Date:

Aim : To use a bar chart for data visualisation

Source Code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
ItemCode=np.arange(1,6)
SalesJan=[50,60,25,80,60]
SalesFeb=[30,40,35,70,80]
SalesMar=[40,50,45,40,92]
plt.bar(ItemCode-0.2,SalesJan,width=0.2 ,color='red',label="Jan")
plt.bar(ItemCode,SalesFeb,width=0.2, color='blue',label="Feb")
plt.bar(ItemCode+0.2,SalesMar,width=0.2, color='green',label="Mar")
plt.title('Total Sales March')
plt.xlabel('Items')
plt.ylabel('Quantity')
plt.xticks(ItemCode,["Mouse","Printer","Scanner","WebCam","PenTab"],rotation=90)
plt.legend()
plt.grid(True)

output:

Practical 19: Data Visualization Histogram


Date:

Aim : To use a histogram for data visualisation

Source Code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as pl
marks=[22,25,18,19,11,21,28,30,24,24,23,15,20,27,21,21,13,30,18,25]
pl.hist(marks,edgecolor='r',bins=5,color='blue')
pl.ylabel ('Frequency' )
pl.xlabel ('Bins/Ranges')
pl.title('My Chart')

#bins=5 splits the range 11 to 30 into 5 equal-width bins, each (30-11)/5 = 3.8 wide
#min value is 11 and max is 30
#bin edges are 11, 14.8, 18.6, 22.4, 26.2, 30 (roughly 11-14, 15-18, 19-22, 23-26, 27-30)
#freq will be 2 3 6 5 4 = 20

x=np.random.randn(1000)
y=np.random.randn(1000)
pl.hist([x,y], bins=10,edgecolor='k',histtype='barstacked')

x=np.random.randn(1000)
y=pl.hist(x,bins=10,edgecolor='b',color='yellow')
a=pd.Series(y[1])
b=pd.Series(y[0])
a.pop(10)
a=a+.25
pl.plot(a,b,'k')

output:
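
The bin frequencies noted in the comments of the source code can be checked without drawing the chart. The sketch below assumes that numpy.histogram bins the data the same way as the pl.hist call with bins=5; it is a check on the arithmetic, not a replacement for the plot.

```python
import numpy as np

marks = [22, 25, 18, 19, 11, 21, 28, 30, 24, 24,
         23, 15, 20, 27, 21, 21, 13, 30, 18, 25]

# Five equal-width bins between the minimum (11) and maximum (30).
counts, edges = np.histogram(marks, bins=5)

print(counts)
print(edges)
```

Each bin is 3.8 wide, so the edges are not whole numbers; the last bin includes its right edge, which is why both marks of 30 are counted.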

Practical 20: Data Visualization BOXPLOT


Date:

Aim : To use boxplot for data Visualisation

Source Code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as pl
x=[2,3,1,4,4,6,8,10,10,3,4]
y=[5,3,1,7,6,8,9,10,8,3,4]
z=pl.boxplot([x,y],patch_artist=True,labels=['LG','OGeneral'])
pl.title('Air-conditioner' )

colors = ['pink', 'red']


for patch, color in zip(z['boxes'], colors):
    patch.set_facecolor(color)

output:
Practical 21: Data Visualization Piechart
Date:

Aim : To use piechart for data Visualisation

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
x=[10,30,27,13,8,12]
fr=['Peach','Banana','Grapes','Oranges','Pineapple','Apple']
co=['pink','yellow','lightgreen','orange','brown','red']
plt.pie(x,labels=fr,colors=co,autopct='%1.0f%%',shadow=True,explode=(0,.1,.1,0,0,0))
plt.show()

output:

Practical 22: Structure Query Language


Date:
