
DATA HANDLING (PANDAS) – 25 MARKS

SERIES, DATAFRAME, PYPLOT


MYSQL XI AND XII – 25 MARKS
(FUNCTIONS , GROUP BY CLAUSE , JOINING)
NETWORKING – 10 MARKS
FULL FORMS
SHORT CASE STUDY (5 MARKS)
SOCIAL IMPACT-10 MARKS
(SHORT QUESTIONS)
PRACTICALS
EXAM – PANDAS + MYSQL = 15 MARKS
PRACTICAL FILE – 5 MARKS
PROJECT – 5 MARKS
VIVA – 5 MARKS
DATA
HANDLING
IN PANDAS
(25 marks)
INTRODUCTION TO PANDAS
• Pandas is an open-source Python library that provides
high-performance data manipulation and analysis tools built on
its data structures, Series and DataFrame.
• The name Pandas is derived from "panel data".
• It provides fast data processing, as NumPy does, along with
the data manipulation techniques of spreadsheets and
relational databases.
PANDAS DATA TYPES
• object - used for string values.

• int64 - used for integers, that is, numbers without decimals.

• float64 - used for float values, that is, real numbers.

• bool - used for Boolean values, that is, True or False.
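The four data types listed above can be checked with the dtype attribute of a series. This is a minimal sketch with made-up values; the integer series is given dtype="int64" explicitly because the default integer width can depend on the platform, and recent pandas versions may report a dedicated string type instead of object for text.

```python
import pandas as pd

names = pd.Series(["amit", "rahul"])          # text values (object in classic pandas)
ages = pd.Series([17, 18], dtype="int64")     # integers, width set explicitly
heights = pd.Series([1.62, 1.75])             # real numbers
passed = pd.Series([True, False])             # Boolean values

# Print the data type of each series
print(names.dtype)
print(ages.dtype)
print(heights.dtype)
print(passed.dtype)
```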


PANDAS- SERIES
Series is the primary building block of Pandas. It is a one dimensional labelled
array capable of holding data of any type. The data in a series is mutable but the
number of members in a series is immutable.

A series in pandas can be created using the Series() constructor. Any list or dictionary can also be converted
into a series this way.

Creating an empty series


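import pandas as pd
S=pd.Series(dtype="float64")  # dtype given explicitly; newer pandas versions may otherwise default to object
print(S)
# Output: Series([], dtype: float64)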

Creating an integer series using a list

import pandas as pd
S=pd.Series([10,12,14,16])
print(S)

Creating a series with the different values


Series() Method
• Creates a series from a scalar value, list, array or dictionary.
• It has an optional parameter called index. If it is not given, the
series items are indexed from 0 to n-1 for a base
object with n elements.

Creating series using scalar data types


import pandas as pd
S=pd.Series(7,index=[0,1,2,3,4])
print()
print(S)

Creating a series with the value 7 in all


CREATING SERIES USING LISTS
import pandas as pd
months = ['jan','feb','mar','apr','may']
l=[1,3,6,78,99,78]
S1=pd.Series(months)
S2=pd.Series(l)
print(S1)
print(S2)
CREATING SERIES USING ARRAYS: numpy
import pandas as pd
import numpy as np
l=[1,2,4,8,98]
arr=np.array(l)
S=pd.Series(arr) #without index parameter
print ("Array is:",arr)
print("Series is:")
print(S)
CREATING A SERIES FROM DICTIONARY
What is a dictionary - A dictionary in Python is a collection of key-value pairs.
Example- dict = {'A' : 10, 'B' : 20, 'C' : 30}

# Code to create series using dictionaries without index parameter


import pandas as pd

dict = {'A' : 10, 'B' : 20, 'C' : 30}

S=pd.Series (dict)

print(S)

KEYS of the DICTIONARY will become index for the SERIES


CREATING A SERIES FROM DICTIONARY

# Creating Series with index parameter


import pandas as pd

dict = {'A' : 10, 'B' : 20, 'C' : 30}

S=pd.Series(dict, index=['B','C', 'A'])  # the index parameter decides the order of the keys

print()

print(S)
'''WAP TO OBTAIN 5 NUMBERS FROM THE
USER IN A LIST, THEN CREATE A SERIES
FROM IT AND DISPLAY THE SUM AND
AVERAGE OF ALL THE SERIES ELEMENTS.'''

import pandas as pd
ls=[]
for i in range(1,6):
    n=int(input("enter elements "))
    ls.append(n)
print("list ")
print(ls)
sr1=pd.Series(ls)
print("SERIES ")
print(sr1)
l=len(sr1)
s=0
av=0
for i in range(0,l):
    s=s+sr1[i]   # add each element to the running total
av=s/l           # average = total / number of elements
print("sum of series ",s)
print("average of series ",av)
ACCESSING SERIES INDEX AND VALUES
The index attribute returns the index labels of a series.
The values attribute returns the data of a series as a NumPy array.

#Code to show the use of index and values


import pandas as pd
dict = {'A' : 10, 'B' : 20, 'C' : 30}
S=pd.Series (dict)
print()
print(S)
print(S.index)
print(S.values)
WAP TO CREATE A SERIES FROM LIST OF
MARKS AND TAKING INDEX VALUES FROM
LIST RNO. SHOW ALL ELEMENTS THAT
ARE ABOVE 75 MARKS
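One possible solution to the question above is sketched here. The roll numbers and marks are hypothetical sample data, and the names rno, marks, sr and above_75 are chosen only for illustration.

```python
import pandas as pd

rno = [101, 102, 103, 104, 105]      # hypothetical roll numbers (used as index)
marks = [82, 67, 91, 75, 78]         # hypothetical marks (used as values)

sr = pd.Series(marks, index=rno)
print(sr)

# Boolean indexing keeps only the elements where the condition is True
above_75 = sr[sr > 75]
print(above_75)
```

The condition sr > 75 produces a series of True/False values, and sr[...] keeps only the rows marked True, so a student with exactly 75 marks is not shown.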
SERIES :
MCQ QUESTIONS
import pandas as pd
l=[10,20,33,40,53,60] #creation of list
sr1=pd.Series(l) #creation of series using a list
print(sr1)
#different operations on series
#add 3 to all the elements of the series
print(sr1+3)
#find the remainder of the series element when dividing by 5
print(sr1%5)
#display all the element of the series which are above 30
print(sr1[sr1>30])
#display all the even elements
print(sr1[sr1%2==0])
#display the square root of all the elements of the series
import math
for i in sr1:
    print(math.sqrt(i))

MATHEMATICAL OPERATIONS ON SERIES
#performing addition
import pandas as pd
S=pd.Series(7,index=[0,1,2,3,4])
print()
print(S+2)

#Performing subtraction
import pandas as pd
S=pd.Series(7,index=[0,1,2,3,4])
print()
print(S-2)
Arun 40000
Kedar 50000
Suman 45000
Sheetal 25000
1. Write command to display name whose sales is > 35000.
2. Command to rename sr1 to s1.
3. Change the values of 2nd and 3rd row to 80000.

import pandas as pd
l=[40000,50000,45000,25000]
sr1=pd.Series(l,index=["arun","kedar","suman","sheetal"])
print(sr1)
print(sr1[sr1>35000])
print(sr1.rename("s1"))
sr1[1:3]=80000
print(sr1)

#wap to create a series object that stores the budget allocated
#for 4 quarters of the year(q1,q2,q3,q4)
#1 code to modify the budget of q1 and q3 as 1 lakh RS
#2 code to modify Q3 onwards with 25000
#3 add Q5 with value 40000
#4 display first 2 rows and last 3 rows of the series
#5 display the data more than 50000
#6 arrange the series in ascending order of values
#7 arrange the series in descending order of values
import pandas as pd
l=[50000,75000,80000,65000]
bg=pd.Series(l,index=["Q1","Q2","Q3","Q4"])
print(bg)
#1
bg["Q1"]=100000
bg["Q3"]=100000
print(bg)
#2
bg["Q3":]=25000
print(bg)
#3
bg["Q5"]=40000
print(bg)
#4
print(bg.head(2))
print(bg.tail(3))
#5
print(bg[bg>50000])
#6
print(bg.sort_values())  # sort_values() returns a new series; bg itself is unchanged
#7
print(bg.sort_values(ascending=False))
Accessing rows using head() and tail() functions
head() function returns the first 5 elements of the series by default
tail() function returns the last 5 elements of the series by default
import pandas as pd
Months=['Jan','Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
S1=pd.Series(Months)
print(S1.head())
print (S1.tail())
Accessing rows using head() and tail() functions

import pandas as pd
Months=['Jan','Feb', 'Mar', 'Apr', 'May', 'Jun', 'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec']
S1=pd.Series(Months)
print(S1.head(2))
print (S1.tail(3))
#create a series of 6 even numbers and then -
#find the sum and average of this series
#also find minimum & maximum value of this series
#calculate total elements of the series

import pandas as pd
l=[2,10,20,30,40,50]
sr1=pd.Series(l)
print("series of even numbers ")
print(sr1)
m=sr1.sum()
print("sum of the series ",m)
av=sr1.mean() # average
print("average of the series ",av)
print("minimum value ",sr1.min())
print("maximum value ",sr1.max())
print("total elements of the series ", sr1.count())

A python list stores class xii sections and
another list rupees stores the contribution done by them
for corona fund. Write code to create a series that
stores the contribution as values and section as
index.

import pandas as pd
sec=["A","B","C","D"]
rs=[12000,10000,15000,22000]
sr1=pd.Series(rs,index=sec)
print(sr1)
''' Now school has decided to donate as much
amount as made by each section. write code to
create a series that stores the contribution
of school.'''
import numpy as np
rs1=np.array([12000,10000,15000,22000])
sch=pd.Series(data = rs1*2,index=sec)
print(sch)
'''consider a series sr1 that stores the
number of students in each section.
write code to display first two sections
that have been assigned the task of
selling tickets @100/- for carnival. '''
import pandas as pd
sec=["A","B","C","D"]
num=[42,50,45,32]
sr1=pd.Series(num,index=sec)
print(sr1)
print("AMOUNT COLLECTED ")
print(sr1[:2]*100)

'''write the code to modify the
strength of section A and C to 45
and 48 respectively and then display
the changed series'''
print("OLD series ")
print(sr1)
sr1["A"]=45
sr1["C"]=48
print("changed series ")
print(sr1)

#series sr1 and list l are given; what will be the output?
import pandas as pd
l=[10,20.3,30,40,50] # list
sr1=pd.Series(l) # series
print(sr1)
print(sr1+2) #each element of series will be added by 2
print(l+2) #error
print(sr1*2) #each element of series multiply by 2
print(l*2) #list duplicate 2 times
create a series that stores the cost of any 7 products in rupees.
QUES -1 write code to find out the maximum and minimum 3 costs from the series.

import pandas as pd
l=[300,150,450,600,250,800,1000]
sr1=pd.Series(l)
print(sr1)
print("top 3 cost ")
print(sr1.sort_values().tail(3))
print("smallest 3 ")
print(sr1.sort_values().head(3))
#another way
print("top 3 ")
print(sr1.sort_values(ascending=False).head(3))
print("smallest 3 ")
print(sr1.sort_values(ascending=False).tail(3))
Q-2 find out the cost of the products having cost more than 400RS
print("cost > 400 RS")
print(sr1[sr1>400])
write a code to sort the series object in descending order of their indexes and store it into a series k3
k3=sr1.sort_index(ascending=False)
print(k3)

write code to change the values at 2nd row(index 1) and 3rd row(index 2) to 950 rs
sr1[1:3]=950
print("change series")
print(sr1)
SERIES ATTRIBUTES
import pandas as pd
l=[300,150,450,600,250,800,1000]
sr1=pd.Series(l)
print(sr1)
print("data type ",sr1.dtype)#data type of the series
print("shape ",sr1.shape)#shape of the series in the form of a tuple
print("dimension ",sr1.ndim)#number of dimensions
print("size ",sr1.size)#number of elements in the series
print("index ",sr1.index)#shows the axis labels
print("values ",sr1.values)#displays values of the series
print("no of bytes ",sr1.nbytes)#returns number of bytes (int64 and float64 = 8 bytes each)
print("hasnans ",sr1.hasnans)#gives True if series has any NaN value
print("empty ",sr1.empty)#gives True if series is empty
print("item size ",sr1[1].itemsize)#returns size of each item in bytes
data type int64
shape (7,)
dimension 1
size 7
index RangeIndex(start=0, stop=7, step=1)
values [ 300 150 450 600 250 800 1000]
no of bytes 56
hasnans False
empty False
item size 8
Q-1 Given the following series sd write command to perform the following tasks:

import pandas as pd
l=[10,35,50,40,63]
sd=pd.Series(l)
1. Show only elements above or equal to 30
print(sd[sd>=30])
2. Display all even elements of the series
print(sd[sd%2==0])
3. Display elements which are multiples of 7
print(sd[sd%7==0])
Q-2 FIND THE OUTPUT
import pandas as pd
l=[10,35]
m=[2, 4, 6, 8,10,12]
sd1=pd.Series(l)
sd2=pd.Series(m)
print(sd2.loc[:2])
print(sd2.loc[0:4:2])
print(sd1.loc[4:5])
print(sd2.iloc[:2])
print(sd1.iloc[0:4:2])
print(sd2.iloc[4:5])

OUTPUT
0 2
1 4
2 6
dtype: int64
0 2
2 6
4 10
dtype: int64
Series([], dtype: int64)
0 2
1 4
dtype: int64
0 10
dtype: int64
4 10
dtype: int64

Q-3 FIND THE OUTPUT
import pandas as pd
l=[10,35,50,40,63]
m=[22,41,50,30,12,200,874]
sd1=pd.Series(l)
sd2=pd.Series(m)
print(sd1+sd2)
print(sd2.add(sd1,fill_value=0))

OUTPUT
0 32.0
1 76.0
2 100.0
3 70.0
4 75.0
5 NaN
6 NaN
dtype: float64
0 32.0
1 76.0
2 100.0
3 70.0
4 75.0
5 200.0
6 874.0
dtype: float64
Write a program to generate a series of the first 10 numbers.
import pandas as pd
s = pd.Series(range(1,11))
print(s)

Write a program to generate a series using a dictionary to
represent month number and month names.
import pandas as pd
di={1:'January',2:'February',3:'March',4:'April',5:'May',6:'June',7:'July',
8:'August',9:'September',10:'October',11:'November',12:'December'}
s = pd.Series(di)
print(s)

Write a program to generate a series of float numbers from 21.0 to
30.0 with an increment of 1.5 each.
import pandas as pd
import numpy as np
n = np.arange(21,31,1.5)  # stop value 31 is excluded, so the last element is 30.0
s = pd.Series(n)
print(s)

Write a program to generate a series of marks of 5
students. Give 5 grace marks to those who are
having <33 marks and print the new list of the marks.
import pandas as pd
std_marks = []
for i in range(1,6):
    m = int(input("Enter the marks:"))
    std_marks.append(m)
s = pd.Series(index=range(1201,1206),data=std_marks)
s[s<33]=s+5
print("New List is:")
print(s)

Write a program to generate a series of 5 elements of multiples of
7 starting with 35, where each index is the value multiplied by 3.
import pandas as pd
import numpy as np
a = 35
n = np.arange(a,a*2,7)
s = pd.Series(index=n*3,data=n)
print(s)

Write a program to generate a series of 10 numbers with
scalar value 33.
import pandas as pd
print(pd.Series(33,range(1,11)))
Write a program to generate a series of
these numbers: 33,55,65,29,19,23. Find
the sum of those values which end with
3 or 5.
import pandas as pd
nums = [33,55,65,29,19,23]
ser = pd.Series(nums)
sum5 = sum(ser[ser % 10 == 5])
sum3 = sum(ser[ser % 10 == 3])
print(sum3+sum5)

Write a program to generate a series of 10 numbers
starting with 41 and with the increment of 3. Now
add 7 to all odd values and reprint the updated series.
import pandas as pd
s = pd.Series(range(41,71,3))
print(s)
s[s % 2 != 0] = s + 7
print(s)

Write a program to generate a series of 10
numbers. Change the value of all the
elements whose values are multiples of 4.
import pandas as pd
numbers = []
for i in range(10):
    val = int(input("Enter a number :"))
    numbers.append(val)
ser = pd.Series(numbers)
print(ser)
ser[ser % 4 == 0] = 21
print(ser)

Write a program to generate a series and print
the top 3 elements using the head function.
import pandas as pd
ser_length = int(input("Enter the length of the series: "))
data = [ ]
for i in range(ser_length):
    val = int(input("Enter a val:"))
    data.append(val)
ser = pd.Series(data)
print(ser.head(3))
#create a series of 10 integers.
import pandas as pd
l=[2,4,6,8,10,13,15,11,18,19]
sr1=pd.Series(l)
print(sr1)
#display sum of the series
print("sum is ",sr1.sum())
#display max. and min. element of the series
print("max is ",sr1.max())
print("min is ",sr1.min())
#display average of the series
print("average is ",sr1.mean())
#display total number of elements of the series
print("total is ",sr1.count())
#display the elements which are above 5
print("elements above 5 ", sr1[sr1>5])
#display all the even elements of this series
print("even elements ",sr1[sr1%2==0])
#display first 3 elements of the series
print("first 3 elements ", sr1.head(3))
#display last 4 elements
print("last 4 elements ", sr1.tail(4))
#display elements present from 1 to 4th index
print("elements are ",sr1[1:5])
#change the value of the element present at 3rd index to 1000.
sr1[3]=1000
print("changed series ",sr1)
Dataframes : a dataframe is a two
dimensional labelled structure
storing multiple data values;
it is mutable.
CREATION OF DATAFRAME USING A DICTIONARY
WHERE VALUES OF THIS DICT. ARE STORED IN A LIST
by default index starts from 0

rno Name Marks
1 Amit 70
2 Rahul 90
3 Gagan 80
4 Punit 70
5 Gaurav 90

import pandas as pd
dc={"rno":[1,2,3,4,5],
"name":["amit","rahul","gagan","punit","gaurav"],
"marks":[70,90,80,70,90] }
df1=pd.DataFrame(dc,index=["a","b","c","d","e"])
print(df1)

Given a dictionary that stores the section
names list as the value for the "section" key and
their contribution amount list as the value for the "amount" key,
create a dataframe for this dictionary.

import pandas as pd
dc={"section":["A","B","C","D"],
"amount":[10000,15000,20000,14000] }
df1=pd.DataFrame(dc)
print(df1)

Creating a dataframe from nested list:


import pandas as pd
l=[ ["blue",100,"amit"],["red",150,"gagan"],["yellow",120,"lalit"]]
df1=pd.DataFrame(l,columns=["house_name","points","head_name"])
print(df1)
WAP TO CREATE A DF FROM A
LIST CONTAINING 2 LISTS, EACH
HAVING TARGET AND ACTUAL SALES OF
FOUR ZONAL OFFICES. GIVE APPROPRIATE
ROW LABELS.
import pandas as pd
target=[100000,50000,70000,90000]
sales=[85000,45000,68000,89000]
zonal=[target,sales]
df1=pd.DataFrame(zonal,columns=["Z1","Z2","Z3","Z4"],index=["target","sales"])
print(df1)

CREATION OF DF FROM A SERIES
import pandas as pd
target=[100000,50000,70000,90000]
sales=[85000,45000,68000,89000]
zones=["z1","z2","z3","z4"]
# give the labels to the series themselves; labels passed only to
# DataFrame() would not match the default 0-3 index and produce NaN
sr1=pd.Series(target,index=zones)
sr2=pd.Series(sales,index=zones)
zonal={"target":sr1,"sales":sr2}
df1=pd.DataFrame(zonal)
print(df1)
[same ques can be design using dictionary]

Consider two series objects, staff and salary, that store the number of
people in various offices and the total amount of salaries given in these
branches respectively. WAP to create another series object that stores
average salary per branch and then create a DF object from these
series.

import pandas as pd
staff=pd.Series([50,60,30,40])
salary=pd.Series([150000,100000,60000,80000])
avgsal=salary/staff
branch={"person":staff,"rupees":salary,"average":avgsal}
df1=pd.DataFrame(branch)
print(df1)
ATTRIBUTES OF DATAFRAME
         marketing  Sales
Age      20         25
Name     Neeraj     Kavita
Gender   male       female

Attributes of dataframe
•index – gives index (row labels) of the df
•columns – gives column labels of the df
•axes – returns a list showing both the axes (row & col)
•dtypes – gives the data type of each column of the df
•size – gives number of elements of the df
•shape – returns a tuple giving its dimensions
•ndim – gives number of axes
•T – transpose (swapping rows and columns)
•values – shows values of the df as a NumPy array

import pandas as pd
d1={"marketing":[20,"neeraj","male"],"sales":[25,"kavita","female"]}
df1=pd.DataFrame(d1,index=["age","name","gender"])
print(df1)
print(df1.index)
print(df1.columns)
print(df1.axes)
print(df1.dtypes)
print(df1.size)
print(df1.shape)
print(df1.ndim)
print(df1.T)
print(df1.values)
SELECTING COLUMNS AND ROWS OF THE DATAFRAME
DATA FRAME – DF1
          population  Hospitals  Schools
Delhi     1230456     189        7845
Mumbai    1122233     200        8542
Kolkata   1025465     154        69874
Chennai   9875461     132        5985

TO SELECT COLUMNS OF THE DF -
dataframe name["col.name"]
dataframe name.column name
TO SELECT MORE THAN ONE COLUMN.
dataframe[["colname","col name"]]
1. to show schools of the df
2. display population and hospitals
3. display hospital and population

import pandas as pd
d={"population":[19875462,10254879,4562325,84579654],
"hospital":[189,205,125,135],
"schools":[7895,65847,1254,5689]
}
df1=pd.DataFrame(d,index=["delhi","mumbai","kolkata","chennai"])
print(df1)
print(df1["schools"])
print(df1.schools)
print(df1[["population","hospital"]])
print(df1[["hospital","population"]])

USING LOC (where start and end both are included in the range)
to access a row of the df - dataframe.loc[row label,:]
to access multiple rows - dataframe.loc[start row : end row ,:]
to access selective columns only - dataframe.loc[:,start col:end col]
to access specific rows and columns - dataframe.loc[start row : end row ,start col:end col]
1. display all columns/information of delhi.
2. display information from delhi to kolkata
3. display all columns using loc
USING ILOC (here the end value is not included)
1. display first 2 rows of the df
2. display first 2 rows and 2 columns of the df.

import pandas as pd
d={"population":[19875462,10254879,4562325,84579654],
"hospital":[189,205,125,135],
"schools":[7895,65847,1254,5689]
}
df1=pd.DataFrame(d,index=["delhi","mumbai","kolkata","chennai"])
print(df1)
print(df1.loc["delhi",:])
print(df1.loc["delhi":"kolkata",:])
print(df1.loc[:,"population":"schools"])
print(df1.loc["delhi":"kolkata","population":"schools"])
print(df1.iloc[0:2])
print(df1.iloc[0:2,0:2])
GIVE A DATAFRAME ST1 WITH COLUMNS : RNO , NAME, UT1,HY, TERM1,FINAL FROM 10 STUDENTS.
1.Display row 2nd to 5th both included.
2.From row 2 to 4 display name and final marks only.
3.From row 2 to 4 display first 3 columns.
4.Display name and term1 marks of all the students.
5.Display the information of first students of the dataframe.
1.print(st1.iloc[2:6])
2.print(st1.loc[2:4,["name","final"]])
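Questions 3 to 5 above can be answered along the same lines. The sketch below builds a hypothetical st1 with made-up marks and the default 0-based index; the column names are lower-case versions of those in the question, and the variable names q3, q4 and q5 are for illustration only.

```python
import pandas as pd

# Hypothetical data for 10 students; all values are made up for illustration.
st1 = pd.DataFrame({
    "rno": list(range(1, 11)),
    "name": ["amit", "rahul", "gagan", "punit", "gaurav",
             "lalit", "kapil", "raj", "neeraj", "manav"],
    "ut1": [20, 15, 13, 15, 20, 18, 17, 19, 12, 16],
    "hy": [50, 45, 55, 60, 70, 65, 58, 62, 40, 52],
    "term1": [60, 55, 65, 70, 75, 68, 61, 66, 45, 57],
    "final": [80, 75, 85, 90, 95, 88, 81, 86, 65, 77],
})

# 3. rows labelled 2 to 4 (loc includes both ends), first 3 columns
q3 = st1.loc[2:4, "rno":"ut1"]
print(q3)

# 4. name and term1 marks of all the students
q4 = st1[["name", "term1"]]
print(q4)

# 5. information of the first student (iloc uses the position, 0 = first row)
q5 = st1.iloc[0]
print(q5)
```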
Adding columns and rows to an existing dataframe :

'''
ADDING A COLUMN TO A DF-
1. df[col name]=[new values]
ques -1 Add a new col. in the df with the name parks.
ques -2 Add a new col. grade with all values as A
ADDING A ROW TO A DF-
1. dfname.loc[row label]=data
'''
import pandas as pd
d={"population":[19875462,10254879,4562325,84579654],
"hospital":[189,205,125,135],
"schools":[7895,65847,1254,5689]
}
df1=pd.DataFrame(d,index=["delhi","mumbai","kolkata","chennai"])
print(df1)
df1["parks"]=[203,123,345,223]
print(df1)
df1["grade"]="A"
print(df1)
df1.loc["up"]=[25658755,504,419,458,"B"]
print(df1)
# Default column name is 0
import pandas as pd
a1=[20,30,40,50,60]
df = pd.DataFrame(a1)
print(df)

Changing column name


import pandas as pd
a1=[20,30,40,50,60]
df = pd.DataFrame(a1)
print(df)
df.columns=['AGE']
print(df)
UPDATING DATAFRAME : ADDING COLUMNS TO DATAFRAME

import pandas as pd
a1=[20,30,40,50,60]
df = pd.DataFrame(a1)
df.columns=['AGE']
print(df)
df['age2']=45
print(df)
df['age3']=pd.Series([22,33,44,55,66])
print(df)
df['total']=df['AGE']+df['age2']+df['age3']
print(df)

Write the code in python to create a dataframe.


Having product code , name and cost of 4 products.
Then add 2 new columns containing discount amount
And net amount. Discount percentage is 3.5%.
import pandas as pd
pr=[{"pcode":101,"pname":"laptop ","cost":45000},
{"pcode":102,"pname":"mobile ","cost":25000},
{"pcode":103,"pname":"printer","cost":15000},
{"pcode":104,"pname":"scanner","cost":5000}]
df1=pd.DataFrame(pr)
df1["discount_amt"]=df1["cost"]*3.5/100
df1["net_amount"]=df1["cost"]-df1["discount_amt"]
print("DATA FRAME FOR PRODUCTS :- ")
print(df1)

# a product dataframe can also be created from a dictionary of lists
import pandas as pd
d={"pno":[101,102,103],
"pname":["mobile","pen","mouse"],
"cost":[45000,25000,15000],
}
df1=pd.DataFrame(d)
print(df1)
'''create a data frame for the result analysis of 5 students of your class.
rno,name,ut1,ut2,hy marks.
1. add a new column total which adds ut1,ut2 and hy marks.
2. display min values of all the columns
3. display datatype of each column.
'''

import pandas as pd
d={"rno":[11,12,13,14,15],
"name":["anil","amit","rahul","punit","lalit"],
"ut1":[20,15,13,15,20],
"ut2":[20,25,20,15,20],
"hy":[50,45,55,60,70] }
df1=pd.DataFrame(d)
print(df1)
df1["total"]=df1["ut1"]+df1["ut2"]+df1["hy"]
print(df1)
print("minimum values of each column ")
print(df1.min())
print("datatype of each column ")
print(df1.dtypes)
DISPLAYING PARTICULAR COLUMN

import pandas as pd
a1=[20,30,40,50,60]
df = pd.DataFrame(a1)
df.columns=['AGE']
df['age2']=45
df['age3']=pd.Series([22,33,44,55,66])
df['total']=df['AGE']+df['age2']+df['age3']
print(df)
print(df['total'])
DELETING PARTICULAR COLUMN

import pandas as pd
a1=[20,30,40,50,60]
df = pd.DataFrame(a1)
df.columns=['AGE']
df['age2']=45
df['age3']=pd.Series([22,33,44,55,66])
df['total']=df['AGE']+df['age2']+df['age3']
print(df)
del df['total']
print(df)
df.pop('age3')
print(df)
import pandas as pd
pr=[{"pcode":101,"pname":"laptop ","cost":45000},
{"pcode":102,"pname":"mobile ","cost":25000},
{"pcode":103,"pname":"printer","cost":15000},
{"pcode":104,"pname":"scanner","cost":5000}]
df1=pd.DataFrame(pr)
df1["discount_amt"]=df1["cost"]*3.5/100
df1["net_amount"]=df1["cost"]-df1["discount_amt"]
print("DATA FRAME FOR PRODUCTS :- ")
print(df1)
r=df1.pop("discount_amt")
print(df1)
print("new data frame of removed col : discount amount")
print(r)
del df1["net_amount"]
print(df1)
df1["discount"]=3.5
print(df1)
ndf1=df1.drop("discount",axis=1)#axis = 1 columns wise
print("new df without discount col")
print(ndf1)
print("old df ")
print(df1)
ndf2=df1.drop(index=[2],axis=0)#axis = 0 row wise
print("new df without printer / row with index 2")
print(ndf2)
print("old df ")
print(df1)
NOW WE ARE
WORKING ON
DATAFRAME DF1
DATAFRAME DF1 FOR REFERENCE
CREATING DATATFRAMES USING DICTIONARY
import pandas as pd
dic1={'name':['raj','neeraj','manav','kapil','punit'],
'age':[23,30,40,25,50],
'amount':[10000,50000,20000,30000,40000]
}
sdf=pd.DataFrame(dic1,index =[1,2,3,4,5],columns=['name','age','amount'])
print(sdf)
USING VARIOUS AGGREGATE FUNCTION ON DATAFRAME
import pandas as pd
dic1={'name':['raj','neeraj','manav','kapil','punit'],
'age':[23,30,40,25,50],
'amount':[10000,50000,20000,30000,40000]
}
sdf=pd.DataFrame(dic1,index =[1,2,3,4,5],columns=['name','age','amount'])
print(sdf)
print("max ", sdf.max())
print("min ", sdf.min())
print("count ", sdf.count())
print("mean ", sdf.mean(numeric_only=True))# mean is defined only for numeric columns

MAX(), MIN() AND COUNT() GENERATE OUTPUT FOR
EACH AND EVERY COLUMN.
FOR A STRING COLUMN, MAX AND MIN ARE
DECIDED BY COMPARING THE
ASCII CODES OF THE CHARACTERS:
A – Z = 65 to 90
a – z = 97 to 122
0 – 9 = 48 to 57
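The character-code rule can be checked directly. This is a small sketch using the name column from the example; ord() is the Python built-in that returns the code of a character, and the variable names are chosen only for illustration.

```python
import pandas as pd

names = pd.Series(["raj", "neeraj", "manav", "kapil", "punit"])

largest = names.max()    # string whose first letter has the highest code
smallest = names.min()   # string whose first letter has the lowest code
print(largest, smallest)

# 'r' has a higher code than 'k', so "raj" comes after "kapil"
print(ord("r"), ord("k"))

# Every capital letter (65-90) comes before every small letter (97-122)
capital_first = "Zed" < "apple"
print(capital_first)
```

Strings are compared character by character from the left, so the first differing character decides the result.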
import pandas as pd
dic1={'name':['raj','neeraj','manav','kapil','punit'],
'age':[23,30,40,25,50],
'amount':[10000,50000,20000,30000,40000]
}
sdf=pd.DataFrame(dic1,index =[1,2,3,4,5],columns=['name','age','amount'])
print(sdf)
print("counting column in each row")
print("count ", sdf.count(axis=1))# axis=1 counts across each row; NA values are ignored
print("counting values in each column")
print("count ", sdf.count(axis=0))# axis = 0 for col

COUNT() : FOR COUNTING


NUMBER OF RECORDS
PRESENT IN THE COLUMNS
AGGREGATE FUNCTIONS ON PARTICULAR COLUMNS

import pandas as pd
dic1={'name':['raj','neeraj','manav','kapil','punit'],
'age':[23,30,40,25,50],
'amount':[10000,50000,20000,30000,40000]
}
sdf=pd.DataFrame(dic1,index =[1,2,3,4,5],columns=['name','age','amount'])
print(sdf)
print(sdf['name'].max())
print(sdf['age'].min())
print(sdf['amount'].median())
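For median : first arrange data in asc order:-

Odd number of values: 10000,20000,30000,40000,50000
Median is the middle value : 30000

Even number of values: 10000,20000,30000,40000,50000,60000
Median is the average of the two middle values :
(30000+40000)/2 = 35000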
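The median rule can be verified with a short sketch. The helper manual_median is a hypothetical function written only for this illustration; it follows the steps described above and is compared with the median() method of a pandas series.

```python
import pandas as pd

def manual_median(values):
    """Median by hand: sort, then take the middle value (odd count)
    or the average of the two middle values (even count)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

odd = pd.Series([10000, 50000, 20000, 30000, 40000])          # 5 values
even = pd.Series([10000, 50000, 20000, 30000, 40000, 60000])  # 6 values

print(manual_median(odd), odd.median())
print(manual_median(even), even.median())
```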
''' WRITE PYTHON CODE TO CREATE THE FOLLOWING DATA FRAME:
RNO NAME UT1 UT2 HALF_YEARLY
1 AMIT 20 22 85
2 LALIT 15 16 50
3 PUNIT 14 18 60
4 KAPIL 18 20 40
5 RAJ 19 22 78
1. WRITE COMMAND TO COMPUTE SUM OF EVERY COLUMN.
2. WRITE COMMAND TO COMPUTE MEAN OF COLUMN UT2.
3. WRITE COMMAND TO FIND SUM OF EVERY ROW OF THE DATAFRAME.
4. WRITE COMMAND TO COMPUTE AVERAGE OF ALL THE COLUMNS FOR LAST 3 ROWS.
5. WRITE COMMAND TO COMPUTE AVERAGE OF UT1 AND UT2 FOR FIRST 3 ROWS.'''

import pandas as pd
dic1={"rno":[1,2,3,4,5],
"name":["amit","lalit","punit","kapil","raj"],
"ut1":[20,15,14,18,19],
"ut2":[22,16,18,20,22],
"half_yearly":[85,50,60,40,78]}
df1=pd.DataFrame(dic1)
print(df1)
#Q-1
print(df1.sum())
#Q-2
print(df1["ut2"].mean())
#Q-3 (numeric_only=True leaves out the name column)
print(df1.sum(axis=1,numeric_only=True))
#Q-4 (rows with index 2, 3 and 4 are the last 3 rows)
print(df1.loc[2:].mean(numeric_only=True))
#Q-5 (loc includes the end label, so :2 gives the first 3 rows)
print(df1.loc[:2,"ut1":"ut2"].mean())

OUTPUT
   rno name ut1 ut2 half_yearly
0 1 amit 20 22 85
1 2 lalit 15 16 50
2 3 punit 14 18 60
3 4 kapil 18 20 40
4 5 raj 19 22 78
rno 15
name amitlalitpunitkapilraj
ut1 86
ut2 98
half_yearly 313
dtype: object
19.6
0 128
1 83
2 95
3 82
4 124
dtype: int64
rno 4.000000
ut1 17.000000
ut2 20.000000
half_yearly 59.333333
dtype: float64
ut1 16.333333
ut2 18.666667
dtype: float64
DATAFRAME DF2 FOR REFERENCE
import pandas as pd
dic1={'name':['raj','neeraj','manav','kapil','punit','tarun'],
'age':[23,30,40,25,50,100],
'product':['lcd','mobile','laptop','speaker','ipad','printer'],
'amount':[10000,50000,20000,30000,40000,50000]
}
sdf=pd.DataFrame(dic1,index =[1,2,3,4,5,6],columns=['name','age','product','amount'])
print(sdf)
print()
pv=sdf.pivot(index = 'name',columns = 'product',values = 'amount')
print(pv)
A new pivot table (pv)
has been created
on the basis of the amount
column. The amount
gets displayed under the
various product
columns, and where
no matching value is
found, NaN
gets displayed.
import pandas as pd
dic1={'name':['raj','neeraj','manav','kapil','punit','tarun'],
'age':[23,30,40,25,50,100],
'product':['lcd','mobile','laptop','speaker','ipad','printer'],
'amount':[10000,50000,20000,30000,40000,50000]
}
sdf=pd.DataFrame(dic1,index =[1,2,3,4,5,6],columns=['name','age','product','amount'])
print(sdf)
print()
pv=sdf.pivot(index = 'name',columns = 'product',values = 'amount').fillna(' ')
print(pv)
import pandas as pd
dic1={'name':['raj','neeraj','manav','kapil','punit','tarun'],
'age':[23,30,40,25,50,100],
'product':['lcd','mobile','laptop','speaker','ipad','printer'],
'amount':[10000,50000,20000,30000,40000,50000]
}
sdf=pd.DataFrame(dic1,index =[1,2,3,4,5,6],columns=['name','age','product','amount'])
print(sdf)
print()
pv=sdf.pivot(index = 'name',columns = 'product')
print(pv)
The NaN entries can be replaced using the fillna() function; it can be used on a particular column also
import pandas as pd
dic1={'name':['raj','neeraj','manav','kapil','punit','tarun'],
'age':[23,30,40,25,50,100],
'product':['lcd','mobile','laptop','speaker','ipad','printer'],
'amount':[10000,50000,20000,30000,40000,50000]
}
sdf=pd.DataFrame(dic1,index =[1,2,3,4,5,6],columns=['name','age','product','amount'])
print(sdf)
print()
pv=sdf.pivot(index = 'name',columns = 'product')
print(pv.fillna(''))
print(pv.age.fillna(''))  # fillna() applied to the age columns only
import pandas as pd
dic1={'item':['pen','pencil','book','pen','pencil'],
'price':[200,100,500,250,50],
'qty':[10,20,5,30,40]
}
sdf=pd.DataFrame(dic1,index =[1,2,3,4,5],columns=['item','price','qty'])
print(sdf)
print(pd.pivot_table(sdf,index=['item'],aggfunc='sum'))

Pivot table created by summing up the
columns price & qty on the basis of item name.
import pandas as pd
dic1={'item':['pen','pencil','book','pen','pencil'],
'price':[200,100,500,250,50],
'qty':[10,20,5,30,40]
}
sdf=pd.DataFrame(dic1,index =[1,2,3,4,5],columns=['item','price','qty'])
print(sdf)
print(pd.pivot_table(sdf,index=['item'],values=['qty'],
aggfunc=['sum','max','min','count','mean','median']))

USE OF DIFFERENT
AGGREGATE
FUNCTION WITH
PIVOT_TABLE()
Generation of pivot table from a .csv file created in excel

import pandas as pd

df=pd.read_csv("C:\\Users\\NITIN\\Desktop\\marks.csv",
skiprows=1,names=['RNO','NAME','CLASS','SUBJECT','MARKS'])
print(df)
k=df.pivot_table(index=['NAME'],values=['MARKS'],aggfunc='sum')  # total marks per student
print(k)
Pandas dataframes provide two useful sort functions: sort_values() and sort_index().
SORTING EXAMPLE : SORT_VALUES()
import pandas as pd
dic1={'item':['pen','pencil','book','pen','pencil'],
'price':[200,100,500,250,50],
'qty':[10,20,5,30,40]
}
sdf=pd.DataFrame(dic1,index =[1,2,3,4,5],columns=['item','price','qty'])
print(sdf)
new_df = sdf.sort_values(by="price")# ascending order by default
print(new_df)

FOR DESCENDING ORDER:


new_df = sdf.sort_values(by="price", ascending = 0)
SORTING THE DATAFRAME ON MULTIPLE COLUMNS
import pandas as pd
dic1={'item':['pen','pencil','book','pen','pencil'],
'price':[200,100,500,100,50],
'qty':[10,20,5,30,40]
}
sdf=pd.DataFrame(dic1,index =[1,2,3,4,5],columns=['item','price','qty'])
print(sdf)
new_df = sdf.sort_values(by=["price","qty"], ascending=[True,False])
print(new_df)

In the above example the data is first
arranged in ascending
order of price, and rows with the same
price are then
arranged in descending order of their
qty.
SORT BY INDEX EXAMPLE :
import pandas as pd
dic1={'item':['pen','pencil','book','pen','pencil'],
'price':[200,100,500,100,50],
'qty':[10,20,5,30,40]
}
sdf=pd.DataFrame(dic1,[1,2,3,4,0],columns=['item','price','qty'])
print(sdf)
new_df = sdf.sort_index()# ascending order by default
print(new_df)

Dataframe is arranged
on the basis of index
number
import pandas as pd
dic1={'item':['pen','pencil','book','pen','pencil'],
'price':[200,100,500,100,50],
'qty':[10,20,5,30,40]
}
sdf=pd.DataFrame(dic1,[1,2,3,4,0],columns=['item','price','qty'])
print(sdf)
new_df = sdf.sort_index(ascending=0)
print(new_df)

Passing ascending=0 (that is, False) as
an argument to the
sort_index() method
displays the dataframe
in descending order of its index.
import pandas as pd
sr=pd.Series([True,False])
print(sr)
0 True
1 False
dtype: bool

WRITE A PYTHON CODE WHICH HAS TWO LISTS, NAMELY SECTION AND CONTRIBUTION, FOR A CHARITY
PURPOSE FROM CLASS XII SECTIONS A TO F.
NOW WRITE A CODE TO CREATE A PANDAS SERIES THAT STORES THE CONTRIBUTION AS THE VALUES AND
THE SECTION AS THE INDEX.
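A possible solution is sketched below. The contribution amounts are hypothetical sample values, and the names section, contribution and s are chosen to follow the wording of the question.

```python
import pandas as pd

section = ["A", "B", "C", "D", "E", "F"]
contribution = [6700, 5600, 5000, 5200, 4800, 6100]   # hypothetical amounts in rupees

# values come from the contribution list, index labels from the section list
s = pd.Series(contribution, index=section)
print(s)
```

Both lists must have the same length, otherwise pandas raises an error when the series is created.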

Ques – Find the output :


import pandas as pd
dict = {'A' : 10, 'B' : 20, 'C' : 30}
S=pd.Series (dict)
print()
print(S)
print(S.index)
print(S.values)
# find the output
import pandas as pd
l=[20,33.5,42,57,60]
sd=pd.Series(l)
print(sd//4)
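To check an answer to the question above, the floor division can be run on its own. This sketch assumes the same list as the question; because the list contains 33.5, the series is stored as float64, so every result is a float. The name result is used only for illustration.

```python
import pandas as pd

sd = pd.Series([20, 33.5, 42, 57, 60])

# // is floor division: divide by 4 and drop the fractional part
result = sd // 4
print(result)
```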

Q1. Write a python code to create a series object Temp1 that stores temperatures of
seven days in it. Take any random seven temperatures.
import pandas as pd
Temp1=pd.Series([34.0,29.2,39.4,37.7,39.4,36.1,35.5])
print(Temp1)

Q2. Write a python code to create a series object Temp1 (from array) that stores
temperatures of seven days of week. Its indexes should be 'Sunday', 'Monday', . . .
'Saturday'.

import pandas as pd
import numpy as np
n=np.array([34.0,29.2,39.4,37.7,39.4,36.1,35.5])
Temp1=pd.Series(n, index=['Sunday','Monday','Tuesday','Wednesday','Thursday','Friday','Saturday'])
print(Temp1)
Consider the following Series object namely S:
0 0.430271
1 0.617328
2 -0.265421
3 -0.86113
What will be returned by following statements?
(a) S *100 (b) S>0
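
Both statements can be checked with a short sketch that rebuilds S from the four values listed above. The names scaled and positive are only illustrative.

```python
import pandas as pd

# Rebuild the Series S shown in the question
S = pd.Series([0.430271, 0.617328, -0.265421, -0.86113])

scaled = S * 100    # (a) every element is multiplied by 100
positive = S > 0    # (b) a boolean Series: True where the element is positive

print(scaled)
print(positive)
```

Neither statement changes S itself; each one returns a new Series with the same index.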

Give the output of the following code:


import pandas as pd
Stationery=['pencils','notebooks','scales','erasers']
S1=pd.Series([20,23,45,60], index=Stationery)
S2=pd.Series([30,40,50,70], index=Stationery)
print(S1+S2)
s=S1+S2
print(s+S1)

Find the error and correct:


data=np.array(['a','b','c','d','e','f'])
s=pd.Series(index=[100,101,102,103,104,105])
print(s[102,103,104])
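
A possible corrected version is sketched below. It assumes the intended fixes are: import both libraries, pass data to Series(), and select several labels with a list inside the square brackets.

```python
import numpy as np
import pandas as pd

data = np.array(['a', 'b', 'c', 'd', 'e', 'f'])

# data must be passed to Series(); the index alone creates no values
s = pd.Series(data, index=[100, 101, 102, 103, 104, 105])

# several labels are selected with a list, hence the double brackets
selected = s[[102, 103, 104]]
print(selected)
```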
Write a program to create a series to store the number of students enrolled in 8 games
during summer camp. The game names are the indexes and the number of students as
the data. The data is stored in a python list.

Write the python code to print the last 2 records of the above created series.
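
A minimal sketch for the two questions above. The game names and the student counts are assumed sample data, since the question does not list them.

```python
import pandas as pd

# Assumed sample data: 8 games and the students enrolled in each
games = ['cricket', 'football', 'hockey', 'badminton',
         'chess', 'tennis', 'kho-kho', 'basketball']
students = [30, 25, 18, 22, 12, 15, 20, 28]

# Game names are the index, student counts are the data
camp = pd.Series(students, index=games)
print(camp)

# tail(2) returns the last 2 records of the series
last_two = camp.tail(2)
print(last_two)
```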

Write a program to create a dictionary to store names and age of 5 students. Use this
dictionary to create a series and print it.
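
A minimal sketch for this question, using assumed names and ages. When a dictionary is passed to Series(), its keys become the index and its values become the data.

```python
import pandas as pd

# Assumed sample data: names (keys) and ages (values) of 5 students
ages = {'Amit': 16, 'Riya': 17, 'Karan': 16, 'Neha': 18, 'Sahil': 17}

students = pd.Series(ages)
print(students)
```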
import pandas as pd
fs=[4,6,8,10]
sf=pd.Series(fs)
ob1=pd.Series(data=fs*2) # list repetition: the list is repeated twice
ob2=pd.Series(data=sf*2) # series arithmetic: every element is doubled
print(ob1)
print(ob2)

import pandas as pd
import numpy as np
dt=np.array(["vi","vii","viii","ix","x"])
sr=pd.Series(dt,index=[1001,1002,1003,1004,1005])
print(sr[[1002,1004]])
Output:
1002 vii
1004 ix
dtype: object

l1=[230,150,450,200,700,350]
sr1=pd.Series(l1)
sr2=sr1.sort_values()
print(sr2)
Output:
1 150
3 200
0 230
5 350
2 450
4 700
dtype: int64
sr4=sr1.sort_index(ascending=False)
print(sr4)
Output:
5 350
4 700
3 200
2 450
1 150
0 230
dtype: int64
sr5=sr1.sort_values(ascending=False).tail(2)
print(sr5)
Output:
3 200
1 150
dtype: int64

import pandas as pd
import numpy as np
dt=np.array([10,25,45,80,90,100])
s=pd.Series(dt)
print(s[:3])
Output:
0 10
1 25
2 45
dtype: int32
print(s[-3:])
Output:
3 80
4 90
5 100
dtype: int32
import pandas as pd
#write a python code to create a dictionary of 5 products where key and value are to be
#obtained from the user and then create a series from this dictionary.
#1. display the product with max value
#2. arrange the series in ascending order
#3. arrange series in descending order

d={}
for i in range(1,6):
    pcode=int(input("enter code "))
    cost=int(input("enter cost "))
    d[pcode]=cost
sr=pd.Series(d)
print(sr)
#1
print("maximum ",sr.max())
#2
print(sr.sort_values())
#3
print(sr.sort_values(ascending=False))
#WAP TO GENERATE A SERIES OF MARKS OF 5 STUDENTS USING A LIST. GIVE
#GRACE MARKS UP TO 5 OF THOSE WHO ARE HAVING MARKS <33 AND PRINT THE
#VALUES
import pandas as pd
mk=[25,55,30,60,90]
sr=pd.Series(mk)
print("marks list ")
print(sr)
sr[sr<33]=sr+5
print("new marks list ")
print(sr)

#WAP TO GENERATE SERIES OF 5 ELEMENTS OF MULTIPLE OF 7 STARTING FROM 35 AND
#INDEX WITH THE MULTIPLE OF 3
import pandas as pd
l=[]
li=[]
r=35
m=3
for i in range(1,6):
    l.append(r)
    li.append(m)
    r=r+7
    m=m+3
sr=pd.Series(l,index=li)
print(sr)
''' create two series s1 and s2. s1 contains 3 values and s2 contains 5
values. perform addition of s1 and s2.
Do we have NaN values in them, if yes where (check with hasnans)
#1. write code to drop NaN value during the addition of these series'''
import pandas as pd
s1=pd.Series([12,20,30])
s2=pd.Series([20,30,40,50,60])
sm=s1+s2
print(sm)
print(sm.hasnans)# True - it has NaN values in the series sm (index 3 and 4)
#1
sm1=s1.add(s2,fill_value=0)
print("NEW SERIES - WITHOUT NaN")
print(sm1)
1. Create a series which contains names of person.
2. Create a series which contains 5 numbers and alphabets as index.
3. Create a series and multiply every element by 2
4. Create an empty series
5. Create a series from dictionary.
6. What will be output of the following:
import pandas as pd
s = pd.Series([1,2,3,4,5])
s.size
7. What will be output of the following:
import pandas as pd
s = pd.Series([1,2,3,4,5])
s.ndim

#1
import pandas as pd
s=pd.Series(['suman','ravi','swati'])
print(s)
#2
import pandas as pd
s = pd.Series([1,2,3,4,5],index = ['a','b','c','d','e'])
#3
import pandas as pd
s = pd.Series([1,2,3,4,5])
print(s*2)
#4
import pandas as pd
s=pd.Series()
print(s)
#5
import pandas as pd
data = {'a' : 0., 'b' : 1., 'c' : 2.}
s = pd.Series(data)
print(s)
#6
5 # s.size gives number of elements
#7
1 # ndim gives the dimension of series
import pandas as pd
'''ls=[{"school":"MAPS","strength":3900},
{"school":"DPS","strength":4500},
{"school":"TPS","strength":900}]
df1=pd.DataFrame(ls)
df1.to_csv("schooldata.csv")
print("csv file created")'''

'''write a code to read data from csv file "book1.csv" and then create a data frame from it
with the columns bn, name, price. also print the maximum cost of this data frame.'''
'''df1=pd.read_csv("book1.csv", names=["bn","name","price"],skiprows=1)
print(df1)
print("maximum cost ",df1.price.max())#other aggregates: min,sum,count,mean(avg),median'''
df1=pd.read_csv("book1.csv",nrows=2)#nrows to fetch specific number of rows from the csv data
print(df1)
#WAP that reads from csv file ("result.csv" having columns rno,name,eng,hindi,maths). then the
#program should add a column total storing total of all three subjects and another column average
#storing average marks. print the data frame after adding these columns
dfr=pd.read_csv("result.csv")
print(dfr)
dfr["total"]=dfr["eng"]+dfr["hindi"]+dfr["maths"]
dfr["average"]=dfr["total"]/3
print(dfr)
'''create dataframe for the following
BNO BNAME RUNS
101 ROHIT 203
102 VIRAT 250
103 SHIKHAR 185
104 RISHABH 175
'''
#create dataframes
#add a new col teams to mention batsman team in ipl
#add a new col average (runs/innings, each player played 7 matches)
#print the details of the dataframe row wise (iterrows())
#print the details of the dataframe column wise (iteritems(), named items() from pandas 2.0)
#display players name along with their teams
#display first 3 rows of the dataframe through iloc method

import pandas as pd
dr={"bno":[101,102,103,104],
"bname":["rohit","virat","shikhar","rishabh"],
"runs":[203,250,185,175]}
df1=pd.DataFrame(dr)
print(df1)
df1["teams"]=["MI","RCB","DC","DC"]
print(df1)
df1["average"]=df1["runs"]/7
print(df1)
for (rh,rd) in df1.iterrows():
    print("row index ",rh)
    print("row data")
    print(rd)
for (ch,cd) in df1.items(): # iteritems() was removed in pandas 2.0
    print("column head ",ch)
    print("column data ")
    print(cd)
print("PLAYERS NAME ")
print(df1[["bname","teams"]])
print(df1.iloc[:3])
'''create a dataframe for storing 4 customer
information - no,name,phone and address'''
#1. display only columns customer name from the dataframe
#2. modify the dataframe and set index as 11,22,33,44
#3. display top 2 records of the dataframe
#4. display customer name and address
#5. display last 2 records , with only cname and phone
#6 add a new column age in the dataframe
#7 add a new row with data as : 105,pooja,888,delhi,20
#8 add a new record using append function (pd.concat() from pandas 2.0)
#9 display the information of customer whose age is above 19 years

import pandas as pd
d={"cno":[101,102,103,104],
"cname":["manav","naina","simon","shefali"],
"phone":[789,456,123,546],
"address":["delhi","up","mp","mumbai"] }
df1=pd.DataFrame(d)
print(df1)
#1
print(df1["cname"])
#2
df1=pd.DataFrame(d,index=[11,22,33,44])
print(df1)
#3
print(df1.head(2))
#4
print(df1[["cname","address"]])
#5
print(df1.tail(2)[["cname","phone"]])
#6
df1["age"]=[18,17,21,18]
print(df1)
#7
df1.loc[55]=[105,"pooja",888,"delhi",20]
print(df1)
#8 DataFrame.append() was removed in pandas 2.0; pd.concat() does the same job
nr=pd.DataFrame([{"cno":106,"cname":"raj","phone":999,"address":"up","age":21}])
df1=pd.concat([df1,nr],ignore_index=True)
print(df1)
#9
print(df1[df1.age>19])
create a data frame sd with the columns :
name, marks,trials,passed - for 5 students
1. display total number of rows and columns of this DF.
2. to show the info. of all failed students
3. to change info of naina to passed =YES
4. to show info. for those marks >70 and trials >1
5. to show name and marks of all students
those marks > 70 and trials>1
6. remove the data for naina
7. add a row for naina
8. arrange the Df in ascending order of index
9. display info of pooja and palak
10. add a new column age

import pandas as pd
d={"names":["manav","naina","pooja","kapil","palak"],
"marks":[45,78,52,60,90],
"trials":[2,3,1,2,1],
"passed":["yes","no","yes","no","yes"]}
sd=pd.DataFrame(d)
print(sd)
#1
nr=len(sd.axes[0])
nc=len(sd.axes[1])
print("rows ",nr)
print("cols ",nc)
#2
print(sd[sd.passed=="no"])
#3
sd.loc[1,"passed"]="YES"
print(sd)
#4
print(sd[(sd.marks>70) & (sd.trials>1)])
#5
print(sd[["names","marks"]][(sd.marks>70) & (sd.trials>1)])
#6
sd=sd.drop(1)
print(sd)
#7
sd.loc[1]=["naina",85,2,"yes"]
print(sd)
#8
sd=sd.sort_index()
print(sd)
#9
print(sd.loc[[2,4]])
#10
sd["age"]=[23,22,18,19,25]
print(sd)
import pandas as pd
d={"id":[101,102,103,104,105,106,107],
"name":["john","smith","george","lara","k george","jhon","lucy"],
"dept":["ent","orth","card","skin","md","orth","ent"],
"exp":[12,5,10,3,9,10,3]}
dc=pd.DataFrame(d,index=[10,20,30,40,50,60,70])
print(dc)
print(dc.loc[40])
#dataframe created with index 10,20,30…
print(dc.iloc[[3,6]])
print(dc.name)
print(dc[["name","dept"]])
print(dc.head(3))
print(dc.tail(4))
print(dc.loc[[10,20],["name","dept"]])
print(dc.loc[20:60])
print(dc.iloc[1:7:2])
print(dc.iloc[0:7:2])
dc["age"]=[40,55,60,50,70,45,65]
print(dc)
del dc["age"]
print(dc)
dc1=dc.rename(index={10:100,20:200,30:300,40:400,50:500,60:600,70:700})#for new df/row label
print(dc1)
dc.rename(index={10:100,20:200,30:300,40:400,50:500,60:600,70:700},inplace=True)#for same df/row label
print(dc)
dc2=dc.rename(columns = {"id":"did","name":"dname","dept":"deptt","exp":"expr"})
print(dc2)
print(dc.iloc[6:0:-2])
import pandas as pd
d={"ut1":[57,86,92,52,93,98],
"hy":[83,67,78,84,75,79],
"ut2":[49,87,45,55,87,88],
"final":[89,90,66,78,69,96] }
dm=pd.DataFrame(d,index=["sharad","mansi","kanika","ramesh","ankita","pranay"])
print(dm)
#a index name change
dm=dm.rename(index={"sharad":1,"mansi":2,"kanika":3,"ramesh":4,"ankita":5,"pranay":6})
print(dm)
#b col name change
dm=dm.rename(columns={"ut1":"term1","hy":"term2","ut2":"term3","final":"term4"})
print(dm)
#c. col add
dm["ia"]=["A","A","B","A","C","B"]
print(dm)
#d. row add
dm.loc[7]=[49,56,75,58,"B"]
print(dm)
#e first row del
dm=dm.drop(1,axis=0)
print(dm)
#f col remove
dm=dm.drop("term3",axis=1)
print(dm)
#h
print(dm.loc[2])
#i
print(dm["term4"]>50)
#j
print(dm.loc[:,"ia"]=="A")
#k
print(dm[["term2","term4"]])
#l
print(dm.loc[2:5])
#m
print(dm.loc[2:5,["term1","term2"]])
#n
print(dm.loc[[3,5],["term1","term4"]])
#o
print(dm.head(3))
#p
print(dm.tail(4))
Write a program in python to create the following dataframe named “emp” storing the details of employees:
ename job salary dojoin department
1001 Scott Manager 90000 2010-10-01 Accounts
1002 Tom Clerk 34000 2010-01-04 Admin
1003 Joy Clerk 32000 2009-03-01 Admin
1004 Sam Salesman 40000 2009-04-01 Sales
1005 Martin Manager 85000 2008-08-05 Sales
1006 Michel Salesman 43000 2008-08-06 Sales
1007 Francis Clerk 30000 2009-10-10 Accounts
Considering the above dataframe answer the following queries by writing appropriate command in python pandas.
a) Add a new column named bonus ( just after salary column) which is 15% of their salary.
b)Add a row with row index 1008 as Robin,Analyst,60000,9000,2011-04-01,Admin.
c) Now change the salary of Francis as 35000.
d) Display the details of Sales and Accounts department .
e) Display the employee name, job and salary for all those employees whose salary lies between 30000 and 50000.
f) Delete a column dojoin permanently.
g) Now plot a bar chart depicting the employee name on x-axis and their corresponding salary on y-axis, with appropriate Graph title, x-axis title, y-axis title, gridlines and color
etc.

import pandas as pd
import matplotlib.pyplot as plt
dict1={'ename':['Scott','Tom','Joy','Sam','Martin','Michel','Francis'],
'job':['Manager','Clerk','Clerk','Salesman','Manager','Salesman','Clerk'],
'salary':[90000,34000,32000,40000,85000,43000,30000],
'dojoin':['2010-10-01','2010-01-04','2009-03-01','2009-04-01','2008-08-05','2008-08-06','2009-10-10'],
'department':['Accounts','Admin','Admin','Sales','Sales','Sales','Accounts']}
emp=pd.DataFrame(dict1,index=[1001,1002,1003,1004,1005,1006,1007])
print(emp)
# Add a new column named bonus (just after salary column) which is 15% of their salary.
x=emp['salary']*0.15
emp.insert(3,'bonus',value=x)
print(emp)
# Add a row with row index 1008 as Robin,Analyst,60000,9000,2011-04-01,Admin.
emp.loc[1008]=['Robin','Analyst',60000,9000,'2011-04-01','Admin']
print(emp)
# Now change the salary of Francis as 35000.
emp.loc[(emp.ename=='Francis'),'salary']=35000
print(emp)
# Display the details of Sales and Accounts department
print(emp.loc[(emp.department=='Accounts') | (emp.department=='Sales')])
# Display the employee name, job and salary for all those employees whose
# salary lies between 30000 and 50000.
print(emp.loc[(emp.salary>=30000) & (emp.salary<=50000),['ename','job','salary']])
# Delete a column dojoin permanently.
emp.drop('dojoin',axis=1,inplace=True)
print(emp)
# Now plot a bar chart depicting the employee name on x-axis and their
# corresponding salary on y-axis, with appropriate Graph title, x-axis title,
# y-axis title, gridlines and color etc.
x=emp['ename']
y=emp['salary']
plt.bar(x,y,color='r')
plt.title("Employee salary")
plt.xlabel("Employee name")
plt.ylabel("Salary")
plt.grid(True)
plt.show()
Write a code to access data from the csv file product.csv(pno, pname,color,price,grade)
And then create a data frame from it. Then perform the following tasks on it.
1.Display columns product name and price for all products.
2.Display the details of top 3 and last three records from it.
3. Now change the color of mobile to black.
4.Display the product name, color and grade for all those products whose price lies between 30000 and
50000.
5.Arrange the data frame in descending order of price.
6.Arrange the data frame in the order of index , descending order.
7.Write command to compute mean of column price.
8.Write command to compute sum of every row of the dataframe.
9.Compute average of price of first three rows.

import pandas as pd
df1=pd.read_csv("product.csv")
print(df1)
print(df1[["pname","price"]])
print(df1.head(3))
print(df1.tail(3))
df1.loc[(df1.pname=="mobile"),"color"]="black"
print(df1)
print(df1.loc[(df1.price>=30000)&(df1.price<=50000),["pname","color","grade"]])
print(df1.sort_values(by=["price"],ascending = False))
print(df1.sort_index(ascending = False))
print("MEAN of price column ", df1["price"].mean())
print(df1.sum(axis=1,numeric_only=True)) # only numeric columns can be added row wise
print(df1.loc[:2,"price"].mean())
1. Display the details of all the laptops.
2. Display the details of manufacturers with prices.
3. Insert a column named "color" with the following values (black, black, grey, white) without using series method.
4. Display the manufacturers and prices of all phones and laptops.
5. Insert two rows in the dataframe
6. Display the details of mobile phones which have price less than 15000.
7. Remove the column color permanently.
8. Display the items of Samsung company in the stock.
9. Make the price of laptop by "acer" company to 21000
10. Delete the column price
11. Insert a column named "rating" with the following values ("A+","A","B","B+","A","A+") as the second column in the dataframe.
12. Increase the price of all A+ laptops by 5%.
13. Delete row with index label l2 temporarily.
14. Decrease the price of all mobiles by 15%

import pandas as pd
import numpy as np
dict1={'item':['laptop','mobile','laptop','mouse'],
'manf':['dell','samsung','acer','hp'],
'qty':[5,2,3,10],
'price':[45000,12000,30000,1000]
}
stock=pd.DataFrame(dict1,index=['l1','m1','l2','ms1'])
print(stock)
#1
print(stock[stock.item=='laptop'])
#2
print(stock[["manf","price"]])
#3
stock['color']=['black', 'black', 'grey', 'white']
print(stock)
#4
print(stock.loc[(stock.item=="mobile") | (stock.item=="laptop"),["manf","price"]])
#5
stock.loc['m2']=['mobile','oppo',15,15000,'blue']
stock.loc['sp1']=['speaker','bose',4,20000,'red']
print(stock)
#6
print(stock.loc[(stock.item=='mobile')& (stock.price <15000)])
#7
stock.pop('color')
print(stock)
#8
print(stock[stock.manf=='samsung'])
#9
stock.loc[(stock.item=='laptop') & (stock.manf=='acer'),"price"]=21000
#10 shown on a copy, because #12 and #14 still need the price column;
# stock.pop('price') would delete it permanently
print(stock.drop('price',axis=1))
#11 six values, one for each of the six rows
stock.insert(1,column='rating',value=['A+','A','B','B+','A','A+'])
#12
stock.loc[(stock.item=='laptop')&(stock.rating=='A+'),'price']=stock.price+0.05*stock.price
#13 drop() returns a new dataframe, stock itself is unchanged
print(stock.drop(axis=0,labels='l2'))
#14
stock.loc[stock.item=='mobile','price']=stock.price-0.15*stock.price
print(stock)
1.Create the above Data frame with index as 101-105.
2.Add a new column Gender to the data frame after age column.
3.Add a new row to the data frame.
4.Show datatypes of all columns.
5.Access alternate rows of the data frame.
6.Use head and tail function to show first and last record of the data frame.
7.Use iloc to print first and last two rows of the data frame.
8.Print the sum and count of each column.
9.Print the average of age of first 4 rows.
10.Sort the records in the data frame, in ascending order of gender and descending order of age.
11.Change address of Anuj to Delhi.
12.Display name, age and address of all employees whose age lies between 20 and 30.
13.Plot a histogram depicting age of all employees.
14.Print details of the Data Frame row wise.
15.Print details of the Data frame column wise.

import pandas as pd
import matplotlib.pyplot as plt

dic={'Name':['Jai','Princi','Gaurav','Anuj','Geeku'],
'Age':[32,24,56,45,35],
'Address':['Delhi','Kanpur','Allahbad','Kannauj','Noida'],
'Qualification':['Msc','MA','MCA','Phd','10th']}
df=pd.DataFrame(dic,index=[101,102,103,104,105])
print(df)
g=['M','F','M','M','F']
df.insert(2,'Gender',value=g)
df.loc[106]=['Tanmay',32,'M','Goa','B.Tech']
print(df.dtypes)
print(df.iloc[::2])
print(df.head(1))
print(df.tail(1))
print(df.iloc[:2])
print(df.iloc[-2:])
print(df.sum())
print(df.count())
print(df.loc[:104,'Age'].mean())
print(df.sort_values(by=['Gender','Age'],ascending=[True,False]))
df.loc[df.Name=='Anuj',"Address"]='Delhi'
print(df)
print(df.loc[(df.Age>=20) & (df.Age<=30),['Name','Age','Address']])
plt.hist(df.Age,bins=5)
plt.show()
for (ri,rd) in df.iterrows():
    print("row index",ri)
    print("row data")
    print(rd)
for (ch,cd) in df.items(): # iteritems() was removed in pandas 2.0
    print("column head", ch)
    print("column data")
    print(cd)
ID   COACHNAME  SPORTS      AGE
100  Vikas      Badminton   27
101  Kapil      Basketball  36
102  Seema      Karate      34
103  Rahul      Khokho      29
104  kalpana    Basketball  30
QUE1: Create the dataframe of the above table.
QUE2: Display sports available in table.
CODING:
import pandas as pd
sports={"ID":[100,101,102,103,104],
"COACHNAME":["Vikas","Kapil","Seema","Rahul","kalpana"],
"SPORTS":["badminton","basketball","karate","khokho","basketball"],
"AGE":[27,36,34,29,30]}
df=pd.DataFrame(sports)
print(df)
print(df["SPORTS"])
QUE3: DELETE age column.
QUE4: Add a new column salary.
QUE5: delete the information of index 2 and 3 row wise.
CODING:
df_no_age=df.drop("AGE",axis=1) #AGE stays in df because QUE6, QUE9 and QUE10 use it
print(df_no_age)
df["SALARY"]=[5000,6000,7000,8000,9000]
print(df)
df1=df.drop(index=[2,3],axis=0)
print(df1)
QUE6: Display names of coaches along with their age.
QUE7: Display rows with gap of 2.
QUE8: display last three rows.
QUE9: Display coachname whose age is greater than 30.
QUE10: Join coachname and age
CODING:
print(df[["COACHNAME","AGE"]])
print(df.loc[::2])
print(df.tail(3))
print(df.loc[df.AGE>30,["COACHNAME"]])
df["total"]=df["COACHNAME"]+df["AGE"].astype(str) #age is converted to text before joining
print(df)
write a python programme to create the following
dataframe book storing the details of the library.

bno   bname        subject  price  pages
1001  flamingo     english  50     209
2002  subhash dey  eco      250    287
3003  ncert        maths    100    225
4004  vistas       english  75     150
5005  sumit arora  ip       240    354

considering the above dataframe answer the following queries
1)add a new column named date of issue just after price.
2)add a row bno 9009 as sandeep garg,bst,345,300.
3)change the price of subhash dey as 280
4)display the details of english subject.
5)display bname of the book having maximum pages.
6)display bname,subject,pages of all those books whose price lies between 100 and 250.
7)plot a bar graph depicting the bname on x axis and their corresponding prices on y axis.
8)mention the graph title,x axis title,y axis title and color.

import pandas as pd
import matplotlib.pyplot as plt
df={'bno':[1001,2002,3003,4004,5005],
'bname':["flamingo","subhash dey","ncert","vistas","sumit arora"],
'subject':["english","eco","maths","english","ip"],
'price':[50,250,100,75,240],
'pages':[209,287,225,150,354]}
book=pd.DataFrame(df)
print(book)
#1 position 4 places the new column just after price; five dates go in the list
book.insert(4,'date of issue',value=[----------])
print(book)
#2 no date of issue is given for this row, so None is stored
book.loc[5]=[9009,'sandeep garg','bst',345,None,300]
print(book)
#3
book.loc[(book.bname=='subhash dey'),'price']=280
print(book)
#4
print(book.loc[book.subject=='english'])
#5
print(book.loc[book.pages==book.pages.max(),'bname'])
#6
print(book.loc[(book.price>=100)&(book.price<=250),['bname','subject','pages']])
#7 and #8 titles must be set before plt.show()
x=book['bname']
y=book['price']
plt.bar(x,y)
plt.title("library",fontsize=14,color="blue")
plt.xlabel("bname",fontsize=14,color="red")
plt.ylabel("price",fontsize=14,color="red")
plt.show()
Consider the following data frame and do as directed:
import pandas as pd
d={'Mouse':[150,200,300,400],
'Keyboard':[180,200,190,300],
'Scanner':[200,280,330,450]}
df=pd.DataFrame(d,index=['Jan','Feb','March','April'])
QUESTION-1
A. Write code to access data of Mouse and Scanner columns.
B. Write code to access data of the Keyboard column using dot notation and column name.
C. Write code to access data of scanners using loc[].
D. Write code to access data of all columns where mouse data is more than 200.
E. Write code to access columns using 0 and 2.
F. Write code to access data of rows of jan and march for scanner and keyboard.
print(df[['Mouse','Scanner']])
print(df.Keyboard)
print(df.loc[:,'Scanner'])
print(df[df['Mouse']>200])
print(df.iloc[:,[0,2]])
print(df.loc[['Jan','March'],['Scanner','Keyboard']])

Write a python program to export the DataFrame 'df1' to a csv file 'file1.csv' in the current folder.
import pandas as pd
df1.to_csv('file1.csv')

Write a python program to export the DataFrame 'df1' to a csv file 'file1.csv' in the
current folder without index of the Dataframe
import pandas as pd
df1.to_csv('file1.csv', index=False)
1. Which of the following method is used to delete row/column from a dataframe?
delete()
remove()
discard()
drop()
2. To delete a row from dataframe, which of the following method is correct? A dataframe is indexed with names.
df.drop(df.row_index='Jay')
df.drop('Jay')
df.drop(df.row='Jay')
df.drop(df['Jay'])
3. Which of the following method is correct for following dataframe to delete row by specifying index name?
dt = {'Year1':[1200,1220,1500], 'Year2':[1800,1700,1400]}
df = pd.DataFrame(dt,index=['T1','T2','T3'])
Which of the following is correct to delete T2 index record from dataframe?
df.drop(df.index['T2'])
df.drop(df.idx='T2')
df.drop(index='T2')
df.drop(idx='T2')
4. When you perform delete operation on dataframe, it always returns a new dataframe without changing the old
dataframe. To avoid this, which parameter will you use to change the original dataframe?
intact = True
inplace = True
update = True
replace = True
5. Ankita is working on a dataframe which has 4 rows and 3 columns. She wrote df.drop(df.index[[1,3]]). Which rows will
be deleted from the dataframe?
Row 1 and Row 3
Rows between 1 and 3
Row 2
All rows from row no. 1 to index row no. 3
6. Consider the dataframe given in Que. No. 3. What happens when we apply the filter df=df[df.Year1!=1200]?
It will delete all rows except 1200 value
It will delete only row with 1200 value
It will keep data with 1200 only
None of these
7. Which parameter is used to add in drop() method to delete columns?
column = n (Where n is column number)
col = n (Where n is column number)
axis = 0
axis = 1

Answers:
1. drop()
2. df.drop('Jay')
3. df.drop(index='T2')
4. inplace = True
5. Row 1 and Row 3
6. It will delete only row with 1200 value
7. axis = 1
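
The drop() behaviour tested in these questions can be tried on a small dataframe. The sketch below is only illustrative: the frame demo, its three row labels T1 to T3 and the result names are assumptions, not part of the questions.

```python
import pandas as pd

# Assumed three-row frame, used only to try out drop()
demo = pd.DataFrame({'Year1': [1200, 1220, 1500],
                     'Year2': [1800, 1700, 1400]},
                    index=['T1', 'T2', 'T3'])

# drop() returns a new dataframe; demo itself is not changed here
without_t2 = demo.drop(index='T2')

# axis=1 makes drop() remove a column instead of a row
without_year2 = demo.drop('Year2', axis=1)

# rows can also be chosen by position through demo.index
by_position = demo.drop(demo.index[[0, 2]])

# a boolean filter keeps only the rows where the condition is True
filtered = demo[demo.Year1 != 1200]

# inplace=True changes the original dataframe and returns None
demo.drop('T3', inplace=True)
print(demo)
```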
rno name marks
1
2
3
4
1. Add a new col grade
2. Display name of the students
3. Display name and marks
4. Remove col grade
5. Display max of marks col
6. Create a bar chart for name and marks.
import pandas as pd
import matplotlib.pyplot as pl
d={"rno":[1,2,3,4],
"name":["amit","tarun","punit","lalit"],
"marks":[45,65,78,90]
}
df1=pd.DataFrame(d)
print(df1)
df1["grade"]=["a","A","b","c"]
print(df1["name"])
print(df1[["name","marks"]])
print(df1)
del df1["grade"]
print(df1)
print("max marks ",df1["marks"].max())
pl.bar(df1["name"],df1["marks"])
pl.xlabel("name of students")
pl.ylabel("marks")
pl.title("student name -marks")
pl.show()
