Pandas

Pandas is an open-source Python library that provides high-performance tools for data
manipulation and analysis. The name "pandas" is derived from "panel data". Pandas was
developed by Wes McKinney, starting in 2008.
Data Structures used in Pandas
a) Series: It is a one-dimensional labelled array capable of holding homogeneous data of any
type (integer, string, float, Python objects, etc.). A Series has a single dtype; when the
values are of mixed types, they are stored together under the dtype object.
Creating a Series:
Syntax:- varname=pandas.Series(data, index, dtype)
Ex:
import pandas as pd
lst=[10,20,30,40]
s=pd.Series(lst)
print(s)
Output:
0    10
1    20
2    30
3    40
dtype: int64
Creating a Series object with Programmer-defined Index:
Ex:
import pandas as pd
lst=[10,"Rossum",34.56]
s=pd.Series(lst,index=["Stno","Name","Marks"])
print(s)
Output:
Stno         10
Name     Rossum
Marks     34.56
dtype: object
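The syntax above lists a dtype parameter that the examples do not use. The following is a minimal sketch of passing all three arguments together; the labels and values are made up for illustration.

```python
import pandas as pd

# Hypothetical values and labels, chosen only for illustration.
marks = pd.Series([10, 20, 30], index=["a", "b", "c"], dtype="float64")

print(marks)          # the integers are stored as floats
print(marks.dtype)    # float64
print(marks["b"])     # access a value by its label
```

Here dtype="float64" asks pandas to convert the integers on creation instead of inferring int64.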
Creating a Series object from dict:
Ex:
import pandas as pd
d1={"sub1":"Python","sub2":"Java","sub3":"Data Science","sub4":"ML"}
s=pd.Series(d1)   # dict keys become the index
print(s)
Output:
sub1          Python
sub2            Java
sub3    Data Science
sub4              ML
dtype: object
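Attributes and methods on Series:
Attributes:
- An attribute returns information about the object.
- Attributes do not modify or manipulate the object.
import pandas as pd
a=["sreenu","varshini",2,3,4,5]
b=pd.Series(a)
print(b)

Attribute   Expression        Output
values      print(b.values)   ['sreenu' 'varshini' 2 3 4 5]
index       print(b.index)    RangeIndex(start=0, stop=6, step=1)
dtype       print(b.dtype)    object
shape       print(b.shape)    (6,)
size        print(b.size)     6
ndim        print(b.ndim)     1
array       print(b.array)    <PandasArray>
                              ['sreenu', 'varshini', 2, 3, 4, 5]
                              Length: 6, dtype: object

Note: newer pandas releases label the array output <NumpyExtensionArray> instead of <PandasArray>.
Note: items is a method, not an attribute. print(b.items) without parentheses only prints the
bound method together with the Series contents:
<bound method Series.items of 0 sreenu
1 varshini
2 2
3 3
4 4
5 5
dtype: object>
Call b.items() to get the (index, value) pairs.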
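As a small sketch of the difference between reading an attribute and calling a method, the snippet below reuses the same list and calls items() with parentheses, which yields (index, value) pairs.

```python
import pandas as pd

b = pd.Series(["sreenu", "varshini", 2, 3, 4, 5])

# Attributes are read without parentheses.
print(b.shape, b.size, b.ndim)

# items is a method: calling it yields (index, value) pairs.
pairs = list(b.items())
print(pairs[:2])
```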
Methods:
- A method performs an operation on the object; it may compute a result from it or modify it.
- It represents the behaviour of an object.
import pandas as pd
b=[1,2,3,4,5]
a=pd.Series(b)
print(a)

Method     Description                    Expression            Output
sum        Adds all the values            print(a.sum())        15
product    Multiplies all the values      print(a.product())    120
mean       Average of the values          print(a.mean())       3.0
median     Middle value                   print(a.median())     3.0
count      Number of non-null values      print(a.count())      5
describe   Summary statistics             print(a.describe())   see below

Output of print(a.describe()):
count    5.000000
mean     3.000000
std      1.581139
min      1.000000
25%      2.000000
50%      3.000000
75%      4.000000
max      5.000000
dtype: float64
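Parameters and arguments:
Arguments can be passed by position (data first, then index) or by keyword.
import pandas as pd
a=["apple","mango","grape"]
b=["a","b","c"]
print(pd.Series())      # no arguments: an empty Series
print(pd.Series(a,b))   # positional: a is the data, b is the index
Output of the second print:
a    apple
b    mango
c    grape
dtype: object

import pandas as pd
a=["apple","mango","grape"]
b=["a","b","c"]
print(pd.Series(data=b,index=a))   # keyword: b is the data, a is the index
Output:
apple    a
mango    b
grape    c
dtype: object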
b) DataFrame: A DataFrame is a 2-dimensional data structure, like a 2-dimensional array or a
table with rows and columns.
import pandas as pd
d1={"sub1":"Python","sub2":"Java"}
s=pd.Series(d1)
a=pd.DataFrame(s)
print(a)
Output:
           0
sub1  Python
sub2    Java
Approaches to create a DataFrame
Creating a DataFrame object by using list / tuple:
Ex:
import pandas as pd
lst=[10,20,30]
s=pd.Series(lst)
a=pd.DataFrame(s)
print(a)
Output:
    0
0  10
1  20
2  30

Ex:
import pandas as pd
lst=[[10,20,30,40],["RS","JS","MCK","TRV"]]
df=pd.DataFrame(lst)   # each inner list becomes one row
print(df)
Output:
    0   1    2    3
0  10  20   30   40
1  RS  JS  MCK  TRV

Ex:
import pandas as pd
lst=[[10,'RS'],[20,'JG'],[30,'MCK'],[40,'TRA']]
df=pd.DataFrame(lst)
print(df)
Output:
    0    1
0  10   RS
1  20   JG
2  30  MCK
3  40  TRA
Creating a DataFrame object by using dict object:
Ex:
import pandas as pd
dictdata={"Names":["Rossum","Gosling","McKinney"],
          "Subjects":["Java","C","Pandas"],
          "Ages":[80,85,55]}
df=pd.DataFrame(dictdata)   # dict keys become the column names
print(df)
Output:
      Names Subjects  Ages
0    Rossum     Java    80
1   Gosling        C    85
2  McKinney   Pandas    55
Creating a DataFrame object by using Series object:
Ex:
import pandas as pd
sdata=pd.Series([10,20,30,40])
df=pd.DataFrame(sdata)
print(df)
Output:
    0
0  10
1  20
2  30
3  40
Creating a DataFrame object by using ndarray object:
Ex:
import numpy as np
import pandas as pd
l1=[[10,60],[20,70],[40,50]]
a=np.array(l1)
df=pd.DataFrame(a)
print(df)
Output:
    0   1
0  10  60
1  20  70
2  40  50
Misc Operations on DataFrame:
Ex:
import pandas as pd
data={"First":[10,20,30,40],"Second":[1.4,1.3,1.5,2.5]}
df=pd.DataFrame(data)
print(df)
Output:
   First  Second
0     10     1.4
1     20     1.3
2     30     1.5
3     40     2.5

Ex:
import pandas as pd
data={"First":[10,20,30,40],"Second":[1.4,1.3,1.5,2.5]}
df=pd.DataFrame(data)
df["Third"]=df["First"]+df["Second"]   # new column computed row by row
print(df)
Output:
   First  Second  Third
0     10     1.4   11.4
1     20     1.3   21.3
2     30     1.5   31.5
3     40     2.5   42.5
Locate Row
As you can see from the results above, a DataFrame is like a table with rows and columns. Pandas
uses the loc attribute to return one or more specified rows.
import pandas as pd
data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45]
}
df = pd.DataFrame(data)
print(df.loc[0])   # row with label 0, returned as a Series
Output:
calories    420
duration     50
Name: 0, dtype: int64
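The example above returns a single row. As a short sketch of the "more rows" case, passing a list of labels to loc returns a DataFrame instead of a Series; the same calories/duration data is reused.

```python
import pandas as pd

data = {"calories": [420, 380, 390], "duration": [50, 40, 45]}
df = pd.DataFrame(data)

# A list of row labels returns a DataFrame with those rows.
first_two = df.loc[[0, 1]]
print(first_two)
```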
By using a CSV file (Comma Separated Values): A CSV file is saved under a file name with the
extension .csv. It stores tabular data (numbers and text) in plain text, and spreadsheet programs
such as Excel can open it.
Ex:
import pandas as pd
df=pd.read_csv(r"C:\king\Book1.csv")   # raw string, so the backslashes are not treated as escapes
print(df)
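To try read_csv without a file on disk, the sketch below reads CSV text from memory through io.StringIO. The column names and values are assumptions made for illustration, not the contents of Book1.csv.

```python
import io
import pandas as pd

# Hypothetical CSV content; the first line is the header row.
csv_text = "name,score\na,90\nb,40\nc,80\n"

df = pd.read_csv(io.StringIO(csv_text))
print(df)
print(df.shape)   # (rows, columns)
```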
Cleaning Empty Cells
Empty Cells: Empty cells can potentially give you a wrong result when you analyze data.
Remove Rows: One way to deal with empty cells is to remove rows that contain empty cells. This is
usually OK, since data sets can be very big, and removing a few rows will not have a big impact on
the result.
import pandas as pd
df = pd.read_csv(r'c:\king\data.csv')   # raw string; no leading space in the path
df.dropna(inplace = True)               # removes the rows with empty cells from df itself
print(df.to_string())
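The effect of dropna can be seen on a small in-memory table. This is a minimal sketch with made-up data in which one cell is empty.

```python
import pandas as pd

# Hypothetical data: the score of "b" is missing.
df = pd.DataFrame({"name": ["a", "b", "c"], "score": [90, None, 80]})

missing = df.isna().sum()      # number of empty cells per column
print(missing)

cleaned = df.dropna()          # returns a new DataFrame; df is unchanged
print(cleaned)

df.dropna(inplace=True)        # removes the rows from df itself
print(len(df))
```

Without inplace=True the original DataFrame keeps its empty cells, so the result has to be assigned to a variable.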
Find the number of rows
import pandas as pd
df=pd.read_csv(r"C:\king\Book1.csv")
print(len(df))   # number of rows in the DataFrame
View top rows
head() returns the top 5 rows by default:
import pandas as pd
df=pd.read_csv(r"C:\king\Book1.csv")
b=df.head()
print(b)
Pass n to get the top n rows:
import pandas as pd
df=pd.read_csv(r"C:\king\Book1.csv")
b=df.head(n=10)
print(b)
View bottom rows
tail() returns the bottom 5 rows by default:
import pandas as pd
df=pd.read_csv(r"C:\king\Book1.csv")
b=df.tail()
print(b)
Pass n to get the bottom n rows:
b=df.tail(n=10)
print(b)
Export DataFrame to a CSV file
import pandas as pd
name=["a","b","c"]
scr=[90,40,80]
b={"name":name,"score":scr}
df=pd.DataFrame(b)
# we can give a path to save the file; to_csv returns None when a path is given
df.to_csv(r"C:\king\Book2.csv")
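When no path is passed, to_csv returns the CSV text as a string, which is a convenient way to inspect what would be written. The sketch below uses the same name/score data; index=False is an optional choice here that leaves out the row labels.

```python
import pandas as pd

df = pd.DataFrame({"name": ["a", "b", "c"], "score": [90, 40, 80]})

# No path given: the CSV content is returned as a string.
text = df.to_csv(index=False)
print(text)
```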
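Inplace parameter using sort
import pandas as pd
# read one column and squeeze it into a Series
# (the squeeze=True argument of read_csv is not available in pandas 2.x)
df=pd.read_csv("C:/king/Book1.csv", usecols=["name"]).squeeze("columns")
b=df.sort_values()   # inplace=False by default: returns a new sorted Series, df is unchanged
print(b)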
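The following sketch contrasts the default behaviour of sort_values with inplace=True. It uses a small Series built in memory with made-up names rather than the CSV file.

```python
import pandas as pd

names = pd.Series(["mango", "apple", "grape"])   # hypothetical data

sorted_copy = names.sort_values()        # new Series; names is unchanged
before = names.tolist()
print(before)

result = names.sort_values(inplace=True) # sorts names itself
print(result)                            # None: nothing is returned
print(names.tolist())
```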
Information of file
import pandas as pd
df=pd.read_csv("C:/king/Book1.csv")
df.info()   # prints the summary itself and returns None, so print() is not needed
Cleaning data
Accessing the Data of a DataFrame
=======================================================
1) DataFrameobj.head(no. of rows)
2) DataFrameobj.tail(no. of rows)
3) DataFrameobj.describe()
4) DataFrameobj.shape
5) DataFrameobj[start:stop:step]
6) DataFrameobj["Col Name"]
7) DataFrameobj[ ["Col Name-1","Col Name-2",....,"Col Name-n"] ]
8) DataFrameobj[ ["Col Name-1","Col Name-2",....,"Col Name-n"] ][start:stop:step]
9) DataFrameobj.iterrows()
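A few of the access forms listed above, sketched on a small DataFrame. The column names and marks are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical marks table.
df = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "maths": [90, 40, 80, 65],
    "science": [70, 55, 88, 60],
})

print(df.shape)                      # (rows, columns)
every_other = df[0:4:2]              # row slice with a step
print(every_other)
one_col = df["maths"]                # a single column is a Series
subset = df[["name", "maths"]][1:3]  # selected columns, then a row slice
print(subset)

# iterrows() yields (index, row) pairs; each row is a Series.
for idx, row in df.iterrows():
    print(idx, row["name"], row["maths"])
```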
===================================================
Understanding loc ----- here both the start and stop labels are included, and
column names can be used (but not column numbers). loc is used with square brackets.
--------------------------------------------------------------------------------------
1) DataFrameobj.loc[row_label]
2) DataFrameobj.loc[row_label,["Col Name",.........] ]
3) DataFrameobj.loc[start:stop:step]
4) DataFrameobj.loc[start:stop:step,["Col Name"] ]
5) DataFrameobj.loc[start:stop:step,["Col Name-1","Col Name-2",.......] ]
6) DataFrameobj.loc[start:stop:step,"Col Name-1":"Col Name-n"]
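A brief sketch of the label-based forms above, showing that the stop label is included in both row and column slices. The table is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "maths": [90, 40, 80, 65],
    "science": [70, 55, 88, 60],
})

rows = df.loc[1:3]                    # labels 1, 2 and 3: stop is included
print(rows)

cols = df.loc[0:2, "name":"maths"]    # column slice also includes "maths"
print(cols)

picked = df.loc[0, ["name", "science"]]   # one row, listed columns
print(picked)
```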
------------------------------------------------------------------------------------------------------------
Understanding iloc ----- here the start index is included and the stop index is excluded, and column
numbers must be used (but not column names). iloc is used with square brackets.
--------------------------------------------------------------------------------------
1) DataFrameobj.iloc[row_number]
2) DataFrameobj.iloc[row_number,Col Number]
3) DataFrameobj.iloc[row_number,[Col Number1,Col Number2............] ]
4) DataFrameobj.iloc[row start:row stop, Col start:Col stop]
5) DataFrameobj.iloc[row start:row stop,Col Number ]
6) DataFrameobj.iloc[ [row number1, row number2.....] ]
7) DataFrameobj.iloc[ row start:row stop , [Col Number1,Col Number2............] ]
8) DataFrameobj.iloc[ : , [Col Number1,Col Number2............] ]
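The position-based forms above can be sketched the same way. Note that the stop position is excluded, unlike with loc. The table is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "maths": [90, 40, 80, 65],
    "science": [70, 55, 88, 60],
})

rows = df.iloc[1:3]            # positions 1 and 2: stop is excluded
print(rows)

cell = df.iloc[0, 1]           # row 0, column 1 ("maths")
print(cell)

cols = df.iloc[:, [0, 2]]      # all rows, columns 0 and 2
print(cols)
```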
=======================================================================
Adding a Column to a DataFrame
=======================================================================
1) dataframeobj['new col name']=default value
2) dataframeobj['new col name']=expression
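Both forms, sketched with made-up column names: a single default value is repeated for every row, while an expression is evaluated row by row.

```python
import pandas as pd

df = pd.DataFrame({"maths": [90, 40, 80], "science": [70, 55, 88]})

df["school"] = "ABC"                         # hypothetical default value
df["total"] = df["maths"] + df["science"]    # expression on existing columns
print(df)
```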
=======================================================================
Removing a Column from a DataFrame
=======================================================================
1) dataframeobj.drop(columns="col name")                 # returns a new DataFrame
2) dataframeobj.drop(columns="col name",inplace=True)    # modifies dataframeobj itself
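A minimal sketch of the two forms with made-up columns: the first leaves the original untouched, the second changes it.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["a", "b", "c"],
    "maths": [90, 40, 80],
    "science": [70, 55, 88],
})

without_science = df.drop(columns="science")   # new DataFrame; df keeps all columns
print(list(df.columns))

df.drop(columns="maths", inplace=True)         # removes the column from df itself
print(list(df.columns))
```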
=======================================================================
Sorting the DataFrame data
=======================================================================
1) dataframeobj.sort_values("colname")
2) dataframeobj.sort_values("colname",ascending=False)
3) dataframeobj.sort_values(["colname1","colname2",...,"colname-n"] )
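The three sorting forms, sketched on a made-up marks table. With a list of columns, later columns break ties in the earlier ones.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "maths": [90, 40, 80, 40],
    "science": [70, 55, 88, 60],
})

asc = df.sort_values("science")                     # smallest first
desc = df.sort_values("science", ascending=False)   # largest first
multi = df.sort_values(["maths", "science"])        # maths first, then science

print(asc)
print(desc)
print(multi)
```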

=======================================================================
Finding duplicates in DataFrame data
=======================================================================
1) dataframeobj.duplicated()   # gives a boolean result: True for each row that repeats an earlier row
=======================================================================
Removing duplicates from DataFrame data
=======================================================================
1) dataframeobj.drop_duplicates()               # returns a new DataFrame
2) dataframeobj.drop_duplicates(inplace=True)   # modifies dataframeobj itself
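A short sketch of both calls on made-up data in which the third row repeats the first.

```python
import pandas as pd

df = pd.DataFrame({"name": ["a", "b", "a"], "score": [90, 40, 90]})

flags = df.duplicated()        # True where a row repeats an earlier row
print(flags)

unique_rows = df.drop_duplicates()   # keeps the first occurrence of each row
print(unique_rows)
```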
=======================================================================
Data Filtering and Conditional Change
=======================================================================
1) dataframeobj.loc[ simple condition ]
Ex: df.loc[ df["maths"]>75 ]
2) dataframeobj.loc[ compound condition ]
Ex: df.loc[ (df["maths"]>60) & (df["maths"]<85) ]
Ex: df.loc[ (df["percent"]>=60) & (df["percent"]<=80),["grade"]]="First" # conditional update
Special Case (string conditions on a column):
3) dataframeobj.loc[ dataframeobj["Col Name"].str.contains(str) ]
4) dataframeobj.loc[ dataframeobj["Col Name"].str.startswith(str) ]
5) dataframeobj.loc[ dataframeobj["Col Name"].str.endswith(str) ]
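The filtering forms above, sketched end to end. The maths, percent and grade columns follow the examples; the names and values are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical student data.
df = pd.DataFrame({
    "name": ["Rossum", "Gosling", "McKinney", "Ritchie"],
    "maths": [80, 58, 90, 70],
    "percent": [75.0, 55.0, 88.0, 62.0],
})
df["grade"] = ""   # empty grade column to be filled conditionally

high = df.loc[df["maths"] > 75]                           # simple condition
middle = df.loc[(df["maths"] > 60) & (df["maths"] < 85)]  # compound condition

# Conditional update: set grade only on the rows that match.
df.loc[(df["percent"] >= 60) & (df["percent"] <= 80), ["grade"]] = "First"

# String conditions on a column.
has_ing = df.loc[df["name"].str.contains("ing")]
starts_r = df.loc[df["name"].str.startswith("R")]
ends_ey = df.loc[df["name"].str.endswith("ey")]

print(df)
```

Each condition needs its own parentheses, because & binds more tightly than the comparison operators.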
