Pandas
Pandas is an open-source Python library that provides high-performance tools for data
manipulation and analysis. The name Pandas is derived from "panel data". Pandas was
developed by Wes McKinney, starting in 2008.
Creating a Series:
Ex:
import pandas as a
lst=[10,20,30,40]
s=a.Series(lst)
print(s)
Output:
0    10
1    20
2    30
3    40
dtype: int64
Creating a Series object with a programmer-defined index:
Ex:
import pandas as pd
lst=[10,"Rossum",34.56]
s=pd.Series(lst,index=["Stno","Name","Marks"])
print(s)
Output:
Stno         10
Name     Rossum
Marks     34.56
dtype: object
Creating a Series object from a dict:
Ex:
import pandas as pd
d1={"sub1":"Python","sub2":"Java","sub3":"Data Science","sub4":"ML"}
s=pd.Series(d1)
print(s)
Output:
sub1          Python
sub2            Java
sub3    Data Science
sub4              ML
dtype: object
import pandas as pd
a=["sreenu","varshini",2,3,4,5]
b=pd.Series(a)
print(b)
Attribute   Statement          Output
items       print(b.items)     <bound method Series.items of 0      sreenu
                               1    varshini
                               2           2
                               3           3
                               4           4
                               5           5
                               dtype: object>
values      print(b.values)    ['sreenu' 'varshini' 2 3 4 5]
index       print(b.index)     RangeIndex(start=0, stop=6, step=1)
dtype       print(b.dtype)     object
shape       print(b.shape)     (6,)
size        print(b.size)      6
array       print(b.array)     <PandasArray>
                               ['sreenu', 'varshini', 2, 3, 4, 5]
                               Length: 6, dtype: object
ndim        print(b.ndim)      1
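Note that items is a method, so printing b.items without parentheses only shows a description of the bound method. A minimal sketch of calling it to walk over the (index, value) pairs, reusing the list above; the variable name pairs is only illustrative.

```python
import pandas as pd

b = pd.Series(["sreenu", "varshini", 2, 3, 4, 5])

# items() yields (index, value) tuples; list() collects them
pairs = list(b.items())
for index, value in pairs:
    print(index, value)
```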
Methods:
import pandas as pd
b=[1,2,3,4,5]
a=pd.Series(b)
print(a)
Method     Description                      Statement            Output
sum        Adds all the values              print(a.sum())       15
product    Multiplies all the values        print(a.product())   120
mean       Average of the values            print(a.mean())      3.0
median     Middle value                     print(a.median())    3.0
count      Number of non-empty values       print(a.count())     5
describe   Summary statistics               print(a.describe())  count    5.000000
                                                                 mean     3.000000
                                                                 std      1.581139
                                                                 min      1.000000
                                                                 25%      2.000000
                                                                 50%      3.000000
                                                                 75%      4.000000
                                                                 max      5.000000
                                                                 dtype: float64
import pandas as pd
a=["apple","mango","grape"]
b=["a","b","c"]
print(pd.Series())#prints an empty Series
print(pd.Series(a,b))#a is the data, b is the index
Output of the second print:
a    apple
b    mango
c    grape
dtype: object
import pandas as pd
a=["apple","mango","grape"]
b=["a","b","c"]
print(pd.Series(data=b,index=a))
Output:
apple    a
mango    b
grape    c
dtype: object
import pandas as pd
lst=[[10,20,30,40],["RS","JS","MCK","TRV"]]
df=pd.DataFrame(lst)
print(df)
Output:
    0   1    2    3
0  10  20   30   40
1  RS  JS  MCK  TRV
import pandas as pd
lst=[[10,'RS'],[20,'JG'],[30,'MCK'],[40,'TRA']]
df=pd.DataFrame(lst)
print(df)
Output:
    0    1
0  10   RS
1  20   JG
2  30  MCK
3  40  TRA
Creating a DataFrame object from a dict object:
Ex:
import pandas as pd
dictdata={"Names":["Rossum","Gosling","McKinney"],
          "Subjects":["Java","C","Pandas"],
          "Ages":[80,85,55]}
df=pd.DataFrame(dictdata)
print(df)
Ex:
import pandas as pd
sdata=pd.Series([10,20,30,40])
df=pd.DataFrame(sdata)
print(df)
Output:
    0
0  10
1  20
2  30
3  40
Creating a DataFrame object from an ndarray object:
Ex:
import numpy as np
import pandas as pd
l1=[[10,60],[20,70],[40,50]]
a=np.array(l1)
df=pd.DataFrame(a)
print(df)
Output:
    0   1
0  10  60
1  20  70
2  40  50
Ex:
import pandas as pd
data={"First":[10,20,30,40],"Second":[1.4,1.3,1.5,2.5]}
df=pd.DataFrame(data)
print(df)
Output:
   First  Second
0     10     1.4
1     20     1.3
2     30     1.5
3     40     2.5
import pandas as pd
data={"First":[10,20,30,40],"Second":[1.4,1.3,1.5,2.5]}
df=pd.DataFrame(data)
df["Third"]=df["First"]+df["Second"]
print(df)
Output:
   First  Second  Third
0     10     1.4   11.4
1     20     1.3   21.3
2     30     1.5   31.5
3     40     2.5   42.5
Locate Row
As you can see from the results above, a DataFrame is like a table with rows and columns. Pandas
uses the loc attribute to return one or more specified rows.
import pandas as pd
data = {
"calories": [420, 380, 390],
"duration": [50, 40, 45]
}
df = pd.DataFrame(data)
print(df.loc[0])
Output:
calories    420
duration     50
Name: 0, dtype: int64
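To return more than one row, a list of row labels can be passed to loc. A minimal sketch using the same data; the variable name rows is only illustrative. With a list, the result is a DataFrame rather than a Series.

```python
import pandas as pd

data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45],
}
df = pd.DataFrame(data)

# A list of labels inside loc[] returns a DataFrame with those rows
rows = df.loc[[0, 1]]
print(rows)
```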
By using a CSV file (Comma-Separated Values): A CSV file is saved under a file name with the
extension .csv and can be opened in spreadsheet programs such as Excel. CSV files store tabular
data (numbers and text) in plain text.
Ex:
import pandas as a
df=a.read_csv(r"C:\king\Book1.csv")#raw string keeps the backslashes in the path as they are
print(df)
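The path above exists only on the author's machine. As a sketch that runs anywhere, read_csv can also read from an in-memory text buffer; the CSV text below is made up for illustration and is not the content of Book1.csv.

```python
import io
import pandas as pd

# Hypothetical CSV text standing in for a file on disk
csv_text = "name,score\na,90\nb,40\nc,80\n"

# read_csv accepts a path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```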
Empty Cells: Empty cells can potentially give you a wrong result when you analyze data.
Remove Rows: One way to deal with empty cells is to remove rows that contain empty cells. This is
usually OK, since data sets can be very big, and removing a few rows will not have a big impact on
the result.
import pandas as pd
df = pd.read_csv(r'c:\king\data.csv')
df.dropna(inplace = True)
print(df.to_string())
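The same idea can be tried without a file. A minimal sketch, assuming a small made-up DataFrame with one empty cell; without inplace=True, dropna returns a new DataFrame and leaves the original unchanged.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one empty cell (NaN) in the second row
df = pd.DataFrame({"calories": [420, np.nan, 390], "duration": [50, 40, 45]})

# dropna() returns a new DataFrame; the original df is left unchanged
cleaned = df.dropna()
print(cleaned)
```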
import pandas as a
df=a.read_csv(r"C:\king\Book1.csv")
print(len(df))#number of rows
import pandas as a
name=["a","b","c"]
scr=[90,40,80]
b={"name":name,"score":scr}
df=a.DataFrame(b)
c=df.to_csv(r"C:\king\Book2.csv")#we can give path to save file
print(c)#prints None, because to_csv returns None when a path is given
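When no path is given, to_csv returns the CSV text as a string instead of writing a file. A minimal sketch with the same data; the variable name text and the choice of index=False are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({"name": ["a", "b", "c"], "score": [90, 40, 80]})

# With no path, to_csv returns the CSV text; index=False omits the row labels
text = df.to_csv(index=False)
print(text)
```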
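import pandas as a
#read_csv(squeeze=True) was removed in pandas 2.0; squeeze("columns") turns the single column into a Series
df=a.read_csv("C:/king/Book1.csv", usecols=["name"]).squeeze("columns")
b=df.sort_values()
print(b)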
Information about the file
import pandas as a
df=a.read_csv("C:/king/Book1.csv")
df.info()#info() prints the summary itself and returns None
Cleaning data
=======================================================
1) DataFrameobj.head(no. of rows)
2) DataFrameobj.tail(no. of rows)
3) DataFrameobj.describe()
4) DataFrameobj.shape
5) DataFrameobj[start:stop:step]
6) DataFrameobj["Col Name"]
9) DataFrameobj.iterrows()
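The calls listed above can be tried on a small DataFrame. A minimal sketch; the column names and values are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"name": ["a", "b", "c", "d"], "score": [90, 40, 80, 70]})

print(df.head(2))      # first 2 rows
print(df.tail(2))      # last 2 rows
print(df.describe())   # summary statistics of the numeric columns
print(df.shape)        # (rows, columns)
print(df[0:4:2])       # rows at positions 0 and 2
print(df["score"])     # one column, returned as a Series

# iterrows() yields (index, row) pairs, where each row is a Series
for index, row in df.iterrows():
    print(index, row["name"], row["score"])
```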
===================================================
Understanding loc ----- here both the start and stop index are included and Col Names must be used
--------------------------------------------------------------------------------------
1) DataFrameobj.loc[row_number]
2) DataFrameobj.loc[row_number,["Col Name",.........] ]
3) DataFrameobj.loc[start:stop:step]
4) DataFrameobj.loc[start:stop:step,["Col Name"] ]
5) DataFrameobj.loc[start:stop:step,["Col Name1","Col Name2",.......] ]
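A minimal sketch of the inclusive stop index in loc, assuming a small made-up DataFrame with the default 0, 1, 2, ... row labels.

```python
import pandas as pd

df = pd.DataFrame({"name": ["a", "b", "c", "d"], "score": [90, 40, 80, 70]})

# With loc the stop label is included: rows labelled 1, 2 and 3
subset = df.loc[1:3, ["name"]]
print(subset)

# A step can be added: rows labelled 0 and 2, two columns
stepped = df.loc[0:3:2, ["name", "score"]]
print(stepped)
```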
------------------------------------------------------------------------------------------------------------
Understanding iloc ----- here the start index is included and the stop index is excluded, and Col
Numbers must be used (not column names)
--------------------------------------------------------------------------------------
1) DataFrameobj.iloc[row_number]
2) DataFrameobj.iloc[row_number,[Col Number,.........] ]
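A minimal sketch of the excluded stop index in iloc, assuming the same kind of made-up DataFrame; rows and columns are both addressed by position.

```python
import pandas as pd

df = pd.DataFrame({"name": ["a", "b", "c", "d"], "score": [90, 40, 80, 70]})

# With iloc the stop position is excluded: rows at positions 1 and 2, first column only
part = df.iloc[1:3, 0:1]
print(part)

# A single cell: row position 0, column position 1
cell = df.iloc[0, 1]
print(cell)
```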
=======================================================================
1)dataframe.drop(columns="col name")
2)dataframe.drop(columns="col name",inplace=True)
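A minimal sketch of the two forms of drop, assuming a made-up DataFrame with three columns. Without inplace=True a new DataFrame is returned; with inplace=True the original is changed and nothing is returned.

```python
import pandas as pd

df = pd.DataFrame({"name": ["a", "b", "c"], "score": [90, 40, 80], "age": [20, 21, 22]})

# Returns a new DataFrame without the "age" column; df itself still has it
smaller = df.drop(columns="age")
print(smaller)

# Removes the "score" column from df itself
df.drop(columns="score", inplace=True)
print(df)
```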
=======================================================================
1) dataframeobj.sort_values("colname")
2) dataframeobj.sort_values("colname",ascending=False)
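A minimal sketch of sorting in both directions, assuming a made-up DataFrame; sort_values returns a new, sorted DataFrame and keeps the original row labels.

```python
import pandas as pd

df = pd.DataFrame({"name": ["a", "b", "c"], "score": [90, 40, 80]})

# Smallest score first (the default)
ascending = df.sort_values("score")
print(ascending)

# Largest score first
descending = df.sort_values("score", ascending=False)
print(descending)
```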
=======================================================================
1) dataframeobj.drop_duplicates()
2) dataframeobj.drop_duplicates(inplace=True)
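A minimal sketch of removing repeated rows, assuming a made-up DataFrame in which the third row repeats the first; the first occurrence is kept by default.

```python
import pandas as pd

df = pd.DataFrame({"name": ["a", "b", "a"], "score": [90, 40, 90]})

# Rows that repeat an earlier row in every column are removed
unique_rows = df.drop_duplicates()
print(unique_rows)
```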
=======================================================================
Special Case:
3) dataframeobj.loc[dataframeobj["colname"].str.contains(str)]
4) dataframeobj.loc[dataframeobj["colname"].str.startswith(str)]
5) dataframeobj.loc[dataframeobj["colname"].str.endswith(str)]
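A minimal sketch of these string conditions, reusing the names and subjects from the dict example earlier; the search strings and variable names are only illustrative. Each .str call produces a True/False value per row, and loc keeps the rows where it is True.

```python
import pandas as pd

df = pd.DataFrame({"Names": ["Rossum", "Gosling", "McKinney"],
                   "Subjects": ["Java", "C", "Pandas"]})

# Rows whose name contains "os" anywhere
has_os = df.loc[df["Names"].str.contains("os")]
print(has_os)

# Rows whose subject starts with "P"
starts_p = df.loc[df["Subjects"].str.startswith("P")]
print(starts_p)

# Rows whose name ends with "ey"
ends_ey = df.loc[df["Names"].str.endswith("ey")]
print(ends_ey)
```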