
pandas_Deepak_Kumar_saxena

November 5, 2020

1 Pandas
Pandas is a popular Python package for data science, and with good reason: it offers powerful,
expressive and flexible data structures that make data manipulation and analysis easy, among many
other things. The DataFrame is one of these structures.
• Key Features of Pandas
– Fast and efficient DataFrame object with default and customized indexing.
– Tools for loading data into in-memory data objects from different file formats.
– Data alignment and integrated handling of missing data.
– Reshaping and pivoting of data sets.
– Label-based slicing, indexing and subsetting of large data sets.
– Columns from a data structure can be deleted or inserted.
– Group by data for aggregation and transformations.
– High performance merging and joining of data.
– Time Series functionality.
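Two of the features above, data alignment and missing-data handling, can be seen in a few lines. This is only a minimal sketch; the Series names and values are made up for illustration.

```python
import pandas as pd

# Two Series with partially overlapping labels (made-up values).
s1 = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])
s2 = pd.Series([10.0, 20.0, 30.0], index=["b", "c", "d"])

# Arithmetic aligns on labels; a label present in only one Series gives NaN.
total = s1 + s2
print(total)

# Missing values can be counted, filled, or dropped.
print(total.isna().sum())   # number of missing entries
print(total.fillna(0.0))    # replace NaN with 0.0
print(total.dropna())       # keep only the labels found in both Series
```

The point of the sketch is that pandas matches values by label, not by position, so the two Series do not need the same length or order.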

Pandas deals with the following two data structures:

1. Series
2. DataFrame

A third structure, Panel, appears in older tutorials but was removed in pandas 0.25.

1.1 Installing pandas if Python is already installed


[1]: !pip install pandas
print("pandas successfully Installed")

Requirement already satisfied: pandas in
/home/deepzk/anaconda3/lib/python3.8/site-packages (1.0.5)
Requirement already satisfied: python-dateutil>=2.6.1 in
/home/deepzk/anaconda3/lib/python3.8/site-packages (from pandas) (2.8.1)
Requirement already satisfied: numpy>=1.13.3 in
/home/deepzk/anaconda3/lib/python3.8/site-packages (from pandas) (1.18.5)
Requirement already satisfied: pytz>=2017.2 in
/home/deepzk/anaconda3/lib/python3.8/site-packages (from pandas) (2020.1)
Requirement already satisfied: six>=1.5 in
/home/deepzk/anaconda3/lib/python3.8/site-packages (from python-dateutil>=2.6.1->pandas) (1.15.0)
pandas successfully Installed

1.2 Importing and performing operations


[2]: #importing pandas
import pandas as pd
print("Pandas imported successfully")

Pandas imported successfully

1.3 Creating a DataFrame


• A DataFrame is an object that stores data in rows and columns.
[3]: # creating a DataFrame by passing a dictionary
df = pd.DataFrame({
    'Name': ['Deepak Kumar', 'Priyanshu Saxena', 'Durgesh singh'],
    'Address': ['Purabhoj', 'sikanderpur', 'kannauj'],
    'Age': [21, 18, 30]
})

df.head()  # inspect the first five rows

[3]: Name Address Age


0 Deepak Kumar Purabhoj 21
1 Priyanshu Saxena sikanderpur 18
2 Durgesh singh kannauj 30

[4]: # creating a dataframe by passing lists

df1 = pd.DataFrame([["Rinku","Purabhoj",23],
['Priyank','sikanderpur',32],
['Rahul','kannauj',30]],
columns = ['name','address','age']
)
print(df1)

name address age


0 Rinku Purabhoj 23
1 Priyank sikanderpur 32
2 Rahul kannauj 30

[5]: #Data in the series can be accessed similar to that in an ndarray


df2 = pd.Series([1,2,3,4,5],index = ['a','b','c','d','e'])
df2

[5]: a 1
b 2
c 3
d 4
e 5
dtype: int64
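The comment in cell [5] says a Series can be accessed much like an ndarray. A minimal sketch of what that looks like in practice, rebuilding the same Series under the name `s` (the name is ours, not the notebook's):

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])

print(s.iloc[0])        # first element by position
print(s["c"])           # element by label
print(s.iloc[1:3])      # positional slice, end excluded: b, c
print(s.loc["b":"d"])   # label slice, end included: b, c, d
print(s[s > 3])         # boolean mask, as with an ndarray
print(s.to_numpy())     # underlying values as a NumPy array
```

Note the difference between the two slices: positional slices exclude the end point, label slices include it.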

[6]: # checking the shape of the data (rows, columns)


df.shape

[6]: (3, 3)

[7]: # returns the row index (the row labels)


df.index

[7]: RangeIndex(start=0, stop=3, step=1)

[8]: # returns the column labels


df.columns

[8]: Index(['Name', 'Address', 'Age'], dtype='object')

[9]: # checking for the information of the data frame


df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Name 3 non-null object
1 Address 3 non-null object
2 Age 3 non-null int64
dtypes: int64(1), object(2)
memory usage: 200.0+ bytes

[10]: # describing the basic statistics of the numeric columns


df.describe()

[10]: Age
count 3.000000
mean 23.000000
std 6.244998
min 18.000000
25% 19.500000
50% 21.000000
75% 25.500000

max 30.000000

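[11]: # summarising the non-numeric (object) columns;
# the np.object alias was removed in NumPy 1.24, so the builtin object is used
df.describe(include=object)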

[11]: Name Address


count 3 3
unique 3 3
top Durgesh singh kannauj
freq 1 1

[12]: #returns sum of values


df.sum()

[12]: Name Deepak KumarPriyanshu SaxenaDurgesh singh


Address Purabhojsikanderpurkannauj
Age 69
dtype: object

[13]: #returns cumulative sum of values


df.cumsum()

[13]: Name Address Age


0 Deepak Kumar Purabhoj 21
1 Deepak KumarPriyanshu Saxena Purabhojsikanderpur 39
2 Deepak KumarPriyanshu SaxenaDurgesh singh Purabhojsikanderpurkannauj 69

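[14]: # returns the mean of the numeric columns
# (pandas 2.0+ raises an error on text columns unless numeric_only=True)
df.mean(numeric_only=True)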

[14]: Age 23.0


dtype: float64

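[15]: # returns the median of the numeric columns
# (pandas 2.0+ raises an error on text columns unless numeric_only=True)
df.median(numeric_only=True)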

[15]: Age 21.0


dtype: float64

[16]: # apply passes each column to the function
print(df.apply(lambda x: x*2))
print('\n')

# applymap applies the function element-wise (renamed DataFrame.map in pandas 2.1)
print(df.applymap(lambda x: x*2))

Name Address Age


0 Deepak KumarDeepak Kumar PurabhojPurabhoj 42
1 Priyanshu SaxenaPriyanshu Saxena sikanderpursikanderpur 36

2 Durgesh singhDurgesh singh kannaujkannauj 60

Name Address Age


0 Deepak KumarDeepak Kumar PurabhojPurabhoj 42
1 Priyanshu SaxenaPriyanshu Saxena sikanderpursikanderpur 36
2 Durgesh singhDurgesh singh kannaujkannauj 60
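In the cell above both calls print the same result, because doubling works the same on a whole column as on a single value. The difference shows up when the function needs the whole column or row. A minimal sketch with a made-up `scores` frame:

```python
import pandas as pd

# Hypothetical numeric data, used only for illustration.
scores = pd.DataFrame({"maths": [70, 85, 90], "physics": [60, 75, 95]})

# axis=0 (default): the function receives each column as a Series.
col_range = scores.apply(lambda col: col.max() - col.min())

# axis=1: the function receives each row as a Series.
row_total = scores.apply(lambda row: row["maths"] + row["physics"], axis=1)

# Element-wise: map a scalar function over the values of each column.
doubled = scores.apply(lambda col: col.map(lambda x: x * 2))

print(col_range)
print(row_total)
print(doubled)
```

The element-wise step uses `Series.map` inside `apply` so the sketch runs unchanged on both older and newer pandas versions.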

1.3.1 Retrieving data from a DataFrame

[17]: print(df.iloc[0])  # selects the entire first row as a Series
print("\n")
print(df.iloc[[0],[0]])  # first row, first column, returned as a 1x1 DataFrame
print("\n")
print(df.iat[0,0])  # first row, first column, returned as a scalar
print("\n")
print(df['Name'])  # selects an entire column by label
print("\n")
print(df.Age)  # selects an entire column by attribute access
print("\n")
print(df.loc[:,'Address'])  # selects all rows of a column by label
print("\n")
print(df[df.Age>20])  # selects the rows where the condition is true

Name Deepak Kumar


Address Purabhoj
Age 21
Name: 0, dtype: object

Name
0 Deepak Kumar

Deepak Kumar

0 Deepak Kumar
1 Priyanshu Saxena
2 Durgesh singh
Name: Name, dtype: object

0 21
1 18
2 30
Name: Age, dtype: int64

0 Purabhoj
1 sikanderpur
2 kannauj
Name: Address, dtype: object

Name Address Age


0 Deepak Kumar Purabhoj 21
2 Durgesh singh kannauj 30
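The last selection above filters on a single condition. Conditions can also be combined; the following is a minimal sketch that rebuilds the same three-row frame under the name `people` (our name, chosen so the example stands alone).

```python
import pandas as pd

people = pd.DataFrame({
    "Name": ["Deepak Kumar", "Priyanshu Saxena", "Durgesh singh"],
    "Address": ["Purabhoj", "sikanderpur", "kannauj"],
    "Age": [21, 18, 30],
})

# Combine conditions with & (and) or | (or); each needs its own parentheses.
over_20_not_kannauj = people[(people["Age"] > 20) & (people["Address"] != "kannauj")]

# .loc takes a row condition and a list of column labels together.
names_over_20 = people.loc[people["Age"] > 20, ["Name", "Age"]]

# isin() tests membership in a list of values.
selected = people[people["Address"].isin(["Purabhoj", "kannauj"])]

print(over_20_not_kannauj)
print(names_over_20)
print(selected)
```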

1.3.2 Modifying a DataFrame


• Adding columns to a DataFrame
• Using a lambda function to compute a derived column
• Renaming columns
[18]: # adding a column
df['occupation']= ['Student','Student','Doctor']

df['mail'] = ['deepak@gmail.com','priyank@gmail.com','durgesh@gmail.com']
df

[18]: Name Address Age occupation mail


0 Deepak Kumar Purabhoj 21 Student deepak@gmail.com
1 Priyanshu Saxena sikanderpur 18 Student priyank@gmail.com
2 Durgesh singh kannauj 30 Doctor durgesh@gmail.com

[19]: # adding a column computed from another column

df['email provier'] = df.mail.apply(lambda x:x.split('@')[-1])


df

[19]: Name Address Age occupation mail \


0 Deepak Kumar Purabhoj 21 Student deepak@gmail.com
1 Priyanshu Saxena sikanderpur 18 Student priyank@gmail.com

2 Durgesh singh kannauj 30 Doctor durgesh@gmail.com

email provier
0 gmail.com
1 gmail.com
2 gmail.com

[20]: # renaming a column
df.rename(columns={'mail': 'Email'},
          inplace=True)  # inplace=True modifies the original DataFrame instead of returning a copy
df

[20]: Name Address Age occupation Email \


0 Deepak Kumar Purabhoj 21 Student deepak@gmail.com
1 Priyanshu Saxena sikanderpur 18 Student priyank@gmail.com
2 Durgesh singh kannauj 30 Doctor durgesh@gmail.com

email provier
0 gmail.com
1 gmail.com
2 gmail.com
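The same kinds of modification can be written without `inplace=True`, with each step returning a new DataFrame. This is a minimal sketch on a small made-up `contacts` frame; it also shows `drop`, which covers the column deletion mentioned in the key features.

```python
import pandas as pd

contacts = pd.DataFrame({
    "Name": ["Deepak Kumar", "Durgesh singh"],
    "mail": ["deepak@gmail.com", "durgesh@gmail.com"],
})

# rename() without inplace returns a new DataFrame; the original is unchanged.
renamed = contacts.rename(columns={"mail": "Email"})

# assign() adds a derived column, again returning a new DataFrame.
with_domain = renamed.assign(domain=renamed["Email"].str.split("@").str[-1])

# drop() removes columns by label.
names_only = with_domain.drop(columns=["Email", "domain"])

print(contacts.columns.tolist())
print(with_domain)
print(names_only)
```

Returning new objects makes it easier to keep the original data around for comparison, at the cost of a few extra variable names.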

[21]: # reading data from a CSV file into a DataFrame


data = pd.read_csv("netflix_titles.csv")
print(data.shape)
data.head() # inspecting rows

(6234, 12)

[21]: show_id type title \


0 81145628 Movie Norm of the North: King Sized Adventure
1 80117401 Movie Jandino: Whatever it Takes
2 70234439 TV Show Transformers Prime
3 80058654 TV Show Transformers: Robots in Disguise
4 80125979 Movie #realityhigh

director \
0 Richard Finn, Tim Maltby
1 NaN
2 NaN
3 NaN
4 Fernando Lebrija

cast \
0 Alan Marriott, Andrew Toth, Brian Dobson, Cole…

1 Jandino Asporaat
2 Peter Cullen, Sumalee Montano, Frank Welker, J…
3 Will Friedle, Darren Criss, Constance Zimmer, …
4 Nesta Cooper, Kate Walsh, John Michael Higgins…

country date_added release_year \


0 United States, India, South Korea, China September 9, 2019 2019
1 United Kingdom September 9, 2016 2016
2 United States September 8, 2018 2013
3 United States September 8, 2018 2016
4 United States September 8, 2017 2017

rating duration listed_in \


0 TV-PG 90 min Children & Family Movies, Comedies
1 TV-MA 94 min Stand-Up Comedy
2 TV-Y7-FV 1 Season Kids' TV
3 TV-Y7 1 Season Kids' TV
4 TV-14 99 min Comedies

description
0 Before planning an awesome wedding for his gra…
1 Jandino Asporaat riffs on the challenges of ra…
2 With the help of three human allies, the Autob…
3 When a prison ship crash unleashes hundreds of…
4 When nerdy high schooler Dani finally attracts…
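The cell above only reads a file. For the export direction, here is a minimal round-trip sketch. It writes to an in-memory buffer instead of a file on disk so that it runs anywhere; the `titles` frame is a small hand-made sample, not the real dataset.

```python
import io
import pandas as pd

# Small illustrative sample; not the actual netflix_titles.csv contents.
titles = pd.DataFrame({
    "title": ["Transformers Prime", "#realityhigh"],
    "type": ["TV Show", "Movie"],
    "release_year": [2013, 2017],
})

buffer = io.StringIO()
titles.to_csv(buffer, index=False)   # index=False leaves out the row labels

buffer.seek(0)                       # rewind before reading back
restored = pd.read_csv(buffer)

print(restored)
```

With a real file, the buffer would simply be replaced by a path such as `titles.to_csv("out.csv", index=False)`, where the file name is an assumption.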

1.3.3 Aggregates in pandas


• An aggregate is a statistical summary that describes a group of values with a single number.
[22]: # checking for null values
data.isnull().sum()

[22]: show_id 0
type 0
title 0
director 1969
cast 570
country 476
date_added 11
release_year 0
rating 10
duration 0
listed_in 0
description 0
dtype: int64

[23]: #checking for unique values
data.nunique()

[23]: show_id 6234


type 2
title 6172
director 3301
cast 5469
country 554
date_added 1524
release_year 72
rating 14
duration 201
listed_in 461
description 6226
dtype: int64

[24]: #checking for duplicate values


data.duplicated().sum()

[24]: 0

1.3.4 Some common commands


• mean()
• std()
• median()
• max()
• min()
• count()
• nunique()
• unique()
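The commands listed above can be tried on a single column. A minimal sketch on a made-up `ages` Series:

```python
import pandas as pd

ages = pd.Series([21, 18, 30, 18])

print(ages.mean())              # arithmetic mean
print(ages.std())               # sample standard deviation (ddof=1)
print(ages.median())            # middle value
print(ages.max(), ages.min())   # largest and smallest value
print(ages.count())             # number of non-null values
print(ages.nunique())           # number of distinct values
print(ages.unique())            # the distinct values themselves
```

All of these except `unique()` reduce the column to a single number, which is what makes them aggregates.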

[25]: # making a copy of our data frame


df = data.copy()
df.shape

[25]: (6234, 12)

[26]: # drop null values


df = df.dropna()
df.shape

[26]: (3774, 12)
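Dropping every row that has any missing value removed a large share of the data here (6234 rows down to 3774). Two gentler alternatives are sketched below on a small made-up `shows` frame; which one is appropriate depends on the analysis.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with gaps, for illustration only.
shows = pd.DataFrame({
    "title": ["A", "B", "C"],
    "director": ["X", np.nan, np.nan],
    "rating": ["TV-PG", "TV-MA", np.nan],
})

all_complete = shows.dropna()                    # drops any row containing a NaN
rated_only = shows.dropna(subset=["rating"])     # only requires 'rating' to be present
filled = shows.fillna({"director": "Unknown"})   # keeps every row, fills one column

print(len(all_complete), len(rated_only), len(filled))
```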

[27]: # As observed when inspecting the dataset, date_added is stored as "object" (string) dtype,
# so we convert it to datetime format and extract the day, month and year.
df["date_added"] = pd.to_datetime(df['date_added'])
df['day_added'] = df['date_added'].dt.day
df['year_added'] = df['date_added'].dt.year
df['month_added'] = df['date_added'].dt.month
# astype returns a new Series, so the result must be assigned back
df['year_added'] = df['year_added'].astype(int)
df['day_added'] = df['day_added'].astype(int)
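The conversion above works because the null dates were dropped first. If unparseable or missing entries are still present, one option is to turn them into NaT instead of raising an error. A minimal sketch on a made-up `raw` Series, assuming the dates follow the "Month day, year" pattern seen in the data preview:

```python
import pandas as pd

raw = pd.Series(["September 9, 2019", "September 8, 2018", "not a date"])

# errors="coerce" converts anything that does not match the format to NaT.
parsed = pd.to_datetime(raw, format="%B %d, %Y", errors="coerce")

print(parsed)
print(parsed.dt.year)           # NaT rows give NaN here
print(parsed.dt.month_name())   # month as text, e.g. "September"
print(parsed.isna().sum())      # how many entries failed to parse
```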

Group by
[28]: # counting the titles per release year with groupby
total_year_time = data.groupby('release_year').duration.count().reset_index()
total_year_time

[28]: release_year duration


0 1925 1
1 1942 2
2 1943 3
3 1944 3
4 1945 3
.. … …
67 2016 830
68 2017 959
69 2018 1063
70 2019 843
71 2020 25

[72 rows x 2 columns]
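The groupby above computes one aggregate per group. Several aggregates can be computed in one pass with `agg`. A minimal sketch on a made-up `titles` frame (the column name `n_titles` is our own choice):

```python
import pandas as pd

# Hypothetical sample, for illustration only.
titles = pd.DataFrame({
    "type": ["Movie", "Movie", "TV Show", "TV Show", "Movie"],
    "release_year": [2016, 2019, 2013, 2016, 2017],
})

# One aggregate per group, returned as a regular DataFrame with a named column.
counts = titles.groupby("type")["release_year"].count().reset_index(name="n_titles")

# Several aggregates at once with agg().
summary = titles.groupby("type")["release_year"].agg(["count", "min", "max"])

print(counts)
print(summary)
```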

Pivot table
• When we perform a groupby across multiple columns, we often want to change how our data is stored.
[31]: data_pivot = df.groupby(['cast','country'])['release_year'].count().reset_index()

pivoted = data_pivot.pivot(columns='country',
                           index='cast',
                           values='release_year')
pivoted.head()

[31]: country                                           Argentina  …  Venezuela  Vietnam
cast
50 Cent, Ryan Phillippe, Bruce Willis, Rory Mar…       NaN  …        NaN      NaN
A.J. LoCascio, Sendhil Ramamurthy, Fred Tatasci…       NaN  …        NaN      NaN
Aadhi, Tapsee Pannu, Ritika Singh, Vennela Kish…       NaN  …        NaN      NaN
Aamina Sheikh, Sanam Saeed, Adnan Malik, Mohamm…       NaN  …        NaN      NaN
Aamir Khan, Anuskha Sharma, Sanjay Dutt, Saurab…       NaN  …        NaN      NaN

[5 rows x 433 columns]
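The output above is hard to read because almost every cast and country combination is empty. The same long-to-wide reshaping is easier to follow on a small example. This is a minimal sketch on a made-up `titles` frame; it also shows `pivot_table`, which groups and reshapes in one call and can fill the empty cells.

```python
import pandas as pd

# Hypothetical sample, for illustration only.
titles = pd.DataFrame({
    "type": ["Movie", "Movie", "TV Show", "Movie", "TV Show"],
    "country": ["India", "India", "India", "Spain", "Spain"],
    "release_year": [2016, 2019, 2013, 2017, 2018],
})

# Long form: one row per (type, country) pair.
counts = titles.groupby(["type", "country"])["release_year"].count().reset_index()

# Wide form: one row per type, one column per country.
pivoted = counts.pivot(index="type", columns="country", values="release_year")

# pivot_table does the grouping and the reshaping in one call.
direct = titles.pivot_table(index="type", columns="country",
                            values="release_year", aggfunc="count", fill_value=0)

print(counts)
print(pivoted)
print(direct)
```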

1.4 References
• Codecademy
• Coursera
• Kaggle
