November 5, 2020
1 Pandas
Pandas is a popular Python package for data science, and with good reason: among many other things, it offers powerful,
expressive and flexible data structures that make data manipulation and analysis easy. The DataFrame, a two-dimensional
table with labeled rows and columns, is one of these structures.
• Key Features of Pandas
– Fast and efficient DataFrame object with default and customized indexing.
– Tools for loading data into in-memory data objects from different file formats.
– Data alignment and integrated handling of missing data.
– Reshaping and pivoting of data sets.
– Label-based slicing, indexing and subsetting of large data sets.
– Insertion and deletion of columns in a data structure.
– Grouping of data for aggregation and transformation.
– High-performance merging and joining of data.
– Time-series functionality.
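Several of the features listed above can be seen together in a few lines. The sketch below uses a small, made-up table (the names `scores`, `team`, `points` and `bonus` are hypothetical and do not come from the notebook). It shows a customized index, label-based slicing, missing-data handling, column insertion and deletion, and a group-by aggregation:

```python
import numpy as np
import pandas as pd

# A small DataFrame with a customized (label) index and one missing value
scores = pd.DataFrame(
    {"team": ["A", "B", "A", "B"], "points": [10, np.nan, 7, 5]},
    index=["w1", "w2", "w3", "w4"],
)

# Label-based slicing: unlike Python slices, .loc includes both endpoints
first_three = scores.loc["w1":"w3"]

# Integrated handling of missing data: count NaN values, then fill them
missing = scores["points"].isna().sum()
filled = scores.fillna({"points": 0})

# Inserting a column, then deleting it again
scores["bonus"] = [1, 0, 2, 1]
scores = scores.drop(columns="bonus")

# Group by for aggregation: total points per team
totals = filled.groupby("team")["points"].sum()
print(totals)
```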
pandas successfully installed
import pandas as pd

# creating a DataFrame by passing a list of lists (one inner list per row)
df1 = pd.DataFrame([["Rinku", "Purabhoj", 23],
                    ['Priyank', 'sikanderpur', 32],
                    ['Rahul', 'kannauj', 30]],
                   columns=['name', 'address', 'age'])
print(df1)
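A DataFrame can also be built from a dictionary, where each key becomes a column name. The sketch below assumes this is how the `Name`/`Address`/`Age` DataFrame used later in the notebook was created. The original cell is not shown, so the values are taken from the outputs that appear further down:

```python
import pandas as pd

# Each dictionary key becomes a column name; each list holds that column's values
df = pd.DataFrame({
    "Name": ["Deepak Kumar", "Priyanshu Saxena", "Durgesh singh"],
    "Address": ["Purabhoj", "sikanderpur", "kannauj"],
    "Age": [21, 18, 30],
})
print(df)
print(df.shape)  # (rows, columns)
```

With a dictionary, the column names come from the keys. With a list of lists, as in `df1`, they have to be supplied through `columns=`.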
[5]: a 1
b 2
c 3
d 4
e 5
dtype: int64
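The output above is a Series holding the integers 1 to 5 with the labels `a` to `e` as its index. The cell that produced it is not shown. A minimal sketch that would give the same values, assuming it was built from a plain list with an explicit index:

```python
import pandas as pd

# A Series with a custom (label) index instead of the default 0..n-1
s = pd.Series([1, 2, 3, 4, 5], index=["a", "b", "c", "d", "e"])
print(s)
print(s["c"])     # label-based access
print(s.iloc[2])  # position-based access to the same element
```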
[6]: (3, 3)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 3 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Name 3 non-null object
1 Address 3 non-null object
2 Age 3 non-null int64
dtypes: int64(1), object(2)
memory usage: 200.0+ bytes
[10]: Age
count 3.000000
mean 23.000000
std 6.244998
min 18.000000
25% 19.500000
50% 21.000000
75% 25.500000
max 30.000000
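The `(3, 3)` shape, the `info()` summary and the `describe()` table above can be reproduced from the three-row `Name`/`Address`/`Age` data shown elsewhere in the notebook. This is a sketch under that assumption. It also shows why the numbers come out as they do: `describe()` only summarizes numeric columns by default, uses the sample standard deviation, and interpolates the quartiles linearly.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Deepak Kumar", "Priyanshu Saxena", "Durgesh singh"],
    "Address": ["Purabhoj", "sikanderpur", "kannauj"],
    "Age": [21, 18, 30],
})

print(df.shape)        # (3, 3): rows, columns
df.info()              # column names, non-null counts and dtypes
stats = df.describe()  # summary statistics of the numeric columns only
print(stats)

# std uses ddof=1 (sample standard deviation): sqrt(((-5)**2 + (-2)**2 + 7**2) / 2)
mean_age = stats.loc["mean", "Age"]
std_age = stats.loc["std", "Age"]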
2 Durgesh singhDurgesh singh kannaujkannauj 60
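The row above looks like the result of adding the DataFrame to itself. Arithmetic on DataFrames is applied element by element, aligned on index and column labels. String (object) columns are therefore concatenated and numeric columns are summed. A minimal sketch, assuming the same three-row data:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Deepak Kumar", "Priyanshu Saxena", "Durgesh singh"],
    "Address": ["Purabhoj", "sikanderpur", "kannauj"],
    "Age": [21, 18, 30],
})

# "+" works column by column: strings are concatenated, numbers are added
doubled = df + df
print(doubled.loc[2])
```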
Name
0 Deepak Kumar
Deepak Kumar
0 Deepak Kumar
1 Priyanshu Saxena
2 Durgesh singh
Name: Name, dtype: object
0 21
1 18
2 30
Name: Age, dtype: int64
0 Purabhoj
1 sikanderpur
2 kannauj
Name: Address, dtype: object
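The outputs above come from selecting columns. Selecting one column by name returns a Series, selecting a list of names returns a DataFrame, and indexing a Series by label returns a single value. A short sketch, assuming the same three-row data:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Deepak Kumar", "Priyanshu Saxena", "Durgesh singh"],
    "Address": ["Purabhoj", "sikanderpur", "kannauj"],
    "Age": [21, 18, 30],
})

names = df["Name"]          # one column -> Series
ages = df.Age               # attribute access also works for simple column names
name_frame = df[["Name"]]   # list of columns -> DataFrame
first_name = df["Name"][0]  # a single value, looked up by index label
print(names, ages, name_frame, first_name, sep="\n\n")
```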
# adding a new column from a list (one value per row)
df['mail'] = ['deepak@gmail.com', 'priyank@gmail.com', 'durgesh@gmail.com']
df
2 Durgesh singh kannauj 30 Doctor durgesh@gmail.com
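When a new column is assigned from a list, the list must have exactly one value per row. A scalar, by contrast, is repeated down the whole column. In the sketch below, the `active` and `bad` columns are hypothetical and only illustrate these two cases:

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Deepak Kumar", "Priyanshu Saxena", "Durgesh singh"],
    "Address": ["Purabhoj", "sikanderpur", "kannauj"],
    "Age": [21, 18, 30],
})

# A list with one value per row becomes a new column
df["mail"] = ["deepak@gmail.com", "priyank@gmail.com", "durgesh@gmail.com"]

# A scalar is broadcast to every row (hypothetical column)
df["active"] = True

# A list whose length differs from the number of rows raises ValueError
try:
    df["bad"] = ["only", "two"]
    length_error = False
except ValueError:
    length_error = True
```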
email provier
0 gmail.com
1 gmail.com
2 gmail.com
}, inplace=True)  # inplace=True edits the original DataFrame instead of returning a modified copy
df
email provier
0 gmail.com
1 gmail.com
2 gmail.com
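The email-provider column above can be derived with the `.str` accessor, by splitting each address at `@` and keeping the second part. The sketch also shows `rename(..., inplace=True)`. The names `email_provider` and the `mail` → `email` mapping are hypothetical, because the notebook's own rename mapping is cut off:

```python
import pandas as pd

df = pd.DataFrame({"mail": ["deepak@gmail.com", "priyank@gmail.com", "durgesh@gmail.com"]})

# Split each address at "@" and keep the part after it (the domain)
df["email_provider"] = df["mail"].str.split("@").str[1]

# rename() returns a new DataFrame by default; inplace=True changes df itself
df.rename(columns={"mail": "email"}, inplace=True)
print(df)
```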
(6234, 12)
director \
0 Richard Finn, Tim Maltby
1 NaN
2 NaN
3 NaN
4 Fernando Lebrija
cast \
0 Alan Marriott, Andrew Toth, Brian Dobson, Cole…
1 Jandino Asporaat
2 Peter Cullen, Sumalee Montano, Frank Welker, J…
3 Will Friedle, Darren Criss, Constance Zimmer, …
4 Nesta Cooper, Kate Walsh, John Michael Higgins…
description
0 Before planning an awesome wedding for his gra…
1 Jandino Asporaat riffs on the challenges of ra…
2 With the help of three human allies, the Autob…
3 When a prison ship crash unleashes hundreds of…
4 When nerdy high schooler Dani finally attracts…
[22]: show_id 0
type 0
title 0
director 1969
cast 570
country 476
date_added 11
release_year 0
rating 10
duration 0
listed_in 0
description 0
dtype: int64
[23]: # checking the number of unique values in each column
data.nunique()
[24]: 0
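The missing-value counts in `[22]` and the unique-value check in `[23]` both return one number per column. The sketch below uses a tiny, made-up stand-in for the dataset (the values, including the placeholder directors `X` and `Y`, are illustrative only). It shows that `isnull().sum()` counts NaN cells, while `nunique()` ignores NaN by default:

```python
import numpy as np
import pandas as pd

# Illustrative stand-in for the real dataset
data = pd.DataFrame({
    "type": ["Movie", "TV Show", "Movie", "Movie"],
    "director": ["X", np.nan, np.nan, "Y"],
    "release_year": [2019, 2016, 2019, 2013],
})

missing_per_column = data.isnull().sum()  # NaN count in each column
unique_per_column = data.nunique()        # distinct non-null values per column
print(missing_per_column)
print(unique_per_column)
```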
[27]: # As we observed when checking the dataset's info, "date_added" is stored as an object (string) column,
# so we convert it to datetime format
df["date_added"] = pd.to_datetime(df['date_added'])
df['day_added'] = df['date_added'].dt.day
df['year_added'] = df['date_added'].dt.year
df['month_added'] = df['date_added'].dt.month
# astype() returns a new Series, so the result must be assigned back;
# the nullable "Int64" dtype keeps the rows whose date is missing (NaT)
df['year_added'] = df['year_added'].astype('Int64')
df['day_added'] = df['day_added'].astype('Int64')
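The conversion above can be tried on a few sample values. This is a sketch under two assumptions: the date strings follow a "Month day, Year" pattern, and one value is missing. Neither is taken from the notebook's cells. Passing an explicit `format` makes the parsing unambiguous. A missing date becomes `NaT`, and the nullable `Int64` dtype turns it into `<NA>` instead of failing the integer conversion.

```python
import pandas as pd

# Illustrative date strings; None stands in for a missing date_added value
df = pd.DataFrame({"date_added": ["September 9, 2019", "August 4, 2017", None]})

df["date_added"] = pd.to_datetime(df["date_added"], format="%B %d, %Y")
df["year_added"] = df["date_added"].dt.year.astype("Int64")  # missing date -> <NA>
df["month_added"] = df["date_added"].dt.month.astype("Int64")
df["day_added"] = df["date_added"].dt.day.astype("Int64")
print(df)
```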
Group by
[28]: # counting the titles released in each year using groupby
total_year_time = data.groupby('release_year').duration.count().reset_index()
total_year_time
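The group-by step can be shown on a handful of made-up rows (the `duration` strings are illustrative). `count()` counts non-null values per group. Because `duration` has no missing values in the dataset, this equals the number of titles per year. `reset_index()` turns the group labels back into an ordinary column.

```python
import pandas as pd

data = pd.DataFrame({
    "release_year": [2019, 2016, 2019, 2013, 2019],
    "duration": ["90 min", "1 Season", "100 min", "2 Seasons", "85 min"],
})

# Non-null durations per release year, with release_year as a regular column
total_year_time = data.groupby("release_year").duration.count().reset_index()
print(total_year_time)
```

`size()` would count every row in each group, including rows with missing values, which can differ from `count()` when the column has NaN.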
Pivot table
• We often perform a groupby across multiple columns.
• After that, we often want to change how the data is stored, for example by turning the values of one grouping column into column headers.
[31]: data_pivot = df.groupby(['cast', 'country'])['release_year'].count().reset_index()
Output (wide and truncated in the original): the data is reshaped with cast as the row index and one column per country; every visible cell is NaN.
country …
cast …
50 Cent, Ryan Phillippe, Bruce Willis, Rory Mar… NaN …
A.J. LoCascio, Sendhil Ramamurthy, Fred Tatasci… NaN …
Aadhi, Tapsee Pannu, Ritika Singh, Vennela Kish… NaN …
Aamina Sheikh, Sanam Saeed, Adnan Malik, Mohamm… NaN …
Aamir Khan, Anuskha Sharma, Sanjay Dutt, Saurab… NaN …
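The reshaping described in the pivot-table notes can be sketched with `DataFrame.pivot` applied to the long-format group-by result. The notebook's own pivot cell is not visible, so this is one plausible way to get the wide layout shown above. The rows (`Actor A`, `Actor B` and the countries) are made up. Any (cast, country) pair that never occurs becomes NaN, which is why the real output, with thousands of distinct cast strings, is almost entirely NaN.

```python
import pandas as pd

df = pd.DataFrame({
    "cast": ["Actor A", "Actor A", "Actor B"],
    "country": ["India", "India", "United States"],
    "release_year": [2018, 2019, 2019],
})

# Long format: one row per (cast, country) pair with a title count
data_pivot = df.groupby(["cast", "country"])["release_year"].count().reset_index()

# Wide format: cast as rows, one column per country; absent pairs become NaN
wide = data_pivot.pivot(index="cast", columns="country", values="release_year")
print(wide)
```

`df.pivot_table(index="cast", columns="country", values="release_year", aggfunc="count")` should produce the same counts in a single call.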