University Institute of Engineering Department of Computer Science & Engineering
Description of Dataset
• gender: gender
• Age: age in years. Age is fractional if less than 1. If the age is estimated, it is in the form xx.5.
Experiment 1: Implement Exploratory Data Analysis on any data set.
Importing Libraries
Indexing
Selections
Distinct Elements
Finding and treating missing values
Missing Data
Groupby
Pivot Tables
Importing Libraries
• import pandas as pd
• import numpy as np
• %matplotlib inline
• import matplotlib.pyplot as plt
• df = pd.read_csv("../desktop/titanic-train.csv")  # load via a relative path
• df = pd.read_csv('titanic-train.csv')  # or, if the file is in the working directory
• df.head(3)
• df.info()
• df.describe()
• df.hist(bins=50, figsize=(20, 15))
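The loading and inspection calls above can be tried without the Titanic file. The following is a minimal sketch that reads a tiny made-up CSV from an in-memory string; the rows and the `csv_text` name are illustrative assumptions, not the real data.

```python
import io

import pandas as pd

# Made-up three-row sample standing in for titanic-train.csv (assumption).
csv_text = """PassengerId,Survived,Age,Fare
1,0,22,7.25
2,1,38,71.28
3,1,,7.92
"""

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df.head(3))     # first rows of the frame
df.info()             # column dtypes and non-null counts
print(df.describe())  # summary statistics for numeric columns
```

The empty Age field in the third row is read as NaN, which is why `df.info()` reports one fewer non-null value for Age than for the other columns.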
Indexing
• df.iloc[3]
• df.loc[0:4,'Ticket']
• df['Ticket'].head()
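One detail worth seeing in action is that `loc` selects by label and includes both ends of a slice, while `iloc` selects by position and excludes the stop. A minimal sketch on a made-up frame (the values are illustrative assumptions):

```python
import pandas as pd

# Toy stand-in for the Titanic frame; the values are made up for illustration.
df = pd.DataFrame(
    {
        "Ticket": ["A1", "B2", "C3", "D4", "E5", "F6"],
        "Age": [22.0, 38.0, 26.0, 35.0, 35.0, 54.0],
    }
)

# iloc is position-based: the row at position 3, returned as a Series.
row = df.iloc[3]

# loc is label-based and its slice includes both endpoints: labels 0..4, 5 rows.
by_label = df.loc[0:4, "Ticket"]

# iloc slices exclude the stop position: positions 0..3, 4 rows.
by_position = df.iloc[0:4]["Ticket"]

print(row["Ticket"], len(by_label), len(by_position))
```

With the default integer index the labels and positions coincide, so the only visible difference here is the number of rows returned.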
Selections
• df[df.Age>65]
• df[(df.Age==11)&(df.SibSp==5)]
• df[(df.Age==11)|(df.SibSp==5)]
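The selections above rely on boolean masks combined with `&` (and) and `|` (or); each condition needs its own parentheses because these operators bind more tightly than comparisons. A small sketch on made-up data (the values are assumptions for illustration):

```python
import pandas as pd

# Made-up rows; None becomes NaN and never satisfies a comparison.
df = pd.DataFrame(
    {
        "Age": [11.0, 70.0, 11.0, 30.0, None],
        "SibSp": [5, 0, 1, 5, 0],
    }
)

# Single condition: passengers older than 65.
older = df[df["Age"] > 65]

# Both conditions must hold.
both = df[(df["Age"] == 11) & (df["SibSp"] == 5)]

# At least one condition must hold.
either = df[(df["Age"] == 11) | (df["SibSp"] == 5)]

print(len(older), len(both), len(either))
```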
Distinct Elements
• df['Embarked'].unique()
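Alongside `unique()`, two related calls are often useful: `nunique()` counts the distinct values and `value_counts()` shows how often each occurs. A minimal sketch, assuming a made-up Embarked column with one missing entry:

```python
import pandas as pd

# Made-up port codes with one missing entry (assumption for illustration).
df = pd.DataFrame({"Embarked": ["S", "C", "S", "Q", None, "S"]})

# unique() returns the distinct values and keeps the missing value as an entry.
distinct = df["Embarked"].unique()

# nunique() counts distinct values and ignores missing ones by default.
n_distinct = df["Embarked"].nunique()

# value_counts(dropna=False) tallies every value, including missing ones.
counts = df["Embarked"].value_counts(dropna=False)

print(distinct, n_distinct)
print(counts)
```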
Finding and treating missing values
• print(df['Age'].mean())
• print(df['Fare'].median())
• print((df['gender'] == 'female').sum())
Missing Data
• df.info()
• df['Age'] = df['Age'].fillna(30)  # fillna returns a new Series, so assign it back
• df.isnull().sum()
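A common variation on filling with a fixed number is to fill with a statistic of the column itself. The sketch below swaps in the column median instead of the constant 30 used on this slide, and counts missing values before and after; the data is made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up ages with two gaps (assumption for illustration).
df = pd.DataFrame(
    {
        "Age": [22.0, np.nan, 26.0, np.nan, 40.0],
        "Fare": [7.25, 71.28, 7.92, 53.10, 8.05],
    }
)

# Count missing values per column before treatment.
missing_before = df.isnull().sum()

# median() skips NaN, so this is the median of 22, 26 and 40.
median_age = df["Age"].median()

# fillna returns a new Series; assign it back to keep the result.
df["Age"] = df["Age"].fillna(median_age)

missing_after = df.isnull().sum()

print(missing_before["Age"], median_age, missing_after["Age"])
```

Whether a constant, the mean, or the median is the better fill value depends on the column's distribution; the median is simply less sensitive to outliers.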
Groupby
• df.groupby('Survived')['Age'].mean()
# Average age of passengers in each group (0 = did not survive, 1 = survived)
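To make the result of the groupby concrete, here is a minimal sketch on made-up data. It also shows `agg`, which returns several statistics per group at once; the values are assumptions chosen so the means are easy to check by hand.

```python
import numpy as np
import pandas as pd

# Made-up passengers; one survivor has an unknown age.
df = pd.DataFrame(
    {
        "Survived": [0, 0, 1, 1, 1],
        "Age": [20.0, 40.0, 10.0, 30.0, np.nan],
    }
)

# One mean per group; NaN ages are skipped.
mean_age = df.groupby("Survived")["Age"].mean()

# Several statistics per group; count reports non-missing ages only.
summary = df.groupby("Survived")["Age"].agg(["mean", "count"])

print(mean_age)
print(summary)
```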
Pivot Tables
• df.pivot_table(index='gender', columns='Parch', values='Survived', aggfunc='sum')
• df.pivot_table(index='gender', columns='SibSp', values='Survived', aggfunc='sum')
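The shape of a pivot table is easier to follow on a few rows. In this minimal sketch on made-up data, `aggfunc='sum'` counts survivors in each cell because Survived is 0 or 1, and `aggfunc='mean'` turns the same cells into survival rates.

```python
import pandas as pd

# Made-up passengers (assumption for illustration).
df = pd.DataFrame(
    {
        "gender": ["male", "female", "female", "male", "female", "male"],
        "Parch": [0, 0, 1, 1, 0, 0],
        "Survived": [0, 1, 1, 0, 1, 1],
    }
)

# Rows are genders, columns are Parch values, cells are survivor counts.
table = df.pivot_table(index="gender", columns="Parch", values="Survived", aggfunc="sum")

# Same layout, but each cell is the fraction of passengers who survived.
rate = df.pivot_table(index="gender", columns="Parch", values="Survived", aggfunc="mean")

print(table)
print(rate)
```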
Exercises:
Data Visualization
• df.Age.plot()
• df.Age.plot(fontsize=15)
• plt.title('Line Plot', size=20)
• df.Age.plot(style='o', fontsize=15)
• plt.title('Point Plot', size=20)
• df.Age.plot(kind='hist', fontsize=15)
• plt.title('Histogram', size=20)
• Points to be noted:
• Plot the age histogram of the Titanic passengers.
• Plot a pie chart of Survived.
• Plot the age histogram of the two sub-populations of dead and survived passengers (in the same plot).
• Plot the age histogram of the two sub-populations of male and female passengers (in the same plot).
• Plot a bar chart of the port of embarkation.
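One possible approach to the "two sub-populations in the same plot" items is to draw both histograms on a shared axes object with the same bin edges and some transparency. This is only a sketch on made-up data, not the expected solution; the bin edges and the "Died"/"Survived" labels are assumptions, and the Agg backend is selected only so that it also runs outside a notebook.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; not needed inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Made-up ages and outcomes (assumption for illustration).
df = pd.DataFrame(
    {
        "Age": [22, 38, 26, 35, 54, 2, 27, 14],
        "Survived": [0, 1, 1, 1, 0, 0, 1, 1],
    }
)

# Shared bin edges so the two histograms are directly comparable.
bins = list(range(0, 61, 10))

fig, ax = plt.subplots()
for flag, label in [(0, "Died"), (1, "Survived")]:
    ages = df.loc[df["Survived"] == flag, "Age"]
    # alpha makes the overlapping bars of both groups visible.
    ages.plot(kind="hist", bins=bins, alpha=0.5, ax=ax, label=label)

ax.set_xlabel("Age")
ax.set_title("Age by survival")
ax.legend()
fig.savefig("age_by_survival.png")
```

The same loop works for the male/female comparison by filtering on the gender column instead of Survived.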
Prettier plots with seaborn
• import seaborn as sns
• df['Age'].plot(kind='hist')
• plt.title('Histogram of Age')
• plt.xlabel('Age')
• sns.set(style="ticks")
• x = df['Age']
• y = df[df['Fare'] < 100]['Fare']
• sns.jointplot(x=x, y=y, kind="hex", color="#4CB391")  # recent seaborn versions require x and y as keywords
• sns.jointplot(x=x, y=y, kind="kde", color="#4CB391")
THANK YOU