
University Institute of Engineering

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING
Bachelor of Engineering (Computer Science & Engineering)
Subject Name: Machine Learning Lab
Subject Code: CSP-317
Topic: Machine learning Lab-1
By : Dr. Neeraj
DISCOVER . LEARN . EMPOWER
Dataset
• https://www.kaggle.com/datasets/hesh97/titanicdataset-traincsv

Description of Dataset

Variable: Definition (Key)

• survival: Survival (0 = No, 1 = Yes)
• pclass: Ticket class (1 = 1st, 2 = 2nd, 3 = 3rd)
• gender: Gender
• Age: Age in years. Age is fractional if less than 1. If the age is estimated, it is in the form xx.5
• sibsp: Number of siblings / spouses aboard the Titanic. Sibling = brother, sister, stepbrother, stepsister. Spouse = husband, wife (mistresses and fiancés were ignored)
• parch: Number of parents / children aboard the Titanic. Parent = mother, father. Child = daughter, son, stepdaughter, stepson. Some children travelled only with a nanny, therefore parch = 0 for them
• ticket: Ticket number
• fare: Passenger fare
• cabin: Cabin number
• embarked: Port of embarkation (C = Cherbourg, Q = Queenstown, S = Southampton)

Experiment 1: Implement Exploratory Data Analysis on any data set.
Importing Libraries
Indexing
Selections
Distinct Elements
Missing Value Detection and Treatment
Missing Data
Groupby
Pivot Tables

Importing Libraries
• import pandas as pd
• import numpy as np
• %matplotlib inline
• import matplotlib.pyplot as plt
• df = pd.read_csv('titanic-train.csv')  # adjust the path if the file is elsewhere, e.g. "../desktop/titanic-train.csv"
• df.head(3)
• df.info()
• df.describe()
• df.hist(bins=50, figsize=(20, 15))
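
The loading and inspection calls above can be tried without the Kaggle file. A minimal sketch, assuming a tiny in-memory CSV whose three rows are made up for illustration (they are not taken from the dataset):

```python
import io

import pandas as pd

# Hypothetical stand-in for titanic-train.csv: three invented rows
csv_text = """PassengerId,Survived,Pclass,Age,Fare
1,0,3,22,7.25
2,1,1,38,71.28
3,1,3,,7.92
"""

# read_csv accepts a file path or any file-like object
df_toy = pd.read_csv(io.StringIO(csv_text))

print(df_toy.head(3))        # first rows
df_toy.info()                # column dtypes and non-null counts
summary = df_toy.describe()  # count, mean, std, min, quartiles, max
print(summary)
```

The empty Age field is read as NaN, so info() and the count row of describe() report one value fewer for Age than for the other columns.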

Indexing
• df.iloc[3]
• df.loc[0:4,'Ticket']
• df['Ticket'].head()
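
A point that is easy to miss in the calls above: iloc selects by position and excludes the end of a slice, while loc selects by label and includes it. A minimal sketch on a made-up frame (the ticket values are hypothetical):

```python
import pandas as pd

# Hypothetical toy rows, not real Titanic records
toy = pd.DataFrame(
    {
        "Ticket": ["A1", "B2", "C3", "D4", "E5", "F6"],
        "Age": [22.0, 38.0, 26.0, 35.0, 35.0, 54.0],
    }
)

row = toy.iloc[3]                  # fourth row by position, returned as a Series
by_label = toy.loc[0:4, "Ticket"]  # labels 0 to 4, end included: 5 values
by_pos = toy.iloc[0:4]["Ticket"]   # positions 0 to 3, end excluded: 4 values

print(row["Ticket"], len(by_label), len(by_pos))
```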

Selections
• df[df.Age>65]
• df[(df.Age==11)&(df.SibSp==5)]
• df[(df.Age==11)|(df.SibSp==5)]
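
The selections above work by building a boolean mask and indexing with it. A minimal sketch on made-up values; the parentheses around each comparison are required because & and | bind more tightly than == in Python:

```python
import pandas as pd

# Hypothetical toy rows, not real Titanic records
toy = pd.DataFrame({"Age": [11, 70, 11, 30], "SibSp": [5, 0, 1, 5]})

older = toy[toy["Age"] > 65]                             # one condition
both = toy[(toy["Age"] == 11) & (toy["SibSp"] == 5)]     # AND: both must hold
either = toy[(toy["Age"] == 11) | (toy["SibSp"] == 5)]   # OR: at least one holds

print(len(older), len(both), len(either))
```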

Distinct Elements
• df['Embarked'].unique()
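
unique() lists the distinct values, including a missing one if present. A minimal sketch, on a made-up column, of two related calls that may be useful alongside it: nunique() and value_counts():

```python
import pandas as pd

# Hypothetical port codes with one missing entry
ports = pd.Series(["S", "C", "S", "Q", None, "S"], name="Embarked")

distinct = ports.unique()      # distinct values, the missing value included
n_distinct = ports.nunique()   # number of distinct values, missing excluded
counts = ports.value_counts()  # frequency of each non-missing value

print(distinct, n_distinct)
print(counts)
```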

Missing Value Detection and Treatment
• print (df['Age'].mean())
• print (df['Fare'].median())
• print ((df['gender'] =='female').sum())
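
The statistics above skip missing values by default, which matters before deciding how to fill them. A minimal sketch on made-up values (column names follow the slides; the rows are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows, not real Titanic records
toy = pd.DataFrame(
    {
        "Age": [20.0, 30.0, np.nan, 40.0],
        "Fare": [10.0, 20.0, 30.0, 100.0],
        "gender": ["male", "female", "female", "male"],
    }
)

mean_age = toy["Age"].mean()                  # NaN is ignored: (20 + 30 + 40) / 3
median_fare = toy["Fare"].median()            # middle of the sorted fares
n_female = (toy["gender"] == "female").sum()  # True counts as 1 when summed

print(mean_age, median_fare, n_female)
```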

Missing Data
• df.info()
• df['Age'] = df['Age'].fillna(30)  # fillna returns a new Series, so assign it back
• df.isnull().sum()  # missing values remaining per column
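
A minimal sketch of the same find-then-fill workflow on made-up data. Note the swap: it fills with the column median rather than the constant 30 used above, which is one common alternative and not necessarily the choice intended in the lab:

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows, not real Titanic records
toy = pd.DataFrame(
    {
        "Age": [22.0, np.nan, 26.0, np.nan, 54.0],
        "Cabin": ["C85", None, None, None, "E46"],
    }
)

missing_before = toy.isnull().sum()         # missing values per column

median_age = toy["Age"].median()            # computed from the non-missing ages
toy["Age"] = toy["Age"].fillna(median_age)  # assign back: fillna does not modify in place

missing_after = toy.isnull().sum()
print(missing_before, missing_after, sep="\n")
```

Cabin is left untouched here; a column that is mostly missing is often better dropped or flagged than filled.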

Groupby
• df.groupby('Survived')['Age'].mean()
# Average age of passengers in each group (0 = died, 1 = survived)
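
A minimal sketch of the split-apply-combine idea behind groupby, on made-up values: rows are split by the Survived key, the mean is computed per group, and the results are combined into one Series indexed by the key.

```python
import numpy as np
import pandas as pd

# Hypothetical toy rows, not real Titanic records
toy = pd.DataFrame(
    {
        "Survived": [0, 0, 1, 1, 1],
        "Age": [20.0, 40.0, 10.0, 30.0, np.nan],
    }
)

mean_age = toy.groupby("Survived")["Age"].mean()              # one value per group
stats = toy.groupby("Survived")["Age"].agg(["mean", "count"])  # several aggregates at once

print(mean_age)
print(stats)
```

The count column shows how many non-missing ages each mean is based on.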

Pivot Tables
• df.pivot_table(index='gender', columns='Parch', values='Survived', aggfunc='sum')
• df.pivot_table(index='gender', columns='SibSp', values='Survived', aggfunc='sum')
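
A minimal sketch of how the aggregation function changes what a pivot table reports, on made-up values: with Survived coded as 0/1, 'sum' counts survivors in each cell while 'mean' gives the survival rate.

```python
import pandas as pd

# Hypothetical toy rows, not real Titanic records
toy = pd.DataFrame(
    {
        "gender": ["female", "female", "male", "male", "male"],
        "Parch": [0, 1, 0, 0, 1],
        "Survived": [1, 1, 0, 1, 0],
    }
)

# Rows are genders, columns are Parch values, cells aggregate Survived
survivors = toy.pivot_table(index="gender", columns="Parch", values="Survived", aggfunc="sum")
rate = toy.pivot_table(index="gender", columns="Parch", values="Survived", aggfunc="mean")

print(survivors)
print(rate)
```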

Exercises:

• select the passengers who died
• select the passengers who paid a fare of less than 40 and were in third class
• locate the name of the passenger with PassengerId 674
• count the number of passengers who survived and the number who died
• count the number who survived and who died for each gender
• calculate the average fare paid by passengers who survived and by those who died

Data Visualization
• df.Age.plot()

• df.Age.plot(fontsize=15)
• plt.title('Line Plot', size=20)

• df.Age.plot(style='o', fontsize=15)
• plt.title('Point Plot', size=20)

• df.Age.plot(kind='hist', fontsize=15)
• plt.title('Histogram', size=20)
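
The plot calls above return a matplotlib Axes object, so titles and labels can also be set on that object directly. A minimal sketch on made-up ages; the Agg backend is an assumption made so the snippet runs without a display, and in a notebook with %matplotlib inline it can be dropped:

```python
import matplotlib

matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed

import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical ages, not real Titanic records
ages = pd.Series([22.0, 38.0, 26.0, 35.0, 4.0, 54.0, 2.0, 27.0], name="Age")

ax = ages.plot(kind="hist", bins=5, fontsize=15)  # returns the Axes it drew on
ax.set_title("Histogram", size=20)
ax.set_xlabel("Age")
ax.set_ylabel("Number of passengers")
```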

Exercises:
• plot the age histogram of the Titanic passengers
• plot a pie chart of survived
• plot the age histogram of the two sub-populations of dead and survived passengers (in the same plot)
• plot the age histogram of the two sub-populations of male and female (in the same plot)
• plot a bar chart of the port of embarkation
Prettier plots with seaborn
• import seaborn as sns
• df['Age'].plot(kind='hist')
• plt.title('Histogram of Age')
• plt.xlabel('Age')

• sns.set(style="ticks")
• x = df['Age']
• y = df[df['Fare'] < 100]['Fare']
• sns.jointplot(x=x, y=y, kind="hex", color="#4CB391")  # recent seaborn versions require x and y as keyword arguments
• sns.jointplot(x=x, y=y, kind="kde", color="#4CB391")

• sns.violinplot(x="gender", y="Age", hue="Survived", data=df, palette="muted", split=True)

THANK YOU
