Data Preprocessing

The document loads and cleans a Titanic dataset, one-hot encodes the categorical columns (Pclass, Sex, Embarked) as dummy variables, concatenates the dummy variables to the original dataframe, drops the columns they replace, interpolates missing Age values, demonstrates MinMaxScaler and StandardScaler on small example arrays, and plots a stacked bar chart of mean scores by group and gender with error bars.

Uploaded by mahaboobshaik070806

import pandas as pd

# load the Titanic CSV (path as used in the original Colab session)
df_2=pd.read_csv("/content/titanic (1).csv")
df_2

df_2.info(verbose=False)

df_2.describe()

cols=['Name','Ticket','Cabin']
df_2=df_2.drop(cols,axis=1)
df_2.info()

# drop every row that has a missing value; axis must be passed by keyword in pandas 2.x
df_2=df_2.dropna(axis=0)
df_2.info()
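
The dropna call above removes every row that has a missing value in any column. A minimal sketch on a made-up frame (the column names are borrowed from the Titanic data, the values are hypothetical) shows the effect, along with the `subset=` option for restricting the check to chosen columns:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; the values are illustrative, not taken from the Titanic file.
toy = pd.DataFrame({
    "Age": [22.0, np.nan, 35.0, 54.0],
    "Fare": [7.25, 71.28, np.nan, 51.86],
    "Embarked": ["S", "C", "S", None],
})

# Drop rows that have a missing value in any column.
dropped_any = toy.dropna(axis=0)

# Drop rows only when 'Embarked' is missing; NaNs in other columns are kept.
dropped_embarked = toy.dropna(subset=["Embarked"])

print(dropped_any)
print(dropped_embarked)
```

With `subset=`, rows that are missing only Age or Fare survive, which leaves something for a later imputation step to fill.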

df_2['Sex']

df_2['Pclass']

df_2['Embarked']

pd.get_dummies(df_2['Pclass'])

pd.get_dummies(df_2['Sex'])

dummies=[]
cols=['Pclass','Sex','Embarked']
for col in cols:
    dummies.append(pd.get_dummies(df_2[col]))

print(dummies)

titanic_dummies=pd.concat(dummies,axis=1)

titanic_dummies

df_2=pd.concat((df_2,titanic_dummies),axis=1)
df_2

df_2.drop(0)

df_2=df_2.drop(['Sex','Embarked','Pclass'],axis=1)
df_2.info()
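
The encode, concatenate, and drop steps above can probably be collapsed into one `pd.get_dummies` call using its `columns=` argument, which encodes the listed columns, prefixes the new column names with the source column name, and removes the originals. A minimal sketch on a hypothetical three-row frame (the values are invented for illustration):

```python
import pandas as pd

# Hypothetical toy frame standing in for the cleaned Titanic data.
toy = pd.DataFrame({
    "Pclass": [1, 3, 2],
    "Sex": ["male", "female", "female"],
    "Embarked": ["S", "C", "S"],
    "Fare": [71.28, 7.25, 13.0],
})

# One call: encode the three categorical columns and drop the originals.
encoded = pd.get_dummies(toy, columns=["Pclass", "Sex", "Embarked"])

print(encoded.columns.tolist())
```

The prefixed names (for example `Pclass_1` or `Sex_male`) also avoid the mixed integer and string column labels that result from concatenating unprefixed dummies.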

df_2.isnull()

df_2['Age']=df_2['Age'].interpolate()
print(df_2)
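
By default, `Series.interpolate()` uses linear interpolation: it treats the rows as equally spaced and fills each NaN along a straight line between the nearest valid values on either side. A small sketch with made-up ages (not values from the Titanic file):

```python
import numpy as np
import pandas as pd

# Hypothetical ages with gaps of one and two missing values.
ages = pd.Series([20.0, np.nan, 30.0, np.nan, np.nan, 60.0])

# Default method is 'linear': gaps are filled evenly between known neighbours.
filled = ages.interpolate()

print(filled.tolist())  # [20.0, 25.0, 30.0, 40.0, 50.0, 60.0]
```

Because the fill depends on neighbouring rows, the result changes with row order. Note also that in the notebook above, rows with missing values were already removed by dropna, so the interpolate call leaves Age unchanged there.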

from sklearn.preprocessing import MinMaxScaler

data=[[-1,2],[-0.5,6],[0,10],[1,18]]
print(data)

scaler=MinMaxScaler()
scaler.fit(data)

print(scaler.transform(data))
scaler.data_min_

scaler.data_max_
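
With its default `feature_range=(0, 1)`, MinMaxScaler rescales each column independently as (x - min) / (max - min). The following NumPy-only sketch reproduces that arithmetic on the same example data; it illustrates the formula and is not the library's implementation:

```python
import numpy as np

data = np.array([[-1, 2], [-0.5, 6], [0, 10], [1, 18]], dtype=float)

# Per-column minimum and maximum (what the fitted scaler reports as
# data_min_ and data_max_).
col_min = data.min(axis=0)
col_max = data.max(axis=0)

# Min-max scaling: each column is mapped onto the range [0, 1].
scaled = (data - col_min) / (col_max - col_min)

print(scaled)
# [[0.   0.  ]
#  [0.25 0.25]
#  [0.5  0.5 ]
#  [1.   1.  ]]
```

Both columns map to the same scaled values here because their entries sit at the same relative positions between their own minimum and maximum.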

from numpy import asarray

from sklearn.preprocessing import StandardScaler

data=asarray([[100,0.001],
              [50,0.05],
              [50,0.05],
              [88,0.07],
              [4,0.1]])
print(data)

scaler=StandardScaler()
scaled=scaler.fit_transform(data)
print(scaled)
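
StandardScaler standardizes each column as z = (x - mean) / std, where std is the population standard deviation (divisor n, i.e. `ddof=0`). A NumPy-only sketch of the same arithmetic on the example data, offered as an illustration of the formula rather than the library's code:

```python
import numpy as np

data = np.array([[100, 0.001],
                 [50, 0.05],
                 [50, 0.05],
                 [88, 0.07],
                 [4, 0.1]])

# Per-column mean and population standard deviation (NumPy's default ddof=0).
col_mean = data.mean(axis=0)
col_std = data.std(axis=0)

# Z-score standardization: each column ends up with mean 0 and std 1.
z = (data - col_mean) / col_std

print(z)
print(z.mean(axis=0), z.std(axis=0))
```

After scaling, the two columns are on a comparable scale even though the raw values differ by several orders of magnitude.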

#men & women

import numpy as np
import matplotlib.pyplot as plt

N = 5
menMeans = (22, 30, 35, 35, 26)
womenMeans = (25, 32, 30, 35, 29)
menStd = (4, 3, 4, 1, 5)
womenStd = (3, 5, 2, 3, 3)
# the x locations for the groups
ind = np.arange(N)
# the width of the bars
width = 0.35

p1 = plt.bar(ind, menMeans, width, yerr=menStd, color='red')

# stack the women's bars on top of the men's bars
p2 = plt.bar(ind, womenMeans, width,
             bottom=menMeans, yerr=womenStd, color='green')

plt.ylabel('Scores')
plt.xlabel('Groups')
plt.title('Scores by group\n' + 'and gender')
plt.xticks(ind, ('Group1', 'Group2', 'Group3', 'Group4', 'Group5'))
plt.yticks(np.arange(0, 81, 10))
plt.legend((p1[0], p2[0]), ('Men', 'Women'))

plt.show()
