Machine Learning
Practical-1
Aim:
1. Understand data pre-processing for dataset 1 using Spyder (Python).
2. Replace missing values using one of the imputation strategies below.

If "mean", then replace missing values using the mean along the axis.
If "median", then replace missing values using the median along the axis.
If "most_frequent", then replace missing values using the most frequent value along the
axis.
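
The three strategies can be compared side by side. The following is a minimal sketch, assuming scikit-learn's SimpleImputer (the class used in Python Code 1) and the Age and Salary values copied from Dataset 1. The names age_salary and filled are illustrative only.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Age and Salary columns of Dataset 1; np.nan marks the missing entries.
age_salary = np.array([
    [44, 72000], [27, 48000], [30, 54000], [38, 61000], [40, np.nan],
    [35, 58000], [np.nan, 52000], [48, 79000], [50, 83000], [37, 67000],
], dtype=float)

filled = {}
for strategy in ("mean", "median", "most_frequent"):
    imputer = SimpleImputer(missing_values=np.nan, strategy=strategy)
    filled[strategy] = imputer.fit_transform(age_salary)
    # statistics_ holds the fill value computed for each column.
    print(strategy, imputer.statistics_)
```

SimpleImputer works column by column, so each missing Age is filled from the other Age values and each missing Salary from the other Salary values. Every Age and Salary value in Dataset 1 occurs only once, so "most_frequent" is of limited use here; the scikit-learn documentation states that ties are resolved by returning the smallest value.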

Ref.
https://docs.google.com/viewer?a=v&pid=sites&srcid=Z2FucGF0dW5pdmVyc2l0eS5hYy5pbnxpdHxneDozYmViMzBiZWIwZThjMDM3

Predicted Output:

3. Understand categorical data.


4. Encode the Country attribute of dataset 1 using the fit_transform method of LabelEncoder.
Predicted Output:

5. Encode the categorical and numerical attributes of dataset 2 using the OneHotEncoder class.
Predicted Output:

Dataset 1:
Country   Age   Salary   Purchased
France    44    72000    No
Spain     27    48000    Yes
Germany   30    54000    No
Spain     38    61000    No
Germany   40    NaN      Yes
France    35    58000    Yes
Spain     NaN   52000    No
France    48    79000    Yes
Germany   50    83000    No
France    37    67000    Yes
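
Python Code 1 reads this table from dataset1.csv. The following is a minimal sketch of one way to create that file with pandas, assuming the rows above are the whole dataset. The variable names dataset1, reloaded and missing_per_column are illustrative only.

```python
import numpy as np
import pandas as pd

# Rows copied from the Dataset 1 table; np.nan marks the two missing cells.
dataset1 = pd.DataFrame({
    "Country": ["France", "Spain", "Germany", "Spain", "Germany",
                "France", "Spain", "France", "Germany", "France"],
    "Age": [44, 27, 30, 38, 40, 35, np.nan, 48, 50, 37],
    "Salary": [72000, 48000, 54000, 61000, np.nan,
               58000, 52000, 79000, 83000, 67000],
    "Purchased": ["No", "Yes", "No", "No", "Yes",
                  "Yes", "No", "Yes", "No", "Yes"],
})

# Missing values are written as empty fields, which read_csv loads back as NaN.
dataset1.to_csv("dataset1.csv", index=False)

# Reload the file and count the missing cells in each column.
reloaded = pd.read_csv("dataset1.csv")
missing_per_column = reloaded.isna().sum()
print(missing_per_column)
```

Counting the missing cells before imputing confirms which columns need attention: here only Age and Salary.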

Dataset 2:
Outlook    Outlook0   Outlook1
Sunny      1          Sunny
Sunny      2          Sunny
Overcast   3          Overcast
Rain       4          Rain
Rain       5          Rain
Rain       2          Rain
Overcast   3          Overcast
Sunny      1          Sunny
Sunny      2          Sunny
Rain       3          Rain
Sunny      4          Sunny
Overcast   2          Overcast
Overcast   1          Overcast
Rain       3          Rain
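
Python Code 2 reads this table from dataset2.csv. The following is a minimal sketch that builds the file with pandas and counts the distinct values in each column, assuming the rows above are the whole dataset. The names outlook, dataset2 and categories_per_column are illustrative only.

```python
import pandas as pd

# Outlook and Outlook1 hold the same values in the Dataset 2 table.
outlook = ["Sunny", "Sunny", "Overcast", "Rain", "Rain", "Rain", "Overcast",
           "Sunny", "Sunny", "Rain", "Sunny", "Overcast", "Overcast", "Rain"]

dataset2 = pd.DataFrame({
    "Outlook": outlook,
    "Outlook0": [1, 2, 3, 4, 5, 2, 3, 1, 2, 3, 4, 2, 1, 3],
    "Outlook1": outlook,
})
dataset2.to_csv("dataset2.csv", index=False)

# One-hot encoding creates one output column per distinct value.
categories_per_column = dataset2.nunique()
print(categories_per_column)
```

The counts give the width of the encoded result: three columns each for Outlook and Outlook1 and five for Outlook0, eleven in total.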

Python Code: 1

# Data Preprocessing

import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import LabelEncoder

# Importing the dataset
dataset = pd.read_csv('dataset1.csv')
X = dataset.iloc[:, :-1].values   # features: Country, Age, Salary
y = dataset.iloc[:, 3].values     # target: Purchased

# Taking care of missing data
# SimpleImputer replaces the old sklearn.preprocessing.Imputer class.
imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
imputer = imputer.fit(X[:, 1:3])            # columns 1 and 2: Age, Salary
X[:, 1:3] = imputer.transform(X[:, 1:3])

# Encoding the Country column as integer labels
labelencoder_X = LabelEncoder()
X[:, 0] = labelencoder_X.fit_transform(X[:, 0])
print(X)
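
To see which integer each country receives, the fitted encoder can be inspected. The following is a minimal sketch, assuming the Country values from Dataset 1. The names countries, encoder, codes and mapping are illustrative only.

```python
from sklearn.preprocessing import LabelEncoder

# Country column of Dataset 1, in row order.
countries = ["France", "Spain", "Germany", "Spain", "Germany",
             "France", "Spain", "France", "Germany", "France"]

encoder = LabelEncoder()
codes = encoder.fit_transform(countries)

# classes_ is sorted, and each class is encoded as its position in that array.
mapping = {name: int(code) for code, name in enumerate(encoder.classes_)}
print(mapping)

# inverse_transform recovers the original labels from the integer codes.
print(encoder.inverse_transform(codes[:3]))
```

The integer codes follow alphabetical order and carry no numerical meaning, which is one reason one-hot encoding is often preferred for input features.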

Python Code: 2

import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Importing the dataset
dataset = pd.read_csv('dataset2.csv')
X = dataset.iloc[:, :].values   # all columns: Outlook, Outlook0, Outlook1

# Encoding categorical data
# Step 1: one-hot encode the two text columns (0 and 2). Each has three
# categories, so the six encoded columns come first and the passed-through
# Outlook0 column ends up at index 6.
transform = ColumnTransformer([("Outlook_OL0_OL1", OneHotEncoder(), [0, 2])],
                              remainder='passthrough')
X = transform.fit_transform(X)

# Step 2: one-hot encode the numerical Outlook0 column, now at index 6.
transform = ColumnTransformer([("Outlook_OL0_OL1", OneHotEncoder(), [6])],
                              remainder='passthrough')
X = transform.fit_transform(X)
print(X.astype(int))
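
The two passes can also be combined. The following is a minimal sketch of an alternative, not the method used above: a single ColumnTransformer encodes all three columns at once. It assumes the Dataset 2 values and uses sparse_threshold=0 to request a dense array. The encoded columns come out in the order Outlook, Outlook0, Outlook1, which differs from the two-pass listing.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Dataset 2 values; Outlook and Outlook1 are identical in the table.
outlook = ["Sunny", "Sunny", "Overcast", "Rain", "Rain", "Rain", "Overcast",
           "Sunny", "Sunny", "Rain", "Sunny", "Overcast", "Overcast", "Rain"]
dataset2 = pd.DataFrame({
    "Outlook": outlook,
    "Outlook0": [1, 2, 3, 4, 5, 2, 3, 1, 2, 3, 4, 2, 1, 3],
    "Outlook1": outlook,
})

encoder = ColumnTransformer(
    [("onehot", OneHotEncoder(), ["Outlook", "Outlook0", "Outlook1"])],
    sparse_threshold=0,  # always return a dense array
)
encoded = encoder.fit_transform(dataset2)

print(encoded.shape)
print(encoded.astype(int))
```

Each row of the result contains exactly three ones, one for each original column.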
