NAME = TEJAS PRATAP

REGISTRATION NO. = 20BCB0010


COURSE = CSE4020 (Machine Learning)
SLOT = L29 + L30

Experiment-1
PROBLEM :
Demonstrate possible approaches to missing-value analysis using any
real-world data.

CODE (Jupyter Notebook)


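# 1) Import the libraries.
import pandas as pd
import numpy as np
# 2) Load the training data.
train = pd.read_csv(r"C:\Users\sms2c\downloa\train.csv")
# 3) Column types and non-null counts.
train.info()
# 4) First 10 rows.
train.head(10)
# 5) Number of NaN values per column.
train.isnull().sum()
# 6) Treat 0 in 'Order Quantity' as missing.
train['Order Quantity'] = train['Order Quantity'].replace(0, np.nan)
# 7) Deletion approach: drop every row that contains a NaN.
df = train.dropna()
df.info()
# 8) Mean of the non-null 'Order Quantity' values.
mean_value = train['Order Quantity'].mean()
# 9) Imputation approach: fill the NaN values with that mean.
train['Order Quantity'] = train['Order Quantity'].fillna(mean_value)
train.isnull().sum()
# 10) Treat 0 in 'Sales' as missing.
train['Sales'] = train['Sales'].replace(0, np.nan)
# 11) Deletion approach again, now including 'Sales'.
df = train.dropna()
df.info()
# 12) Mean of the non-null 'Sales' values.
mean_value = train['Sales'].mean()
# 13) Fill the NaN values in 'Sales' with that mean.
train['Sales'] = train['Sales'].fillna(mean_value)
train.isnull().sum()
# 14) Treat 0 in 'profit' as missing.
train['profit'] = train['profit'].replace(0, np.nan)
# 15) Deletion approach again, now including 'profit'.
df = train.dropna()
df.info()
# 16) Mean of the non-null 'profit' values.
mean_value = train['profit'].mean()
# 17) Fill the NaN values in 'profit' with that mean.
train['profit'] = train['profit'].fillna(mean_value)
train.isnull().sum()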
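
The notebook repeats the same replace / mean / fill steps for each of the
three columns, so they could be folded into one loop. The following is only a
minimal sketch of that idea, not the notebook's own code: the helper name
zeros_to_mean and the small in-memory DataFrame are hypothetical stand-ins for
train.csv, and it assumes, as the notebook does, that a 0 in these columns
means "missing".

```python
import numpy as np
import pandas as pd


def zeros_to_mean(df, columns):
    """Return a copy of df in which 0 in the given columns is treated as
    missing and replaced by the mean of that column's remaining values."""
    out = df.copy()
    for col in columns:
        # Mark zeros as NaN so pandas recognises them as missing.
        out[col] = out[col].replace(0, np.nan)
        # Series.mean() skips NaN by default, so this is the mean of the
        # observed (non-missing) values only.
        out[col] = out[col].fillna(out[col].mean())
    return out


# Hypothetical stand-in for train.csv, invented for illustration.
sample = pd.DataFrame({
    "Order Quantity": [10.0, 0.0, 30.0, 20.0],
    "Sales": [100.0, 200.0, 0.0, 300.0],
    "profit": [5.0, 0.0, 15.0, 10.0],
})

filled = zeros_to_mean(sample, ["Order Quantity", "Sales", "profit"])
print(filled.isnull().sum())
```

Working on a copy leaves the original DataFrame untouched, which makes it
easy to compare the data before and after imputation.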

PROCEDURE :
1) First, we import the Python libraries pandas and NumPy.
2) Next, we load the dataset from a .csv file with pd.read_csv. If the file
is in the same folder as the code, a relative path is enough, e.g.
data = pd.read_csv('data.csv').
3) train.info() summarises the columns, and train.head(10) shows the first
10 rows of our data.
4) train.isnull().sum() tells us the total number of NaN values in each
column of our data.

5) If a missing value is not already recorded as NaN, we first have to
replace that placeholder with NaN. Here, 0 is treated as the placeholder
and is replaced with np.nan.
6) dropna() deletes every observation (row) in which any variable is
missing. This reduces the sample size, which can reduce the quality of
our model.

7) Order Quantity, Sales and profit are column names in our train data.
8) Now we replace all NaN values in each of these columns with the mean of
that column's non-null values.
9) Then we check whether the Order Quantity, Sales and profit attributes
still have any null values.

10) Finally, we print the output, in which all the missing values have been
filled accordingly.
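
To make the trade-off between steps 6 and 8 concrete, the sketch below
compares deletion and mean imputation on a tiny made-up DataFrame. The values
are invented for illustration and are not taken from train.csv; only the two
column names are borrowed from the experiment.

```python
import numpy as np
import pandas as pd

# Made-up data for illustration; NaN marks a missing entry.
data = pd.DataFrame({
    "Order Quantity": [10.0, np.nan, 30.0, 20.0, 40.0],
    "Sales": [100.0, 200.0, np.nan, 300.0, 400.0],
})

# Number of missing values in each column.
missing_per_column = data.isnull().sum()

# Approach 1: listwise deletion drops every row that has any NaN.
dropped = data.dropna()

# Approach 2: mean imputation keeps all rows and fills each NaN with the
# mean of the non-null values in the same column.
imputed = data.fillna(data.mean())

print(missing_per_column)
print("rows originally:    ", len(data))
print("rows after dropna():", len(dropped))
print("rows after fillna():", len(imputed))
```

In this toy example deletion keeps 3 of the 5 rows, while mean imputation
keeps all 5, which is the loss of sample size described in step 6.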

OUTPUT :
