AP19110010030 Assignment-4 Lab
Kilaru Sravan
AP19110010030
CSE-A
In [1]:
import pandas as pd
import numpy as np
In [2]:
In [3]:
df.isnull().sum()
Out[3]:
batsman 0
total_runs 0
out 0
numberofballs 0
average 34
strikerate 0
dtype: int64
localhost:8888/notebooks/AP19110010030_Assignment-4.ipynb 1/9
04/09/2021 AP19110010030_Assignment-4 - Jupyter Notebook
In [4]:
df = df.drop(['batsman', 'numberofballs', 'strikerate'], axis=1)
df
Out[4]:
511 0 1 0.000000
512 0 1 0.000000
513 0 2 0.000000
514 0 1 0.000000
515 0 1 0.000000
In [5]:
df.isnull().sum()
Out[5]:
total_runs 0
out 0
average 34
dtype: int64
In [6]:
df.isnull().sum().sum()
Out[6]:
34
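As a minimal sketch of the two counting steps above: `isnull().sum()` gives one missing-value count per column, and a second `.sum()` collapses those into a single total. The frame `toy` and its values are hypothetical and are not taken from the assignment data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; the values are illustrative only.
toy = pd.DataFrame({
    "total_runs": [10, 0, 25, 7],
    "out": [1, 1, 0, 2],
    "average": [10.0, np.nan, np.nan, 3.5],
})

per_column = toy.isnull().sum()   # Series: missing count for each column
overall = int(per_column.sum())   # scalar: missing count for the whole frame

print(per_column)
print(overall)
```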
Forward fill(ffill)
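Forward fill copies the last valid value downwards into each gap. The sketch below uses a hypothetical Series `s` (not the assignment data) to show one caveat: a missing value at the very start has nothing before it, so it stays missing.

```python
import numpy as np
import pandas as pd

# Hypothetical values; the first entry is missing on purpose.
s = pd.Series([np.nan, 12.5, np.nan, np.nan, 30.0])

filled = s.ffill()                       # propagate the last valid value forward
remaining = int(filled.isnull().sum())   # the leading NaN cannot be filled

print(filled.tolist())
print(remaining)
```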
In [7]:
df1 = df.copy()
In [8]:
df1.ffill(inplace = True)
df1
Out[8]:
511 0 1 0.000000
512 0 1 0.000000
513 0 2 0.000000
514 0 1 0.000000
515 0 1 0.000000
In [9]:
df1.isnull().sum().sum()
Out[9]:
Backward fill(bfill)
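Backward fill is the mirror image: each gap takes the next valid value below it. In this sketch the Series `s` is hypothetical; it ends with a missing value to show that a trailing gap is left unfilled.

```python
import numpy as np
import pandas as pd

# Hypothetical values; the last entry is missing on purpose.
s = pd.Series([5.0, np.nan, np.nan, 20.0, np.nan])

filled = s.bfill()                       # propagate the next valid value backward
remaining = int(filled.isnull().sum())   # the trailing NaN cannot be filled

print(filled.tolist())
print(remaining)
```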
In [10]:
df2 = df.copy()
In [11]:
df2.bfill(inplace = True)
df2
Out[11]:
511 0 1 0.000000
512 0 1 0.000000
513 0 2 0.000000
514 0 1 0.000000
515 0 1 0.000000
In [12]:
df2.isnull().sum().sum()
Out[12]:
In [13]:
df3 = df.copy()
In [14]:
df3.fillna(np.mean(df3["average"]),inplace = True)
df3
Out[14]:
511 0 1 0.000000
512 0 1 0.000000
513 0 2 0.000000
514 0 1 0.000000
515 0 1 0.000000
In [15]:
# Series.median() skips NaN; np.median returns NaN when any NaN is present
df3.fillna(df3["average"].median(), inplace=True)
In [16]:
# mode() returns a Series, so take its first entry to get a scalar fill value
df3['average'] = df3['average'].fillna(df3['average'].mode()[0])
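The three fills can be compared side by side with a minimal sketch, assuming each strategy is applied to its own copy of the column; once one fill has removed every NaN, a later fill on the same frame has nothing left to change. The Series `s` is hypothetical. Two details matter here: `np.median` does not skip NaN, and `mode()` returns a Series, which `fillna` would align by index instead of using as a single value.

```python
import numpy as np
import pandas as pd

# Hypothetical column with one missing value.
s = pd.Series([10.0, 20.0, 20.0, np.nan, 50.0])

mean_filled = s.fillna(s.mean())        # pandas skips NaN when computing the mean
median_filled = s.fillna(s.median())    # likewise for the median
mode_filled = s.fillna(s.mode()[0])     # first (smallest) mode as a scalar

# np.median on the raw array propagates NaN, so it is not usable as a fill value.
raw_median = np.median(s.to_numpy())

print(mean_filled[3], median_filled[3], mode_filled[3], raw_median)
```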
In [17]:
df3.isnull().sum().sum()
Out[17]:
In [18]:
df4 = df.copy()
In [19]:
df4.fillna(0.00000,inplace = True)
df4
Out[19]:
511 0 1 0.000000
512 0 1 0.000000
513 0 2 0.000000
514 0 1 0.000000
515 0 1 0.000000
In [20]:
df4.isnull().sum().sum()
Out[20]:
In [21]:
df5 = df.copy()
In [22]:
df5.dropna(axis=0,inplace=True)
df5
Out[22]:
511 0 1 0.000000
512 0 1 0.000000
513 0 2 0.000000
514 0 1 0.000000
515 0 1 0.000000
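A minimal sketch of what `dropna(axis=0)` does, on a hypothetical frame `toy`: every row containing at least one NaN is removed, and the surviving rows keep their original index labels, so the index may have gaps afterwards.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; rows 1 and 2 have a missing average.
toy = pd.DataFrame({
    "total_runs": [10, 0, 25, 7],
    "out": [1, 1, 0, 2],
    "average": [10.0, np.nan, np.nan, 3.5],
})

kept = toy.dropna(axis=0)           # drop rows that contain any NaN
renumbered = kept.reset_index(drop=True)  # optional: make the index contiguous again

print(list(kept.index))
print(list(renumbered.index))
```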
In [23]:
df5.isnull().sum().sum()
Out[23]:
In [24]:
df6 = df.copy()
In [25]:
data = df6['average']  # select the column whose noisy values will be smoothed
data = data[:20]       # start with the first 20 rows for convenience
data = np.sort(data)
print(data)
41.37719298 42.44230769]
In [26]:
b1=np.zeros((5,4))
b2=np.zeros((5,4))
Mean bin
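A minimal sketch of smoothing by bin means, assuming equal-frequency bins of 4 sorted values each, as the (5, 4) shape of `b1` suggests. The 20 sample values are hypothetical, and this is one common way to do it, not necessarily the code of the cell below.

```python
import numpy as np

# Hypothetical sample of 20 values, sorted as in the earlier cell.
values = np.sort(np.array([
    4, 8, 9, 15, 21, 21, 24, 25, 26, 28,
    29, 34, 35, 38, 40, 41, 42, 45, 47, 50,
], dtype=float))

bins = values.reshape(5, 4)                    # 5 bins, 4 consecutive values each
bin_means = bins.mean(axis=1, keepdims=True)   # one mean per bin, shape (5, 1)
mean_smoothed = np.repeat(bin_means, 4, axis=1)  # every value replaced by its bin mean

print(mean_smoothed)
```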
In [27]:
Boundary bin
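A minimal sketch of smoothing by bin boundaries under the same assumption of 5 sorted bins of 4 values: each value is replaced by whichever bin boundary (minimum or maximum) is closer. The sample values are hypothetical, and sending ties to the lower boundary is a choice made for this sketch.

```python
import numpy as np

# Hypothetical sample of 20 values, sorted before binning.
values = np.sort(np.array([
    4, 8, 9, 15, 21, 21, 24, 25, 26, 28,
    29, 34, 35, 38, 40, 41, 42, 45, 47, 50,
], dtype=float))

bins = values.reshape(5, 4)   # 5 bins, 4 consecutive values each
lo = bins[:, [0]]             # lower boundary of each bin (rows are sorted)
hi = bins[:, [-1]]            # upper boundary of each bin

# Pick the nearer boundary; ties go to the lower one.
boundary_smoothed = np.where(bins - lo <= hi - bins, lo, hi)

print(boundary_smoothed)
```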
In [28]:
min_value = df6['average'].min()
max_value = df6['average'].max()
In [30]:
bins = np.linspace(min_value,max_value,30)
labels = bins[1:]
In [31]:
Finally, we can see that a new column has been added at the end of the
data set; it holds the smoothed values of the 'average' column.
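One way such a column can be produced from equal-width edges and upper-edge labels is `pd.cut`. The sketch below is an assumption about the approach, not a copy of the notebook cell: the frame `toy`, the column name `average_binned`, and the use of 5 edges instead of 30 are all hypothetical choices made to keep the example small.

```python
import numpy as np
import pandas as pd

# Hypothetical toy column standing in for 'average'.
toy = pd.DataFrame({"average": [0.0, 3.5, 10.0, 22.0, 40.0]})

# 5 evenly spaced edges give 4 equal-width bins: 0, 10, 20, 30, 40.
edges = np.linspace(toy["average"].min(), toy["average"].max(), 5)
labels = edges[1:]  # label each bin by its upper edge

# include_lowest=True keeps the minimum value inside the first bin.
toy["average_binned"] = pd.cut(
    toy["average"], bins=edges, labels=labels, include_lowest=True
)

print(toy)
```

Because the intervals are closed on the right, a value that sits exactly on an edge (10.0 here) falls into the bin that ends at that edge.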