Data analytics and Decision making
Dr. Vishwesh Singbal

Working with Google Colab


1. Go to https://colab.research.google.com/ and log in with your Google (Gmail) account.
2. Go to File → New notebook. Notebooks created in Colab are saved to your Google Drive. A Jupyter Notebook
is an interactive web application for creating and sharing computational documents. Notebook files have the
.ipynb extension.

3. If you are provided a file with .ipynb extension (as you will be in subsequent classes), then go to File →
Upload Notebook, locate the .ipynb file and upload it.
4. To import a .csv or .xlsx data file in Colab, follow the instructions below.

Colab runs on a temporary virtual machine, so an uploaded file is available only for as long as the session
lasts. The next time you open the notebook, you will have to upload the data file again.
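
The upload can also be done from a code cell. The following is a minimal sketch, assuming the notebook is running in Colab, where the `google.colab.files` module is available. The helper name `load_uploaded_csv` is hypothetical and used only for illustration.

```python
import io
import pandas as pd

def load_uploaded_csv(uploaded, filename):
    """Build a DataFrame from the dict returned by files.upload().

    `uploaded` maps each uploaded file name to its contents as bytes.
    (Hypothetical helper, named here for illustration only.)
    """
    return pd.read_csv(io.BytesIO(uploaded[filename]))

try:
    # google.colab exists only inside a Colab session
    from google.colab import files
    uploaded = files.upload()   # opens a file picker in the notebook
except ImportError:
    uploaded = {}               # not running in Colab: nothing uploaded

if "Missing_Value_Treatment.csv" in uploaded:
    df1 = load_uploaded_csv(uploaded, "Missing_Value_Treatment.csv")
```

Because the session is temporary, this cell has to be run again each time the notebook is reopened.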

Detecting missing values - Example 1


(Lines starting with # are comments. Do not copy the ">>" prefix when copying the code.)
# Import necessary libraries
>>import numpy as np
>>import pandas as pd

# Read a CSV file named "Missing_Value_Treatment.csv" into a pandas DataFrame
>>df1 = pd.read_csv("Missing_Value_Treatment.csv")

# Display information about the DataFrame, including data types and missing values
>>df1.info()

# Replace specific values ('-', '@@', '#') in the "Age" column with NaN and convert the column to float data
# type
>>df1["Age"] = df1["Age"].replace(['-', '@@', '#'], np.nan).astype('float')
# Print the updated DataFrame
>>print(df1)
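
The replace-then-convert step above can be tried on a small example before running it on the real file. This is a minimal sketch on made-up values (the `toy` DataFrame is an assumption, not the contents of the data file). It also counts the resulting missing values with `isna().sum()`.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the "Age" column; not the real data file
toy = pd.DataFrame({"Age": ["25", "-", "31", "@@", "#", "40"]})

# Placeholder strings become NaN, then the column is converted to float
toy["Age"] = toy["Age"].replace(["-", "@@", "#"], np.nan).astype("float")

# isna() flags missing cells; sum() counts them per column
missing_per_column = toy.isna().sum()
print(missing_per_column)
```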

# Replace specific values ('?', 'nuLL') in the "Own_house" column with NaN and convert the column to float
# data type
>>df1.Own_house = df1.Own_house.replace(["?", 'nuLL'], np.nan).astype('float')
>>print(df1)
>>df1.info()

# Replace specific values ('nAN', '###') in the "Income_2020" column with NaN and convert the column to
# float data type
>>df1.Income_2020 = df1.Income_2020.replace(["nAN", '###'], np.nan).astype('float')
>>print(df1)
>>df1.info()

# Replace specific values ('###') in the "Income_2021" column with NaN and convert the column to float data
# type
>>df1.Income_2021 = df1.Income_2021.replace(['###'], np.nan).astype("float")
>>df1.info()
>>print(df1)
# The attempt above raises a ValueError: the values in "Income_2021" are strings that contain commas (","),
# so they cannot be converted to float directly.
# Hence, first replace specific values ('###') in the "Income_2021" column with NaN
>>df1.Income_2021 = df1.Income_2021.replace(['###'], np.nan)
>>df1.info()
>>print(df1)
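
To see why the direct conversion fails and what removing the commas changes, the following sketch reproduces the situation on made-up values (the `income` Series is an assumption that resembles "Income_2021"; the real file may differ).

```python
import numpy as np
import pandas as pd

# Made-up values resembling "Income_2021"; the real file may differ
income = pd.Series(["1,200", "###", "3,450"])
income = income.replace(["###"], np.nan)

try:
    income.astype("float")
    conversion_failed = False
except ValueError as err:
    # Strings such as "1,200" cannot be parsed as floats
    conversion_failed = True
    print("Conversion failed:", err)

# .str.replace edits the text inside each cell; NaN cells are left alone
cleaned = income.str.replace(",", "", regex=False).astype("float")
print(cleaned)
```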

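# Then remove the commas inside each value and convert the "Income_2021" column to float data type.
# Use .str.replace here: Series.replace(',', "") only matches cells whose entire value is ','
>>df1.Income_2021 = df1.Income_2021.str.replace(',', "").astype("float")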
>>df1.info()
>>print(df1)
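
As an alternative to cleaning each column after loading, `pd.read_csv` accepts the `na_values` and `thousands` parameters, which handle placeholders and comma separators while the file is read. This is a sketch on made-up CSV text (the `csv_text` contents and column values are assumptions, not the real data file), and it is not the method used in the steps above.

```python
import io
import pandas as pd

# Made-up CSV text standing in for the data file; values are illustrative
csv_text = 'Age,Income_2021\n25,"1,200"\n-,###\n@@,"3,450"\n'

df_demo = pd.read_csv(
    io.StringIO(csv_text),
    na_values=["-", "@@", "#", "###"],  # extra strings to treat as NaN
    thousands=",",                      # strip thousands separators
)
print(df_demo.dtypes)
print(df_demo)
```

Note that `na_values` given as a list applies to every column, so it suits files where a placeholder never appears as a legitimate value.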
