
CE473: Machine Learning 18DCE125

PRACTICAL – 1

Aim: Perform the following using the Python Pandas and Matplotlib libraries on the given
dataset:
i. Deal with missing values in the data either by deleting records or using mean/median/mode
imputation.
ii. Detect whether outliers exist and plot the data distribution using box plots, scatter plots and
histograms from the matplotlib library.
iii. Create and display the correlation matrix of all features of the data.
iv. Record and Analyse Observations.
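
Step i can be sketched as follows. This is only a minimal illustration on a toy DataFrame: the column names x1 and x2 and their values are made up for the example and are not taken from the Computer Activity Dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; column names and values are illustrative only.
df = pd.DataFrame({
    "x1": [1.0, 2.0, np.nan, 4.0, 5.0],
    "x2": [10.0, np.nan, 30.0, 30.0, 50.0],
})

# Option 1: delete every record that has at least one missing value.
dropped = df.dropna()

# Option 2: impute missing values with a per-column statistic.
mean_filled = df.fillna(df.mean())
median_filled = df.fillna(df.median())
# mode() can return several rows when values tie; use the first row.
mode_filled = df.fillna(df.mode().iloc[0])

print(dropped)
print(mean_filled)
```

Deleting records is the simplest option but shrinks the dataset, while imputation keeps every record at the cost of slightly distorting the column distribution.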

Data used for problem:

Dataset description:
The dataset was downloaded from http://www.cs.toronto.edu/~delve/data/comp-activ/desc.html.
Its name is the Computer Activity Dataset.
The dataset has 13 attributes and 8192 samples. All attribute values are
numerical.
Does dataset have a target attribute? YES

Did it have missing values and how did you deal with them? NO
Did you perform any data transformation tasks? NO
Did you perform any other data wrangling tasks? NO
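
The answers above (sample and attribute counts, numerical types, no missing values) can be checked with a few pandas calls. The sketch below uses a small hypothetical DataFrame as a stand-in; in the practical, df would instead be read from the downloaded file with pd.read_csv, whose path is not given in this report.

```python
import pandas as pd

# Hypothetical stand-in for the loaded dataset (names and values are made up).
df = pd.DataFrame({"x1": [1.0, 2.0, 3.0], "x2": [4, 5, 6]})

# Number of samples (rows) and attributes (columns).
n_samples, n_attributes = df.shape

# Count of missing values in each column; all zeros means nothing to impute.
missing_per_column = df.isna().sum()

# True only if every column has a numeric dtype.
all_numeric = all(pd.api.types.is_numeric_dtype(t) for t in df.dtypes)

print(n_samples, n_attributes)
print(missing_per_column)
print(all_numeric)
```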

DEPSTAR (CE-2) 1

Practical Learning and Outcome:


We learned about libraries such as pandas, numpy, matplotlib and seaborn. We used
their functions to read data from a CSV file and to perform various tasks on it.

Code and Output:
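
A minimal sketch of steps ii and iii of the aim is given below. It runs on synthetic data, not on the Computer Activity Dataset: the columns x1 and x2, the injected outlier and the 1.5 × IQR rule for flagging outliers are all assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic data (assumption): x2 depends linearly on x1, plus noise.
rng = np.random.default_rng(0)
x1 = rng.normal(50, 5, 200)
df = pd.DataFrame({"x1": x1, "x2": 2 * x1 + rng.normal(0, 1, 200)})
df.loc[0, "x1"] = 500.0  # inject one obvious outlier

# Flag outliers in x1 with the 1.5 * IQR rule (the same rule box plots use
# by default to draw points beyond the whiskers).
q1, q3 = df["x1"].quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = df[(df["x1"] < lower) | (df["x1"] > upper)]

# Box plot, scatter plot and histogram side by side.
fig, axes = plt.subplots(1, 3, figsize=(12, 4))
axes[0].boxplot(df["x1"])
axes[0].set_title("Box plot of x1")
axes[1].scatter(df["x1"], df["x2"], s=8)
axes[1].set_title("x1 vs x2")
axes[2].hist(df["x1"], bins=30)
axes[2].set_title("Histogram of x1")
fig.tight_layout()

# Pearson correlation matrix of all numeric columns.
corr = df.corr()
print(outliers)
print(corr)
```

In an interactive session, the matplotlib.use("Agg") line can be dropped and plt.show() called after fig.tight_layout() to display the figure.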
