
Submitted by

Mir Tahmid Hossain


ID: 2022-1-96-016
Submitted to
Dr. Mohammad Rezwanul Huq
Assignment#02: Data visualization
Course Title: CSE520 Statistics for Data Science
First, we import the required libraries. In Python, a library is a collection of modules
that can be reused across programs without writing them from scratch. A module, in turn,
is any Python file saved with the .py extension.

Next, we upload the given “Boston” dataset and read it.
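As a minimal sketch of this step, the snippet below reads a CSV into a pandas DataFrame. The original upload code is not shown, so the file name "Boston.csv" in the comment and the three made-up rows in SAMPLE_CSV are assumptions used only to keep the example runnable; they are not the real data.

```python
import io

import pandas as pd

# Hypothetical stand-in for the uploaded file: three made-up rows that reuse
# a few of the Boston column names. These values are placeholders, not data
# from the assignment.
SAMPLE_CSV = io.StringIO(
    "crim,zn,rm,medv\n"
    "0.1,10.0,6.0,20.0\n"
    "0.5,0.0,6.5,25.0\n"
    "3.0,0.0,5.5,15.0\n"
)

# With the real file this would be something like pd.read_csv("Boston.csv")
# (file name assumed).
df = pd.read_csv(SAMPLE_CSV)

print(df.shape)   # number of rows and columns
print(df.head())  # first few records
```

Checking the shape and the first rows right after reading is a quick way to confirm that the file was parsed with the expected columns.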

Task 1: Generate a histogram for each attribute. Also, find the characteristics of the
data distribution (positively skewed or negatively skewed).

The code for this step generates one histogram per attribute.
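One possible way to draw a histogram per attribute is pandas' DataFrame.hist, sketched below. The three synthetic columns (right_tail, left_tail, symmetric) are assumptions standing in for the Boston attributes, and the Agg backend is selected only so the sketch runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; not needed in a notebook

import numpy as np
import pandas as pd

# Synthetic stand-in data (assumption): one right-skewed, one left-skewed,
# and one roughly symmetric column.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "right_tail": rng.exponential(scale=2.0, size=500),
    "left_tail": 10 - rng.exponential(scale=2.0, size=500),
    "symmetric": rng.normal(loc=0.0, scale=1.0, size=500),
})

# DataFrame.hist draws one histogram per numeric column and returns the axes.
axes = df.hist(bins=20, figsize=(9, 3), layout=(1, 3))
fig = axes.flat[0].figure
fig.tight_layout()
```

A long tail to the right of the peak suggests positive skew, and a long tail to the left suggests negative skew.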

Now, we determine the characteristics of each data distribution based on its skewness.
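The skewness check can be sketched with DataFrame.skew, which returns the sample skewness of each column. The synthetic columns and the helper name describe_skew are assumptions for illustration; the sign of the coefficient is what decides the label.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data (assumption), not the Boston attributes.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "right_tail": rng.exponential(scale=2.0, size=500),
    "left_tail": 10 - rng.exponential(scale=2.0, size=500),
    "symmetric": rng.normal(loc=0.0, scale=1.0, size=500),
})


def describe_skew(value: float) -> str:
    """Label a skewness coefficient by its sign (hypothetical helper)."""
    if value > 0:
        return "positively skewed"
    if value < 0:
        return "negatively skewed"
    return "symmetric"


skewness = df.skew()  # one coefficient per column
summary = pd.DataFrame({
    "skewness": skewness,
    "shape": skewness.apply(describe_skew),
})
print(summary)
```

In practice a coefficient very close to zero is often treated as approximately symmetric; the strict sign test here is a simplification.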
Task 2: Generate a box plot for each attribute. Also, comment on the number of outliers for
each attribute.
From the box plots, we find that crim, zn, chas, rm, dis, black, ptratio, lstat, and medv have outliers.
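To put a number on the outliers per attribute, one common convention is the 1.5 × IQR rule, which is also matplotlib's default whisker length. The sketch below assumes that rule and uses two small made-up columns; it is not necessarily how the counts in this report were obtained.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; not needed in a notebook

import matplotlib.pyplot as plt
import pandas as pd

# Small made-up columns (assumption), chosen so the result is easy to verify.
df = pd.DataFrame({
    "with_outliers": [1, 2, 2, 3, 3, 3, 4, 4, 5, 50],
    "no_outliers": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
})

# One box plot per attribute, each on its own axis so scales do not clash.
fig, axes = plt.subplots(1, len(df.columns), figsize=(6, 3), squeeze=False)
for ax, col in zip(axes[0], df.columns):
    ax.boxplot(df[col])
    ax.set_title(col)

# Count points outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] for every column.
q1 = df.quantile(0.25)
q3 = df.quantile(0.75)
iqr = q3 - q1
outside = (df < q1 - 1.5 * iqr) | (df > q3 + 1.5 * iqr)
outlier_counts = outside.sum()
print(outlier_counts)  # with_outliers: 1, no_outliers: 0
```

Counting with the same rule the plot uses keeps the numbers consistent with the points drawn beyond the whiskers.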
Task 3: Generate a correlation heatmap among all attributes. Find the top 5 positively
correlated attribute pairs and top 5 negatively correlated attribute pairs.
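A correlation heatmap can be sketched by computing the Pearson correlation matrix with DataFrame.corr and drawing it as a colored grid. The version below uses plain matplotlib to keep dependencies small (seaborn's heatmap function is a common alternative), and the columns x, y, z are synthetic assumptions.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; not needed in a notebook

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in data (assumption): y rises with x, z falls with x.
rng = np.random.default_rng(0)
x = rng.normal(size=300)
df = pd.DataFrame({
    "x": x,
    "y": 2 * x + rng.normal(scale=0.5, size=300),
    "z": -x + rng.normal(scale=0.5, size=300),
})

corr = df.corr()  # pairwise Pearson correlation, values in [-1, 1]

fig, ax = plt.subplots(figsize=(4, 4))
im = ax.imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
ax.set_xticks(range(len(corr.columns)))
ax.set_xticklabels(corr.columns, rotation=90)
ax.set_yticks(range(len(corr.index)))
ax.set_yticklabels(corr.index)
fig.colorbar(im, ax=ax, label="Pearson correlation")
fig.tight_layout()
```

Fixing the color scale to the range -1 to 1 makes positive and negative correlations directly comparable across plots.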

A pairplot shows how the positively and negatively correlated attribute pairs can be identified.
The top 5 positively correlated and top 5 negatively correlated attribute pairs are as
follows....
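The ranking of attribute pairs can also be read off the correlation matrix programmatically. The sketch below keeps only the upper triangle so that each pair appears once and self-correlations are dropped, then sorts the remaining values. The columns a, b, c, d are synthetic assumptions, so the printed pairs are not the Boston results.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data (assumption): b follows a, c opposes a, d is noise.
rng = np.random.default_rng(0)
a = rng.normal(size=400)
df = pd.DataFrame({
    "a": a,
    "b": a + rng.normal(scale=0.1, size=400),
    "c": -a + rng.normal(scale=0.3, size=400),
    "d": rng.normal(size=400),
})

corr = df.corr()

# True strictly above the diagonal: each unordered pair is kept exactly once.
upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(upper).stack().dropna()  # Series indexed by (attr1, attr2)

top_positive = pairs[pairs > 0].sort_values(ascending=False).head(5)
top_negative = pairs[pairs < 0].sort_values().head(5)

print(top_positive)
print(top_negative)
```

With fewer than five positive or negative pairs available, head(5) simply returns however many exist.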
