
10/11/21, 2:47 PM Data Vizualization - Jupyter Notebook

1. Importing necessary libraries


In [44]:

from matplotlib import pyplot as plt
import pandas as pd
import seaborn as sns

# Show the charts inline in the Jupyter notebook
%matplotlib inline

In [5]:

month = ['Jan','Feb','Mar','Apr','May','Jun','July']
sales = [30000,25000,50000,45000,42000,30000,33000]

In [6]:

print(month)
print(sales)

['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun', 'July']

[30000, 25000, 50000, 45000, 42000, 30000, 33000]

In [16]:

plt.figure(figsize=(15,6))
plt.plot(month,sales)
plt.xlabel('Month')
plt.ylabel('Sales')
plt.title('Monthwise Sales')
plt.show()

Import mtcars dataset

localhost:8888/notebooks/Data science/Data Vizualization.ipynb 1/20



In [20]:

cars_data = pd.read_csv('mtcars.csv') # Motor Trend car road tests (mtcars): 32 cars, 11 variables


cars_data

Out[20]:

     mpg  cyl   disp   hp  drat     wt   qsec  vs  am  gear  carb
0   21.0    6  160.0  110  3.90  2.620  16.46   0   1     4     4
1   21.0    6  160.0  110  3.90  2.875  17.02   0   1     4     4
2   22.8    4  108.0   93  3.85  2.320  18.61   1   1     4     1
3   21.4    6  258.0  110  3.08  3.215  19.44   1   0     3     1
4   18.7    8  360.0  175  3.15  3.440  17.02   0   0     3     2
5   18.1    6  225.0  105  2.76  3.460  20.22   1   0     3     1
6   14.3    8  360.0  245  3.21  3.570  15.84   0   0     3     4
7   24.4    4  146.7   62  3.69  3.190  20.00   1   0     4     2
8   22.8    4  140.8   95  3.92  3.150  22.90   1   0     4     2
9   19.2    6  167.6  123  3.92  3.440  18.30   1   0     4     4
10  17.8    6  167.6  123  3.92  3.440  18.90   1   0     4     4
11  16.4    8  275.8  180  3.07  4.070  17.40   0   0     3     3
12  17.3    8  275.8  180  3.07  3.730  17.60   0   0     3     3
13  15.2    8  275.8  180  3.07  3.780  18.00   0   0     3     3
14  10.4    8  472.0  205  2.93  5.250  17.98   0   0     3     4
15  10.4    8  460.0  215  3.00  5.424  17.82   0   0     3     4
16  14.7    8  440.0  230  3.23  5.345  17.42   0   0     3     4
17  32.4    4   78.7   66  4.08  2.200  19.47   1   1     4     1
18  30.4    4   75.7   52  4.93  1.615  18.52   1   1     4     2
19  33.9    4   71.1   65  4.22  1.835  19.90   1   1     4     1
20  21.5    4  120.1   97  3.70  2.465  20.01   1   0     3     1
21  15.5    8  318.0  150  2.76  3.520  16.87   0   0     3     2
22  15.2    8  304.0  150  3.15  3.435  17.30   0   0     3     2
23  13.3    8  350.0  245  3.73  3.840  15.41   0   0     3     4
24  19.2    8  400.0  175  3.08  3.845  17.05   0   0     3     2
25  27.3    4   79.0   66  4.08  1.935  18.90   1   1     4     1
26  26.0    4  120.3   91  4.43  2.140  16.70   0   1     5     2
27  30.4    4   95.1  113  3.77  1.513  16.90   1   1     5     2
28  15.8    8  351.0  264  4.22  3.170  14.50   0   1     5     4
29  19.7    6  145.0  175  3.62  2.770  15.50   0   1     5     6
30  15.0    8  301.0  335  3.54  3.570  14.60   0   1     5     8
31  21.4    4  121.0  109  4.11  2.780  18.60   1   1     4     2


2. Perform an initial analysis and try to draw two insights from this data.

In [21]:

cars_data.shape

Out[21]:

(32, 11)

In [22]:

cars_data.isna().sum()

Out[22]:

mpg     0
cyl     0
disp    0
hp      0
drat    0
wt      0
qsec    0
vs      0
am      0
gear    0
carb    0
dtype: int64

In [23]:

cars_data.describe()

Out[23]:

             mpg        cyl        disp          hp       drat         wt       qsec        vs
count  32.000000  32.000000   32.000000   32.000000  32.000000  32.000000  32.000000  32.00000
mean   20.090625   6.187500  230.721875  146.687500   3.596563   3.217250  17.848750   0.43750
std     6.026948   1.785922  123.938694   68.562868   0.534679   0.978457   1.786943   0.50401
min    10.400000   4.000000   71.100000   52.000000   2.760000   1.513000  14.500000   0.00000
25%    15.425000   4.000000  120.825000   96.500000   3.080000   2.581250  16.892500   0.00000
50%    19.200000   6.000000  196.300000  123.000000   3.695000   3.325000  17.710000   0.00000
75%    22.800000   8.000000  326.000000  180.000000   3.920000   3.610000  18.900000   1.00000
max    33.900000   8.000000  472.000000  335.000000   4.930000   5.424000  22.900000   1.00000

Insights/Inference


In [24]:

cars_data.cyl.unique()

Out[24]:

array([6, 4, 8], dtype=int64)

In [25]:

cars_data.gear.unique()

Out[25]:

array([4, 3, 5], dtype=int64)

In [26]:

cars_data.carb.unique()

Out[26]:

array([4, 1, 2, 3, 6, 8], dtype=int64)

In [28]:

cars_data.groupby(by='am')['mpg'].mean().round(2)

Out[28]:

am
0    17.15
1    24.39
Name: mpg, dtype: float64

Inferences
1. On average, automatic cars (am = 0) give 17.15 mpg and manual cars (am = 1) give 24.39 mpg.

2. On average, 4-cylinder cars give 26.66 mpg, 6-cylinder cars give 19.74 mpg and 8-cylinder cars give 15.10 mpg.

3. On average, automatic cars with a V-shaped engine (vs = 0) give 15.05 mpg and manual cars with a V-shaped engine give 19.75 mpg, while automatic cars with a straight engine (vs = 1) give 20.74 mpg and manual cars with a straight engine give 28.37 mpg. So, in this sample, manual cars with a straight engine have the highest average mileage of the four groups.
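The three inferences can be reproduced in one place by replacing the 0/1 codes with readable labels before grouping. This is a minimal sketch. The `mpg`, `cyl`, `vs` and `am` columns are copied from the `cars_data` table above so the sketch runs without `mtcars.csv`. The label columns `transmission` and `engine` are hypothetical names. In the notebook you would apply the same `map` calls to `cars_data` directly.

```python
import pandas as pd

# mpg, cyl, vs and am columns copied from the cars_data table above (rows 0-31)
mpg = [21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8, 19.2, 17.8, 16.4,
       17.3, 15.2, 10.4, 10.4, 14.7, 32.4, 30.4, 33.9, 21.5, 15.5, 15.2, 13.3,
       19.2, 27.3, 26.0, 30.4, 15.8, 19.7, 15.0, 21.4]
cyl = [6, 6, 4, 6, 8, 6, 8, 4, 4, 6, 6, 8, 8, 8, 8, 8,
       8, 4, 4, 4, 4, 8, 8, 8, 8, 4, 4, 4, 8, 6, 8, 4]
vs = [0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0,
      0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1]
am = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
      0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1]
cars = pd.DataFrame({"mpg": mpg, "cyl": cyl, "vs": vs, "am": am})

# Readable labels for the 0/1 codes (hypothetical column names)
cars["transmission"] = cars["am"].map({0: "automatic", 1: "manual"})
cars["engine"] = cars["vs"].map({0: "V-shaped", 1: "straight"})

# Inference 1: mean mpg per transmission type
by_transmission = cars.groupby("transmission")["mpg"].mean().round(2)
# Inference 2: mean mpg per number of cylinders
by_cyl = cars.groupby("cyl")["mpg"].mean().round(2)
# Inference 3: mean mpg per engine shape and transmission combination
by_engine_transmission = cars.groupby(["engine", "transmission"])["mpg"].mean().round(2)

print(by_transmission, by_cyl, by_engine_transmission, sep="\n\n")
print("Best group:", by_engine_transmission.idxmax())
```

Labelling the codes once keeps the printed group names self-explanatory, so the inferences no longer depend on remembering that `am = 1` means manual.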

In [30]:

cars_data.groupby(by = 'cyl')['mpg'].mean().round(2)

Out[30]:

cyl
4    26.66
6    19.74
8    15.10
Name: mpg, dtype: float64


In [31]:

cars_data.groupby(by = ['vs','am'])['mpg'].mean().round(2)

Out[31]:

vs  am
0   0     15.05
    1     19.75
1   0     20.74
    1     28.37
Name: mpg, dtype: float64

In [32]:

pd.crosstab(cars_data['cyl'],cars_data['gear']) #Frequency table

Out[32]:

gear   3  4  5
cyl
4      1  8  2
6      2  4  1
8     12  0  2
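`pd.crosstab` counts how many rows fall into each (cyl, gear) pair. The sketch below copies the `cyl` and `gear` columns from the table above so it runs on its own. It obtains the same counts with `groupby(...).size()`. It also builds a row-normalised version with `normalize="index"`, which turns the counts into the gear mix within each cylinder group.

```python
import pandas as pd

# cyl and gear columns copied from the cars_data table above (rows 0-31)
cyl = [6, 6, 4, 6, 8, 6, 8, 4, 4, 6, 6, 8, 8, 8, 8, 8,
       8, 4, 4, 4, 4, 8, 8, 8, 8, 4, 4, 4, 8, 6, 8, 4]
gear = [4, 4, 4, 3, 3, 3, 3, 4, 4, 4, 4, 3, 3, 3, 3, 3,
        3, 4, 4, 4, 3, 3, 3, 3, 3, 4, 5, 5, 5, 5, 5, 4]
cars = pd.DataFrame({"cyl": cyl, "gear": gear})

# Frequency table: one row per cyl value, one column per gear value
freq = pd.crosstab(cars["cyl"], cars["gear"])

# Same counts via groupby: size of each (cyl, gear) pair, gear pivoted to columns,
# with 0 filled in for pairs that never occur (8 cylinders with 4 gears)
freq_groupby = cars.groupby(["cyl", "gear"]).size().unstack(fill_value=0)

# Share of each gear count within a cylinder group (each row sums to 1)
shares = pd.crosstab(cars["cyl"], cars["gear"], normalize="index").round(2)

print(freq, shares, sep="\n\n")
```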

1. Univariate Analysis

Used to understand the distribution of a single variable.

In [33]:

cars_data.head()

Out[33]:

     mpg  cyl   disp   hp  drat     wt   qsec  vs  am  gear  carb
0   21.0    6  160.0  110  3.90  2.620  16.46   0   1     4     4
1   21.0    6  160.0  110  3.90  2.875  17.02   0   1     4     4
2   22.8    4  108.0   93  3.85  2.320  18.61   1   1     4     1
3   21.4    6  258.0  110  3.08  3.215  19.44   1   0     3     1
4   18.7    8  360.0  175  3.15  3.440  17.02   0   0     3     2


In [39]:

cars_data.describe()

Out[39]:

             mpg        cyl        disp          hp       drat         wt       qsec        vs
count  32.000000  32.000000   32.000000   32.000000  32.000000  32.000000  32.000000  32.00000
mean   20.090625   6.187500  230.721875  146.687500   3.596563   3.217250  17.848750   0.43750
std     6.026948   1.785922  123.938694   68.562868   0.534679   0.978457   1.786943   0.50401
min    10.400000   4.000000   71.100000   52.000000   2.760000   1.513000  14.500000   0.00000
25%    15.425000   4.000000  120.825000   96.500000   3.080000   2.581250  16.892500   0.00000
50%    19.200000   6.000000  196.300000  123.000000   3.695000   3.325000  17.710000   0.00000
75%    22.800000   8.000000  326.000000  180.000000   3.920000   3.610000  18.900000   1.00000
max    33.900000   8.000000  472.000000  335.000000   4.930000   5.424000  22.900000   1.00000

In [60]:

cars_data.gear.unique()

Out[60]:

array([4, 3, 5], dtype=int64)

In [63]:

cars_data.gear.value_counts()

Out[63]:

3    15
4    12
5     5
Name: gear, dtype: int64


In [74]:

plt.figure(figsize = (10,10))
# Note: for a pie chart, pick a discrete variable and pass its value counts
gear_counts = cars_data.gear.value_counts()
plt.pie(x = gear_counts, labels = gear_counts.index, explode = [0.02, 0.02, 0.02])
plt.show()
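`plt.pie` sizes each slice in proportion to its share of the total. The pie chart of `gear` therefore encodes the value counts as percentages of the 32 cars. Below is a minimal sketch of that arithmetic, with the `gear` column copied from the table above.

```python
import pandas as pd

# gear column copied from the cars_data table above (rows 0-31)
gear = [4, 4, 4, 3, 3, 3, 3, 4, 4, 4, 4, 3, 3, 3, 3, 3,
        3, 4, 4, 4, 3, 3, 3, 3, 3, 4, 5, 5, 5, 5, 5, 4]

gear_counts = pd.Series(gear).value_counts()      # 3 -> 15, 4 -> 12, 5 -> 5
labels = [str(g) for g in gear_counts.index]       # labels taken from the counts themselves
percent = gear_counts / gear_counts.sum() * 100    # share of the 32 cars, in %
angle = gear_counts / gear_counts.sum() * 360      # slice angle in degrees

for label, p, a in zip(labels, percent, angle):
    print(f"gear {label}: {p:.3f}% of cars, {a:.2f} degrees")
```

`value_counts` sorts by frequency, not by value. Taking the labels from the counts' own index keeps every label next to its count even when that order differs from 3, 4, 5.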


In [38]:

plt.hist(x = 'mpg',data=cars_data) # Always pick one continuous variable to see its frequency distribution


plt.title('MPG Distribution')
plt.show()
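`plt.hist` splits the range of `mpg` into equal-width bins and counts the values that fall in each. There are 10 bins by default, taken from `rcParams["hist.bins"]`, and the counting is done with `np.histogram`. The sketch below reproduces those counts for the `mpg` column copied from the table above. Passing a different `bins=` value to either function changes how coarse the distribution looks.

```python
import numpy as np

# mpg column copied from the cars_data table above (rows 0-31)
mpg = [21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8, 19.2, 17.8, 16.4,
       17.3, 15.2, 10.4, 10.4, 14.7, 32.4, 30.4, 33.9, 21.5, 15.5, 15.2, 13.3,
       19.2, 27.3, 26.0, 30.4, 15.8, 19.7, 15.0, 21.4]

# 10 equal-width bins between min (10.4) and max (33.9); each bin is [left, right),
# except the last one, which also includes the maximum
counts, edges = np.histogram(mpg, bins=10)

for left, right, n in zip(edges[:-1], edges[1:], counts):
    print(f"{left:6.2f} to {right:6.2f}: {'#' * int(n)} ({n})")
```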

In [40]:

plt.hist(x = 'cyl',data=cars_data) # Wrong chart type: cyl is discrete (4, 6 or 8), so count its values instead


plt.title('CYL Distribution')
plt.show()

In [35]:

cars_data.cyl.value_counts()

Out[35]:

8    14
4    11
6     7
Name: cyl, dtype: int64


In [57]:

sns.countplot(x='cyl',y=None,data=cars_data)

Out[57]:

<AxesSubplot:xlabel='cyl', ylabel='count'>

Boxplot: used to detect outliers, i.e. the most extreme high and low points that lie far outside the bulk of the data.

In [92]:

cars_data.describe()

Out[92]:

             mpg        cyl        disp          hp       drat         wt       qsec        vs
count  32.000000  32.000000   32.000000   32.000000  32.000000  32.000000  32.000000  32.00000
mean   20.090625   6.187500  230.721875  146.687500   3.596563   3.217250  17.848750   0.43750
std     6.026948   1.785922  123.938694   68.562868   0.534679   0.978457   1.786943   0.50401
min    10.400000   4.000000   71.100000   52.000000   2.760000   1.513000  14.500000   0.00000
25%    15.425000   4.000000  120.825000   96.500000   3.080000   2.581250  16.892500   0.00000
50%    19.200000   6.000000  196.300000  123.000000   3.695000   3.325000  17.710000   0.00000
75%    22.800000   8.000000  326.000000  180.000000   3.920000   3.610000  18.900000   1.00000
max    33.900000   8.000000  472.000000  335.000000   4.930000   5.424000  22.900000   1.00000


In [91]:

plt.boxplot(x = 'hp',data = cars_data) # Continuous data


plt.show()
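By default `plt.boxplot` (and `sns.boxplot`) uses `whis=1.5`. The whiskers extend to the most extreme points within 1.5 × IQR of the box, and anything beyond is drawn as an individual outlier. The sketch below applies that rule to the `hp` column, copied from the table above. Its quartiles match the `25%` and `75%` rows of `describe()`.

```python
import numpy as np

# hp column copied from the cars_data table above (rows 0-31)
hp = [110, 110, 93, 110, 175, 105, 245, 62, 95, 123, 123, 180, 180, 180, 205, 215,
      230, 66, 52, 65, 97, 150, 150, 245, 175, 66, 91, 113, 264, 175, 335, 109]

q1, q3 = np.percentile(hp, [25, 75])  # linear interpolation, as in describe()
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Points outside the fences are the ones drawn as outliers
outliers = [x for x in hp if x < lower_fence or x > upper_fence]
print(f"Q1={q1}, Q3={q3}, IQR={iqr}, fences=({lower_fence}, {upper_fence})")
print("Outliers:", outliers)
```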

2. Bivariate Analysis

Barplot

In [43]:

cars_data.groupby(by = 'cyl')['mpg'].mean().round(2)

Out[43]:

cyl
4    26.66
6    19.74
8    15.10
Name: mpg, dtype: float64


In [42]:

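# Misleading: plt.bar draws one bar per car at x = 4, 6 or 8; the bars overlap,
# so each visible bar shows the largest mpg in that cyl group, not the mean
plt.bar(x = 'cyl',height = 'mpg',data=cars_data)
plt.show()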
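`plt.bar` does not aggregate, so the fix is to compute the group means first and then draw one bar per group. These are the same numbers as the `groupby` output. Below is a minimal sketch, assuming the `cyl` and `mpg` columns copied from the table above. It also computes the per-group maximum, which is what the overlapping unaggregated bars end up showing. The `Agg` backend line is only there so the sketch runs outside Jupyter.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs outside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# cyl and mpg columns copied from the cars_data table above (rows 0-31)
cyl = [6, 6, 4, 6, 8, 6, 8, 4, 4, 6, 6, 8, 8, 8, 8, 8,
       8, 4, 4, 4, 4, 8, 8, 8, 8, 4, 4, 4, 8, 6, 8, 4]
mpg = [21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8, 19.2, 17.8, 16.4,
       17.3, 15.2, 10.4, 10.4, 14.7, 32.4, 30.4, 33.9, 21.5, 15.5, 15.2, 13.3,
       19.2, 27.3, 26.0, 30.4, 15.8, 19.7, 15.0, 21.4]
cars = pd.DataFrame({"cyl": cyl, "mpg": mpg})

# What the unaggregated plt.bar call effectively shows: the tallest bar per group
visible_height = cars.groupby("cyl")["mpg"].max()

# Fix: aggregate first, then draw exactly one bar per cylinder group
mean_mpg = cars.groupby("cyl")["mpg"].mean().round(2)
fig, ax = plt.subplots(figsize=(6, 5))
bars = ax.bar(mean_mpg.index.astype(str), mean_mpg.values)
ax.set_xlabel("cyl")
ax.set_ylabel("mean mpg")
ax.set_title("Cyl Vs MPG")
plt.close(fig)  # in the notebook, call plt.show() instead
```

`sns.barplot(x='cyl', y='mpg', data=cars_data)` does this aggregation automatically. Its default estimator is the mean, and the line on each bar is a confidence interval around that mean.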

In [47]:

cars_data.groupby(by = 'cyl')['mpg'].mean().round(2)

Out[47]:

cyl
4    26.66
6    19.74
8    15.10
Name: mpg, dtype: float64


In [59]:

plt.figure(figsize=(6,5))
sns.barplot(x='cyl',y='mpg',data=cars_data) # bar height = mean mpg per cyl (seaborn's default estimator)
plt.title('Cyl Vs MPG',size = 20)
plt.show()

Scatter Plot

Used to check for a linear association (relationship) between two variables. Note: both variables need to be continuous.
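A scatter plot shows a linear association visually. The Pearson correlation coefficient r puts a number on it: -1 is a perfect negative line, 0 means no linear relation and +1 is a perfect positive line. The sketch below computes r for `hp` and `mpg`, with both columns copied from the table above. It uses `np.corrcoef` and then checks the result against the definition. Rounded to two decimals, it should match the `hp`/`mpg` entry of the correlation matrix later in the notebook (-0.78).

```python
import numpy as np

# hp and mpg columns copied from the cars_data table above (rows 0-31)
hp = [110, 110, 93, 110, 175, 105, 245, 62, 95, 123, 123, 180, 180, 180, 205, 215,
      230, 66, 52, 65, 97, 150, 150, 245, 175, 66, 91, 113, 264, 175, 335, 109]
mpg = [21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8, 19.2, 17.8, 16.4,
       17.3, 15.2, 10.4, 10.4, 14.7, 32.4, 30.4, 33.9, 21.5, 15.5, 15.2, 13.3,
       19.2, 27.3, 26.0, 30.4, 15.8, 19.7, 15.0, 21.4]

# np.corrcoef returns the 2x2 correlation matrix; the off-diagonal entry is r(hp, mpg)
r = np.corrcoef(hp, mpg)[0, 1]

# Same value from the definition: co-variation divided by the product of the spreads
x = np.asarray(hp, dtype=float)
y = np.asarray(mpg, dtype=float)
dx, dy = x - x.mean(), y - y.mean()
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

print(f"r(hp, mpg) = {r:.2f}")  # negative: more horsepower, fewer miles per gallon
```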


In [75]:

cars_data.head()

Out[75]:

     mpg  cyl   disp   hp  drat     wt   qsec  vs  am  gear  carb
0   21.0    6  160.0  110  3.90  2.620  16.46   0   1     4     4
1   21.0    6  160.0  110  3.90  2.875  17.02   0   1     4     4
2   22.8    4  108.0   93  3.85  2.320  18.61   1   1     4     1
3   21.4    6  258.0  110  3.08  3.215  19.44   1   0     3     1
4   18.7    8  360.0  175  3.15  3.440  17.02   0   0     3     2

In [76]:

plt.scatter(x = 'hp', y = 'mpg', data = cars_data)

Out[76]:

<matplotlib.collections.PathCollection at 0x27a630fc700>


In [77]:

plt.scatter(x = 'drat', y = 'mpg', data = cars_data)

Out[77]:

<matplotlib.collections.PathCollection at 0x27a63361d00>

In [78]:

sns.scatterplot(x = 'drat', y = 'mpg', data = cars_data)

Out[78]:

<AxesSubplot:xlabel='drat', ylabel='mpg'>


In [79]:

sns.lmplot(x = 'drat', y = 'mpg', data = cars_data)

Out[79]:

<seaborn.axisgrid.FacetGrid at 0x27a63540d60>
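`sns.lmplot` draws the scatter plot with a fitted straight line on top: an ordinary least-squares regression with a shaded confidence band. The line itself can be recovered with `np.polyfit` at degree 1. The sketch below fits `mpg` against `drat`, with both columns copied from the table above.

```python
import numpy as np

# drat and mpg columns copied from the cars_data table above (rows 0-31)
drat = [3.90, 3.90, 3.85, 3.08, 3.15, 2.76, 3.21, 3.69, 3.92, 3.92, 3.92, 3.07,
        3.07, 3.07, 2.93, 3.00, 3.23, 4.08, 4.93, 4.22, 3.70, 2.76, 3.15, 3.73,
        3.08, 4.08, 4.43, 3.77, 4.22, 3.62, 3.54, 4.11]
mpg = [21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8, 19.2, 17.8, 16.4,
       17.3, 15.2, 10.4, 10.4, 14.7, 32.4, 30.4, 33.9, 21.5, 15.5, 15.2, 13.3,
       19.2, 27.3, 26.0, 30.4, 15.8, 19.7, 15.0, 21.4]

# Least-squares line mpg = slope * drat + intercept (deg=1 means a straight line)
slope, intercept = np.polyfit(drat, mpg, deg=1)

# Predicted mpg at the smallest and largest drat, i.e. the ends of the drawn line
for d in (min(drat), max(drat)):
    print(f"drat={d:.2f} -> predicted mpg {slope * d + intercept:.2f}")
print(f"mpg = {slope:.2f} * drat + {intercept:.2f}")
```

A positive slope matches the positive `drat`/`mpg` correlation (0.68) in the matrix below.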

Boxplot for bivariate analysis


In [94]:

sns.boxplot(x='cyl',y='mpg',data=cars_data)
plt.show()
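Each box in `sns.boxplot(x='cyl', y='mpg', ...)` summarizes the `mpg` values of one cylinder group. The box spans the 25th to 75th percentile and the line inside is the median. The whiskers follow the same 1.5 × IQR rule as a single boxplot. Below is a minimal sketch of the numbers behind the three boxes. It uses `groupby(...).describe()` on the `cyl` and `mpg` columns copied from the table above.

```python
import pandas as pd

# cyl and mpg columns copied from the cars_data table above (rows 0-31)
cyl = [6, 6, 4, 6, 8, 6, 8, 4, 4, 6, 6, 8, 8, 8, 8, 8,
       8, 4, 4, 4, 4, 8, 8, 8, 8, 4, 4, 4, 8, 6, 8, 4]
mpg = [21.0, 21.0, 22.8, 21.4, 18.7, 18.1, 14.3, 24.4, 22.8, 19.2, 17.8, 16.4,
       17.3, 15.2, 10.4, 10.4, 14.7, 32.4, 30.4, 33.9, 21.5, 15.5, 15.2, 13.3,
       19.2, 27.3, 26.0, 30.4, 15.8, 19.7, 15.0, 21.4]
cars = pd.DataFrame({"cyl": cyl, "mpg": mpg})

# Box edges (25%, 75%) and median (50%) of mpg within each cylinder group;
# min/max are the group extremes (whiskers stop short of them when a point is an outlier)
box_stats = cars.groupby("cyl")["mpg"].describe()[["min", "25%", "50%", "75%", "max"]]
box_stats["IQR"] = box_stats["75%"] - box_stats["25%"]
print(box_stats.round(2))
```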

Correlation Matrix


In [87]:

corr = cars_data.corr().round(2)
corr

Out[87]:

        mpg    cyl   disp     hp   drat     wt   qsec     vs     am   gear   carb
mpg    1.00  -0.85  -0.85  -0.78   0.68  -0.87   0.42   0.66   0.60   0.48  -0.55
cyl   -0.85   1.00   0.90   0.83  -0.70   0.78  -0.59  -0.81  -0.52  -0.49   0.53
disp  -0.85   0.90   1.00   0.79  -0.71   0.89  -0.43  -0.71  -0.59  -0.56   0.39
hp    -0.78   0.83   0.79   1.00  -0.45   0.66  -0.71  -0.72  -0.24  -0.13   0.75
drat   0.68  -0.70  -0.71  -0.45   1.00  -0.71   0.09   0.44   0.71   0.70  -0.09
wt    -0.87   0.78   0.89   0.66  -0.71   1.00  -0.17  -0.55  -0.69  -0.58   0.43
qsec   0.42  -0.59  -0.43  -0.71   0.09  -0.17   1.00   0.74  -0.23  -0.21  -0.66
vs     0.66  -0.81  -0.71  -0.72   0.44  -0.55   0.74   1.00   0.17   0.21  -0.57
am     0.60  -0.52  -0.59  -0.24   0.71  -0.69  -0.23   0.17   1.00   0.79   0.06
gear   0.48  -0.49  -0.56  -0.13   0.70  -0.58  -0.21   0.21   0.79   1.00   0.27
carb  -0.55   0.53   0.39   0.75  -0.09   0.43  -0.66  -0.57   0.06   0.27   1.00
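To read the `mpg` row of the matrix more quickly, the correlations can be ranked by strength (absolute value) while keeping their sign. The sketch below types in that row from the output above. In the notebook, `corr['mpg'].drop('mpg')` gives the same Series.

```python
import pandas as pd

# mpg row of the correlation matrix above (already rounded to 2 decimals)
mpg_corr = pd.Series({"cyl": -0.85, "disp": -0.85, "hp": -0.78, "drat": 0.68,
                      "wt": -0.87, "qsec": 0.42, "vs": 0.66, "am": 0.60,
                      "gear": 0.48, "carb": -0.55})

# Order by |r| (strongest first) but keep the original signed values
ranked = mpg_corr.reindex(mpg_corr.abs().sort_values(ascending=False).index)
print(ranked)
```

Weight (`wt`) comes first, with cylinders and displacement close behind. All three are negative, so heavier and bigger-engined cars in this sample travel fewer miles per gallon.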

3. Multivariate Analysis


In [83]:

sns.pairplot(data = cars_data)

Out[83]:

<seaborn.axisgrid.PairGrid at 0x27a63520e20>


In [89]:

plt.figure(figsize = (15,8))
sns.heatmap(data = corr,annot=True)

Out[89]:

<AxesSubplot:>


In [95]:

import plotly.express as px
# This dataframe has 244 lines, but 4 distinct values for `day`
df = px.data.tips()
fig = px.pie(df, values='tip', names='day')
fig.show()
