import pandas as pd
import seaborn as sns
# matplotlib is required for the plt.* calls used throughout this notebook
import matplotlib.pyplot as plt
In [5]:
month = ['Jan','Feb','Mar','Apr','May','Jun','July']
sales = [30000,25000,50000,45000,42000,30000,33000]
In [6]:
print(month)
print(sales)
In [16]:
plt.figure(figsize=(15,6))
plt.plot(month,sales)
plt.xlabel('Month')
plt.ylabel('Sales')
plt.title('Monthwise Sales')
plt.show()
In [20]:
Out[20]:
2. Perform Initial analysis and try to get 2 insights from this data.
In [21]:
cars_data.shape
Out[21]:
(32, 11)
In [22]:
cars_data.isna().sum()
Out[22]:
mpg 0
cyl 0
disp 0
hp 0
drat 0
wt 0
qsec 0
vs 0
am 0
gear 0
carb 0
dtype: int64
In [23]:
cars_data.describe()
Out[23]:
Insights/Inference
In [24]:
cars_data.cyl.unique()
Out[24]:
In [25]:
cars_data.gear.unique()
Out[25]:
In [26]:
cars_data.carb.unique()
Out[26]:
In [28]:
cars_data.groupby(by='am')['mpg'].mean().round(2)
Out[28]:
am
0 17.15
1 24.39
Inferences
1. On average, automatic cars (am = 0) give 17.15 mpg and manual cars (am = 1) give 24.39 mpg.
2. On average, 4-cylinder cars give 26.66 mpg, 6-cylinder cars give 19.74 mpg and 8-cylinder cars give
15.10 mpg.
3. On average, automatic cars with a V-shaped engine give 15.05 mpg and manual cars with a V-shaped
engine give 19.75 mpg, while automatic cars with a straight engine give 20.74 mpg and manual cars
with a straight engine give 28.37 mpg. Manual cars with a straight engine therefore give the best
mileage of the four groups.
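The size of these gaps can be worked out directly from the group means. The following is a minimal sketch that uses only the rounded values printed in Out[28] and Out[31]; the variable names are illustrative and do not come from the original notebook.

```python
import pandas as pd

# Mean mpg by transmission, copied from Out[28].
mpg_by_am = pd.Series({"automatic": 17.15, "manual": 24.39}, name="mpg")

# Relative advantage of manual over automatic cars, in percent.
gain_pct = (mpg_by_am["manual"] / mpg_by_am["automatic"] - 1) * 100
print(f"Manual cars average {gain_pct:.1f}% more mpg than automatic cars")

# Mean mpg by engine shape and transmission, copied from Out[31].
mpg_by_vs_am = pd.Series({
    ("V-shaped", "automatic"): 15.05,
    ("V-shaped", "manual"): 19.75,
    ("straight", "automatic"): 20.74,
    ("straight", "manual"): 28.37,
})

# idxmax returns the (engine, transmission) label of the largest mean.
best = mpg_by_vs_am.idxmax()
print(best)
```

Because the inputs are already rounded group means, the percentage is approximate; recomputing from cars_data would give a slightly more precise figure.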
In [30]:
cars_data.groupby(by = 'cyl')['mpg'].mean().round(2)
Out[30]:
cyl
4 26.66
6 19.74
8 15.10
In [31]:
cars_data.groupby(by = ['vs','am'])['mpg'].mean().round(2)
Out[31]:
vs am
0 0 15.05
1 19.75
1 0 20.74
1 28.37
In [32]:
Out[32]:
gear 3 4 5
cyl
4 1 8 2
6 2 4 1
8 12 0 2
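The code for cell In [32] did not survive in this export. A table of this shape, with counts of cars for each cyl and gear combination, is what `pd.crosstab` produces, so one plausible reconstruction is sketched below. The small `toy` frame is made up for illustration and is not the cars dataset; in the notebook the same call would presumably be applied to `cars_data`.

```python
import pandas as pd

# Hypothetical toy data, NOT the real cars dataset.
toy = pd.DataFrame({
    "cyl":  [4, 4, 6, 8, 8, 8],
    "gear": [4, 5, 4, 3, 3, 5],
})

# Rows are cyl values, columns are gear values, cells are row counts.
table = pd.crosstab(toy["cyl"], toy["gear"])
print(table)
```

Combinations that never occur in the data show up as 0, which matches the 0 in the 8-cylinder, 4-gear cell of Out[32].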
1. Univariate Analysis
In [33]:
cars_data.head()
Out[33]:
In [39]:
cars_data.describe()
Out[39]:
In [60]:
cars_data.gear.unique()
Out[60]:
In [63]:
cars_data.gear.value_counts()
Out[63]:
3 15
4 12
5 5
In [74]:
plt.figure(figsize = (10,10))
plt.pie(x = cars_data.gear.value_counts(),data=cars_data,labels=[3,4,5],explode=[0.02,0.02,
# Note: a pie chart needs a discrete variable; pass the counts of each distinct value
plt.show()
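The pie call above is cut off in this export, so its remaining arguments are unknown. As a rough sketch of the same idea, the block below draws a pie from the gear counts shown in Out[63]. The uniform `explode` gap, the `autopct` format, the figure size and the title are all assumptions, not values recovered from the original cell.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Gear counts copied from Out[63]; in the notebook this would be
# cars_data.gear.value_counts().
gear_counts = pd.Series({3: 15, 4: 12, 5: 5}, name="gear")

# Taking the labels from the index keeps each label attached to its own
# count, instead of relying on a hard-coded list being in the same order.
labels = gear_counts.index.astype(str)

fig, ax = plt.subplots(figsize=(6, 6))
ax.pie(
    gear_counts,
    labels=labels,
    explode=[0.02] * len(gear_counts),  # assumed: same small gap for every slice
    autopct="%1.1f%%",                  # assumed: print each slice's share
)
ax.set_title("Cars by number of gears")  # assumed title
plt.show()
```

Deriving `labels` from the index matters because `value_counts()` sorts by frequency, so the order of the slices can change when the data changes.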
In [38]:
In [40]:
In [35]:
cars_data.cyl.value_counts()
Out[35]:
8 14
4 11
6 7
In [57]:
sns.countplot(x='cyl', data=cars_data)
Out[57]:
<AxesSubplot:xlabel='cyl', ylabel='count'>
Boxplot: used to detect outliers, i.e. points that lie far above or far below the bulk of the data
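By default, matplotlib and seaborn box plots draw as separate points the values lying more than 1.5 times the interquartile range (IQR) beyond the quartiles. The sketch below applies that rule by hand. The helper name `iqr_outliers` and the sample values are hypothetical and are not part of the original notebook.

```python
import pandas as pd


def iqr_outliers(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Return the entries lying more than k * IQR outside the quartiles."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return values[(values < lower) | (values > upper)]


# Hypothetical values, not taken from the cars dataset.
sample = pd.Series([10, 12, 13, 14, 15, 16, 18, 40])
print(iqr_outliers(sample))
```

In the notebook, the same helper could be applied to a numeric column such as `cars_data['hp']` to list the rows a box plot would flag.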
In [92]:
cars_data.describe()
Out[92]:
In [91]:
2. Bivariate Analysis
Barplot
In [43]:
cars_data.groupby(by = 'cyl')['mpg'].mean().round(2)
Out[43]:
cyl
4 26.66
6 19.74
8 15.10
In [42]:
In [59]:
plt.figure(figsize=(6,5))
sns.barplot(x='cyl', y='mpg', data=cars_data)
plt.title('Cyl Vs MPG',size = 20)
plt.show()
Scatter Plot
A scatter plot is used to check the linear association (relationship) between two variables. Note: both
variables need to be continuous.
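The linear association that a scatter plot shows visually can also be summarised as a single number, the Pearson correlation coefficient. This is a minimal sketch on made-up data; the column names and values are hypothetical and are not taken from the cars dataset.

```python
import pandas as pd

# Hypothetical measurements, NOT the real cars dataset.
df = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0, 5.0],
    "y_linear": [2.0, 4.0, 6.0, 8.0, 10.0],  # exactly y = 2x
    "y_falling": [9.0, 7.5, 6.5, 4.0, 3.0],  # decreases as x grows
})

# Series.corr computes the Pearson coefficient by default.
r_linear = df["x"].corr(df["y_linear"])
r_falling = df["x"].corr(df["y_falling"])
print(round(r_linear, 2), round(r_falling, 2))
```

A value near +1 or -1 corresponds to points falling close to a rising or falling straight line, while a value near 0 means no linear pattern is visible.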
In [75]:
cars_data.head()
Out[75]:
In [76]:
Out[76]:
<matplotlib.collections.PathCollection at 0x27a630fc700>
In [77]:
Out[77]:
<matplotlib.collections.PathCollection at 0x27a63361d00>
In [78]:
Out[78]:
<AxesSubplot:xlabel='drat', ylabel='mpg'>
In [79]:
Out[79]:
<seaborn.axisgrid.FacetGrid at 0x27a63540d60>
In [94]:
sns.boxplot(x='cyl',y='mpg',data=cars_data)
plt.show()
Correlation Matrix
In [87]:
corr = cars_data.corr().round(2)
corr
Out[87]:
       mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
mpg   1.00 -0.85 -0.85 -0.78  0.68 -0.87  0.42  0.66  0.60  0.48 -0.55
cyl  -0.85  1.00  0.90  0.83 -0.70  0.78 -0.59 -0.81 -0.52 -0.49  0.53
disp -0.85  0.90  1.00  0.79 -0.71  0.89 -0.43 -0.71 -0.59 -0.56  0.39
hp   -0.78  0.83  0.79  1.00 -0.45  0.66 -0.71 -0.72 -0.24 -0.13  0.75
drat  0.68 -0.70 -0.71 -0.45  1.00 -0.71  0.09  0.44  0.71  0.70 -0.09
wt   -0.87  0.78  0.89  0.66 -0.71  1.00 -0.17 -0.55 -0.69 -0.58  0.43
qsec  0.42 -0.59 -0.43 -0.71  0.09 -0.17  1.00  0.74 -0.23 -0.21 -0.66
vs    0.66 -0.81 -0.71 -0.72  0.44 -0.55  0.74  1.00  0.17  0.21 -0.57
am    0.60 -0.52 -0.59 -0.24  0.71 -0.69 -0.23  0.17  1.00  0.79  0.06
gear  0.48 -0.49 -0.56 -0.13  0.70 -0.58 -0.21  0.21  0.79  1.00  0.27
carb -0.55  0.53  0.39  0.75 -0.09  0.43 -0.66 -0.57  0.06  0.27  1.00
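To see which variables move most closely with mpg, the mpg row of the matrix can be sorted by absolute value. The sketch below copies that row from Out[87]; the variable names `mpg_corr` and `ranked` are illustrative.

```python
import pandas as pd

# Correlations with mpg, copied from the mpg row of Out[87].
mpg_corr = pd.Series({
    "cyl": -0.85, "disp": -0.85, "hp": -0.78, "drat": 0.68, "wt": -0.87,
    "qsec": 0.42, "vs": 0.66, "am": 0.60, "gear": 0.48, "carb": -0.55,
})

# Order by strength of association, ignoring the sign.
order = mpg_corr.abs().sort_values(ascending=False).index
ranked = mpg_corr.reindex(order)
print(ranked)
```

In the notebook, `corr['mpg'].drop('mpg')` should give the same starting series without retyping the values. Weight (wt) has the strongest association with mpg and quarter-mile time (qsec) the weakest.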
3. Multivariate Analysis
In [83]:
sns.pairplot(data = cars_data)
Out[83]:
<seaborn.axisgrid.PairGrid at 0x27a63520e20>
In [89]:
plt.figure(figsize = (15,8))
sns.heatmap(data = corr,annot=True)
Out[89]:
<AxesSubplot:>
In [95]:
import plotly.express as px
# This dataframe has 244 lines, but 4 distinct values for `day`
df = px.data.tips()
fig = px.pie(df, values='tip', names='day')
fig.show()
In [ ]: