'Mid-Term Scores': [12, 9, 17, 19, 20, 11, 15, 12, 9, 4, 20, 19, 17, 19, 18],
'Grade': ['B','C','A','A','A','B','B','B','C','F','A','A','A','A','A'],
↪'Male', 'Male'],
'Final Score': [42, 23, 64, 88, 30, 86, 95, 78, 75, 43, 20, 43, 53, 64, 21]})
[23]: dataframe
12 13 Art 17 A Male 53
13 14 Mathematics 19 A Male 64
14 15 History 18 A Male 21
[5]: #get the value counts for each category in the Subject variable
dataframe['Subject'].value_counts()
[5]: Art 6
Mathematics 5
Science 2
History 2
Name: Subject, dtype: int64
[6]: #take the labels from value_counts so they always match the counts (descending order)
x=dataframe["Subject"].value_counts()
labels=x.index
y=plt.pie(x,labels=labels,autopct='%1.2f%%')
plt.legend(labels,loc="best")
plt.axis('equal')
plt.title("Subject")
plt.show()
1.2 2. Bar/Column chart
[7]: #to plot proportions instead of counts, use normalize=True
dataframe["Subject"].value_counts(normalize=False).plot.bar(title="Subject")
[7]: <AxesSubplot:title={'center':'Subject'}>
1.3 3. Simple Boxplot (univariate)
[8]: #boxplot() already draws on the current axes, so no extra plot() call is needed
b_plot = dataframe.boxplot(column = 'Mid-Term Scores', color = 'blue' )
plt.show()
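The numbers behind the box can be checked by hand. The sketch below is not part of the original notebook; it copies the Mid-Term Scores from the dataframe definition above and assumes the common 1.5 * IQR rule for the whiskers, which is the matplotlib default.

```python
import pandas as pd

# Mid-Term Scores copied from the dataframe definition above
mid_term = pd.Series([12, 9, 17, 19, 20, 11, 15, 12, 9, 4, 20, 19, 17, 19, 18],
                     name='Mid-Term Scores')

# the three quartiles that define the box and its middle line
q1, median, q3 = mid_term.quantile([0.25, 0.5, 0.75])
iqr = q3 - q1

# fences at 1.5 * IQR (assumed rule); points beyond them are drawn as outliers
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = mid_term[(mid_term < lower_fence) | (mid_term > upper_fence)]

print(q1, median, q3, iqr)
print(outliers.tolist())
```

With these scores no value falls outside the fences, so the plot should show no outlier points.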
1.4 4. Boxplot with 2 variables (bivariate)
[9]: sns.catplot(x="Subject",y='Final Score',kind="box",data=dataframe)
1.5 5. Boxplot with 3 variables (trivariate)
[10]: sns.boxplot(x="Subject",y='Final Score', hue='Gender', data=dataframe)
1.6 6. Scatter plot
[11]: sns.scatterplot(data=dataframe, x="Mid-Term Scores", y="Final Score")
1.7 7. Scatter plot with 3 variables
[12]: sns.scatterplot(data=dataframe, x="Mid-Term Scores", y="Final Score", hue="Gender")
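A scatter plot shows the relationship visually; a correlation coefficient puts a number on it. The sketch below is an addition, not part of the original notebook. It rebuilds only the two score columns from the dataframe definition above (the name `scores` is chosen here for illustration) and computes the Pearson correlation with `Series.corr`.

```python
import pandas as pd

# the two numeric columns, copied from the dataframe definition above
scores = pd.DataFrame({
    'Mid-Term Scores': [12, 9, 17, 19, 20, 11, 15, 12, 9, 4, 20, 19, 17, 19, 18],
    'Final Score': [42, 23, 64, 88, 30, 86, 95, 78, 75, 43, 20, 43, 53, 64, 21],
})

# Pearson correlation (the default method) between the two variables
r = scores['Mid-Term Scores'].corr(scores['Final Score'])
print(round(r, 3))
```

A value near 0 would suggest little linear relationship between the two scores; values near 1 or -1 would suggest a strong one.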
1.8 8. Histogram
[13]: sns.histplot(dataframe['Mid-Term Scores'])
plt.title('Mid-Term Scores')
plt.xlabel('Mid-Term Scores')
plt.ylabel("Frequency")
plt.show()
2 Contingency table
[25]: data_crosstab = pd.crosstab([dataframe.Grade, dataframe.Gender], dataframe.Subject, margins = True)
print(data_crosstab)
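A contingency table can also report row proportions instead of raw counts. The sketch below is an illustration, not part of the original notebook. It uses only the Grade and Final Score columns from the dataframe definition above, and the Pass/Fail split at a final score of 50 is an assumed cutoff introduced purely for the example.

```python
import pandas as pd

# Grade and Final Score copied from the dataframe definition above
grade = pd.Series(['B','C','A','A','A','B','B','B','C','F','A','A','A','A','A'],
                  name='Grade')
final = pd.Series([42, 23, 64, 88, 30, 86, 95, 78, 75, 43, 20, 43, 53, 64, 21],
                  name='Final Score')

# hypothetical Pass/Fail label: the cutoff of 50 is an assumption for illustration
result = final.ge(50).map({True: 'Pass', False: 'Fail'}).rename('Result')

# raw counts, with row and column totals in the 'All' margins
counts = pd.crosstab(grade, result, margins=True)

# share of each grade that passed or failed (every row sums to 1)
row_share = pd.crosstab(grade, result, normalize='index')

print(counts)
print(row_share)
```

`normalize='columns'` or `normalize='all'` would give column or overall proportions instead.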
3 Exploratory Statistical Analysis: Summary Statistics
[15]: #inspect the data types, then summarise the numeric columns
dataframe.dtypes
dataframe.describe()
[18]: Subject Grade Gender
count 15 15 15
unique 4 4 2
top Art A Male
freq 6 8 9
[26]: dataframe.describe(include='all')
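Summary statistics are often more informative when broken down by a category. The sketch below is an addition, not part of the original notebook. It rebuilds the three columns that are fully visible in the dataframe definition above (the name `scores` is chosen here for illustration) and summarises both score columns per grade.

```python
import pandas as pd

# the three fully visible columns, copied from the dataframe definition above
scores = pd.DataFrame({
    'Mid-Term Scores': [12, 9, 17, 19, 20, 11, 15, 12, 9, 4, 20, 19, 17, 19, 18],
    'Grade': ['B','C','A','A','A','B','B','B','C','F','A','A','A','A','A'],
    'Final Score': [42, 23, 64, 88, 30, 86, 95, 78, 75, 43, 20, 43, 53, 64, 21],
})

# count, mean, min and max of each score column within each grade
summary = (scores
           .groupby('Grade')[['Mid-Term Scores', 'Final Score']]
           .agg(['count', 'mean', 'min', 'max']))

print(summary)
```

The result has one row per grade and a two-level column index, so a single cell is read as `summary.loc['A', ('Final Score', 'mean')]`.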