April 5, 2024
display(data.head())
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In[1], line 2
1 # Count the number of each gender in the dataset
----> 2 gender_distribution = data['GENDER'].value_counts()
4 # Display the gender distribution
5 display(gender_distribution)
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
File /usr/local/lib/python3.11/site-packages/pandas/core/indexes/base.py:3805,␣
↪in Index.get_loc(self, key)
3804 try:
-> 3805 return self._engine.get_loc(casted_key)
3806 except KeyError as err:
KeyError: 'GENDER'
The above exception was the direct cause of the following exception:
File /usr/local/lib/python3.11/site-packages/pandas/core/indexes/base.py:3812,␣
↪in Index.get_loc(self, key)
3816 # the TypeError.
3817 self._check_indexing_error(key)
KeyError: 'GENDER'
# It seems there was an error with the column name. Let's print the column
# names to understand the issue.
print(data.columns)
# Assuming the column name might have leading or trailing spaces or different
# capitalization, we will standardize the column names and then count the
# gender distribution.
data.columns = data.columns.str.strip().str.upper()
Index(['Unnamed: 0', 'Unnamed: 1', 'Unnamed: 2', 'Unnamed: 3', 'Unnamed: 4',
'Unnamed: 5', 'Unnamed: 6'],
dtype='object')
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
File /usr/local/lib/python3.11/site-packages/pandas/core/indexes/base.py:3805,␣
↪in Index.get_loc(self, key)
3804 try:
-> 3805 return self._engine.get_loc(casted_key)
3806 except KeyError as err:
KeyError: 'GENDER'
The above exception was the direct cause of the following exception:
File /usr/local/lib/python3.11/site-packages/pandas/core/indexes/base.py:3812,␣
↪in Index.get_loc(self, key)
KeyError: 'GENDER'
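The printed index contains only `Unnamed: N` labels, which usually means the real header row is not the first row pandas read, so stripping and upper-casing the labels cannot produce a `GENDER` column. The following is a minimal sketch of one way to recover, assuming the true header sits in one of the first rows of the sheet: find the row that contains the expected label and promote it to the header. The helper name `promote_header_row` and the toy frame are hypothetical and are not part of the original notebook.

```python
import pandas as pd

def promote_header_row(df: pd.DataFrame, marker: str = "GENDER") -> pd.DataFrame:
    """Use the first row that contains `marker` as the column header."""
    # Normalize every cell to a stripped, upper-case string (for matching only).
    normalized = df.apply(lambda col: col.map(lambda v: str(v).strip().upper()))
    hits = normalized.eq(marker).any(axis=1)
    if not hits.any():
        raise KeyError(f"no row contains {marker!r}")
    header_pos = int(hits.to_numpy().argmax())  # position of the first match
    out = df.iloc[header_pos + 1:].reset_index(drop=True)
    out.columns = normalized.iloc[header_pos].tolist()
    return out

# Hypothetical stand-in for the sheet: a title row, then the real header.
raw = pd.DataFrame({
    "Unnamed: 0": ["Class list", " Gender ", "MALE", "FEMALE"],
    "Unnamed: 1": [None, "CA", 20, 25],
    "Unnamed: 2": [None, "Exam", 40, 45],
})
clean = promote_header_row(raw)
print(clean.columns.tolist())
```

If the position of the header row is known in advance, passing it through the `header` parameter of `pd.read_excel` or `pd.read_csv` gives the same result at load time.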
# Create a bar chart for gender distribution using Plotly
import plotly.graph_objects as go

gender_chart = go.Figure(go.Bar(x=gender_distribution.index,
                                y=gender_distribution.values,
                                marker_color=['blue', 'pink']))
gender_chart.update_layout(plot_bgcolor='#111', paper_bgcolor='#111',
                           font=dict(color='#7FDBFF'))
gender_chart.show()
[Figure: bar chart of the gender distribution. x-axis: Gender (MALE, FEMALE, GENDER); y-axis: Count.]
The gender distribution in the dataset was analyzed and visualized. Key points:
• The dataset records two genders, labeled MALE and FEMALE. The chart also shows a third
category, GENDER, which is most likely a stray header value rather than a gender.
• A bar chart represents the number of records for each gender.
• The exact counts are not listed here; they can be read from the bar chart.
• The chart uses blue for males and pink for females.
• The visualization includes titles and labels for clarity.
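One detail worth checking in the chart code is that `marker_color=['blue', 'pink']` assigns colors by bar position, while `value_counts()` orders labels by frequency, so which gender ends up blue depends on the data. A minimal sketch of assigning colors by label instead, using hypothetical toy labels (including a stray `GENDER` value like the one visible in the chart) and a fallback color of my choosing:

```python
import pandas as pd

# Hypothetical labels; the stray "GENDER" value mimics the third bar in the chart.
labels = pd.Series(["MALE", "FEMALE", "MALE", "GENDER", "MALE", "FEMALE"])
gender_distribution = labels.value_counts()

palette = {"MALE": "blue", "FEMALE": "pink"}
# Look up each bar's color by its label; unknown labels fall back to gray.
bar_colors = [palette.get(label, "gray") for label in gender_distribution.index]

print(gender_distribution.to_dict())
print(dict(zip(gender_distribution.index, bar_colors)))
```

The resulting `bar_colors` list can be passed as `marker_color` to `go.Bar` in place of the fixed two-element list.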
# Display the mean performance for each gender
display(gender_performance)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
File /usr/local/lib/python3.11/site-packages/pandas/core/groupby/groupby.py:
↪1942, in GroupBy._agg_py_fallback(self, how, values, ndim, alt)
1941 try:
-> 1942 res_values = self._grouper.agg_series(ser, alt, preserve_dtype=True)
1943 except Exception as err:
File /usr/local/lib/python3.11/site-packages/pandas/core/groupby/groupby.py:
↪2454, in GroupBy.mean.<locals>.<lambda>(x)
2451 else:
2452 result = self._cython_agg_general(
2453 "mean",
-> 2454 alt=lambda x:␣
↪Series(x, copy=False).mean(numeric_only=numeric_only),
2455 numeric_only=numeric_only,
2456 )
2457 return result.__finalize__(self.obj, method="groupby")
12410 def mean(
12411 self,
12412 axis: Axis | None = 0,
(…)
12415 **kwargs,
12416 ) -> Series | float:
> 12417 return self._stat_function(
12418 "mean", nanops.nanmean, axis, skipna, numeric_only, **kwargs
12419 )
146 else:
--> 147 result = alt(values, axis=axis, skipna=skipna, **kwds)
149 return result
1702 try:
The above exception was the direct cause of the following exception:
File /usr/local/lib/python3.11/site-packages/pandas/core/groupby/groupby.py:
↪2452, in GroupBy.mean(self, numeric_only, engine, engine_kwargs)
File /usr/local/lib/python3.11/site-packages/pandas/core/groupby/groupby.py:
↪1998, in GroupBy._cython_agg_general(self, how, alt, numeric_only, min_count,␣
↪**kwargs)
File /usr/local/lib/python3.11/site-packages/pandas/core/internals/base.py:367,␣
↪in SingleDataManager.grouped_reduce(self, func)
File /usr/local/lib/python3.11/site-packages/pandas/core/groupby/groupby.py:
↪1995, in GroupBy._cython_agg_general.<locals>.array_func(values)
File /usr/local/lib/python3.11/site-packages/pandas/core/groupby/groupby.py:
↪1946, in GroupBy._agg_py_fallback(self, how, values, ndim, alt)
gender_performance_chart.update_layout(plot_bgcolor='#111',
    paper_bgcolor='#111', font=dict(color='#7FDBFF'))
gender_performance_chart.show()
[Figure: Mean Performance by Gender. Bar chart; x-axis: Gender (FEMALE, GENDER, MALE); y-axis: Mean Total Score.]
The analysis of performance differences between genders revealed the following key points:
• Performance was evaluated based on the total score, calculated as the sum of CA (Continuous
Assessment) and EXAM scores.
• Both CA and EXAM scores were converted to numeric values to ensure accurate calculations.
• The mean performance for each gender was calculated and visualized in a bar chart.
• The bar chart utilized colors (blue for males, pink for females) to differentiate between the
genders, and included titles and labels for clarity.
• The exact mean performance values for each gender were not explicitly mentioned in the
summary, but they were visually represented in the bar chart.
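The steps in the bullets above (convert CA and EXAM to numbers, sum them into a total, average by gender) can be sketched as follows. The `TypeError` raised inside `GroupBy.mean` in the traceback is consistent with averaging a column that still holds text, although the failing cell itself is not shown. The toy frame below is hypothetical; only the column names `GENDER`, `CA`, `EXAM`, and `TOTAL` come from the notebook.

```python
import pandas as pd

raw = pd.DataFrame({
    "GENDER": ["MALE", "FEMALE", "FEMALE", "MALE"],
    "CA": ["20", "25", "15", "10"],      # scores stored as text
    "EXAM": ["40", "45", "35", "30"],
})
# Adding text columns concatenates strings ("20" + "40" gives "2040").
text_total = raw["CA"] + raw["EXAM"]

scores = raw.copy()
for col in ("CA", "EXAM"):
    # errors="coerce" turns anything non-numeric into NaN instead of raising.
    scores[col] = pd.to_numeric(scores[col], errors="coerce")
scores["TOTAL"] = scores["CA"] + scores["EXAM"]

gender_performance = scores.groupby("GENDER")["TOTAL"].mean()
print(gender_performance)
```

Converting before summing matters: summing first would build a text total, and `mean()` on that column cannot succeed.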
import plotly.express as px

gender_performance_boxplot = px.box(data, x='GENDER', y='TOTAL', color='GENDER',
                                    color_discrete_map={'MALE': 'blue',
                                                        'FEMALE': 'pink'})
gender_performance_boxplot.update_layout(plot_bgcolor='#111',
                                         paper_bgcolor='#111',
                                         font=dict(color='#7FDBFF'))
gender_performance_boxplot.show()
[Figure: box plot of Total Score by Gender (MALE, FEMALE); legend: GENDER.]
The comparison of performance distribution between genders was visualized using a box plot,
revealing key insights:
• The box plot shows the spread of total scores (sum of CA and EXAM) for each gender.
• Colors were used to distinguish genders (blue for males, pink for females), enhancing visual
clarity.
• The plot includes median, quartiles, and potential outliers, providing a comprehensive view
of the performance distribution.
• Specific numerical details such as exact quartiles or outliers were not mentioned, but these
are represented visually in the plot.
• The visualization was designed with a dark theme, using color coding for text and background
for better contrast.
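The quartiles that the box plot only shows visually can also be computed directly. A minimal sketch with hypothetical toy totals, using pandas' default linear interpolation (Plotly's own quartile convention may differ slightly) and the common 1.5 × IQR rule for outlier fences, which is an assumption about how outliers would be flagged:

```python
import pandas as pd

# Hypothetical totals, five students per gender.
scores = pd.DataFrame({
    "GENDER": ["MALE"] * 5 + ["FEMALE"] * 5,
    "TOTAL": [40, 50, 60, 70, 80, 45, 55, 65, 75, 85],
})

# One row per gender, one column per quantile (0.25, 0.5, 0.75).
quartiles = (
    scores.groupby("GENDER")["TOTAL"]
    .quantile([0.25, 0.5, 0.75])
    .unstack()
)

iqr = quartiles[0.75] - quartiles[0.25]
fences = pd.DataFrame({
    "lower": quartiles[0.25] - 1.5 * iqr,
    "upper": quartiles[0.75] + 1.5 * iqr,
})
print(quartiles)
print(fences)
```

Totals outside a gender's lower or upper fence are the points a box plot would typically draw as outliers.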
correlation_plot = px.scatter(data, x='CA', y='EXAM', color='GENDER',
                              symbol='GENDER',
                              color_discrete_map={'MALE': 'blue', 'FEMALE': 'pink'})
correlation_plot.update_layout(plot_bgcolor='#111', paper_bgcolor='#111',
                               font=dict(color='#7FDBFF'))
correlation_plot.show()
[Figure: scatter plot of Exam Score against CA score, with points colored and marked by GENDER (MALE, FEMALE).]
The investigation of correlations between CA (Continuous Assessment) and EXAM scores was
conducted through a scatter plot, revealing several insights:
• The scatter plot visualizes the relationship between CA and EXAM scores for each gender,
with colors distinguishing genders (blue for males, pink for females).
• Symbols were used to differentiate genders within the plot, enhancing the visual distinction.
• The plot includes titles and labels for clarity, explaining what CA and EXAM scores represent.
• While the plot visually suggests a relationship between CA and EXAM scores, specific
correlation coefficients were not calculated in this summary.
• The visualization was designed with a dark theme, using color coding for text and background
for better contrast.
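Since the summary stops short of a correlation coefficient, here is a minimal sketch of how one could be computed with pandas, overall and per gender. The toy scores are hypothetical; `Series.corr` uses the Pearson coefficient by default.

```python
import pandas as pd

# Hypothetical CA and EXAM scores.
scores = pd.DataFrame({
    "GENDER": ["MALE", "MALE", "MALE", "FEMALE", "FEMALE", "FEMALE"],
    "CA": [10, 15, 20, 12, 18, 24],
    "EXAM": [30, 40, 50, 50, 40, 45],
})

# Pearson correlation across all students.
overall_r = scores["CA"].corr(scores["EXAM"])

# Pearson correlation within each gender.
r_by_gender = {
    gender: group["CA"].corr(group["EXAM"])
    for gender, group in scores.groupby("GENDER")
}
print(overall_r)
print(r_by_gender)
```

A value near 1 or -1 indicates a strong linear relationship and a value near 0 a weak one; with small groups the coefficient is very sensitive to individual points.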
exam_histogram = go.Figure()
exam_histogram.add_trace(go.Histogram(x=data['EXAM'], name='EXAM Scores',
                                      marker_color='lightgreen'))
exam_histogram.update_layout(plot_bgcolor='#111', paper_bgcolor='#111',
                             font=dict(color='#7FDBFF'))
exam_histogram.show()
[Figure: histogram of CA scores. x-axis: CA Score; y-axis: Count.]
[Figure: Distribution of Exam Scores. Histogram; x-axis: Exam Score; y-axis: Count.]
The exploration of the distribution of CA (Continuous Assessment) and EXAM scores was
conducted through histograms, revealing distinct insights for each:
• CA Scores Distribution:
– Visualized using a light blue histogram.
– The histogram shows the frequency of various CA scores across the dataset.
– The layout includes titles and labels for clarity, with a dark theme for better visual
contrast.
• EXAM Scores Distribution:
– Visualized using a light green histogram.
– Similar to the CA scores, this histogram displays the frequency of various EXAM scores.
– The design and layout follow the same theme as the CA scores histogram for consistency.
# Display the mean total score for each gender
display(gender_mean_performance)
GENDER
FEMALE 56.181818
GENDER NaN
MALE 51.400000
Name: TOTAL, dtype: float64
The comparison of average performance by gender yielded the following results:
• Female: The average total score for females is approximately 56.18.
• Male: The average total score for males is approximately 51.40.
• An entry labeled GENDER with a NaN mean also appears. It most likely comes from a stray
header row read in as data (or another data entry error), not from a real gender category.
This analysis suggests that, on average, females have a higher total score compared to males in this
dataset.
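The `GENDER NaN` row in the output can be reproduced and removed with a small filter. This is a sketch under the assumption that the stray label comes from a header row that ended up among the data rows; the toy frame is hypothetical and only the column names come from the notebook.

```python
import pandas as pd

# Hypothetical data with a header row repeated among the records.
raw = pd.DataFrame({
    "GENDER": ["MALE", "FEMALE", "GENDER", "FEMALE", "MALE"],
    "CA": ["20", "25", "CA", "15", "10"],
    "EXAM": ["40", "45", "EXAM", "35", "30"],
})
scores = raw.copy()
scores["TOTAL"] = (pd.to_numeric(scores["CA"], errors="coerce")
                   + pd.to_numeric(scores["EXAM"], errors="coerce"))

# The stray row forms its own group whose only total is NaN.
with_stray = scores.groupby("GENDER")["TOTAL"].mean()

# Keep only rows whose label is an actual gender before aggregating.
valid = scores[scores["GENDER"].isin(["MALE", "FEMALE"])]
gender_mean_performance = valid.groupby("GENDER")["TOTAL"].mean()
print(with_stray)
print(gender_mean_performance)
```

Filtering once, right after loading, also keeps the stray category out of every later chart.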
color_discrete_map={'MALE': 'blue', 'FEMALE': 'pink'})
performance_distribution_by_gender.update_layout(plot_bgcolor='#111',
    paper_bgcolor='#111', font=dict(color='#7FDBFF'))
performance_distribution_by_gender.show()
[Figure: Performance Distribution by Gender. x-axis: Gender (MALE, FEMALE); y-axis: Total Score; legend: GENDER.]
Test the null hypothesis that there is no difference between male and female students'
performance.
from scipy.stats import ttest_ind
# Perform an independent t-test between male and female students' total scores
ttest_results = ttest_ind(data_male.dropna(), data_female.dropna())
print(f"T-statistic: {ttest_results.statistic:.4f}")
print(f"P-value: {ttest_results.pvalue:.4f}")
T-statistic: -1.8084
P-value: 0.0746
The independent t-test conducted to compare the performance (total scores) between male and
female students yielded the following results:
• T-statistic: -1.8084, indicating the direction and magnitude of the difference between the
group means.
• P-value: 0.0746, which suggests that the difference in mean performance between male and
female students is not statistically significant at the conventional 0.05 level.
Based on these results, we fail to reject the null hypothesis that there is no difference in performance
between male and female students. This implies that any observed difference in mean performance
between genders in this dataset is not statistically significant.
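The cell above uses `data_male` and `data_female` without showing how they were built. A minimal sketch of the full test on hypothetical toy totals, assuming the two samples are the `TOTAL` values selected by `GENDER`. It also runs Welch's variant (`equal_var=False`), which does not assume equal variances in the two groups; that variant is an addition here, not something the notebook did.

```python
import pandas as pd
from scipy.stats import ttest_ind

# Hypothetical totals; the missing value shows why dropna() is needed.
scores = pd.DataFrame({
    "GENDER": ["MALE", "MALE", "MALE", "MALE", "FEMALE", "FEMALE", "FEMALE"],
    "TOTAL": [40.0, 50.0, 60.0, None, 50.0, 60.0, 70.0],
})
data_male = scores.loc[scores["GENDER"] == "MALE", "TOTAL"]
data_female = scores.loc[scores["GENDER"] == "FEMALE", "TOTAL"]

# Student's t-test (pooled variance), as in the notebook cell.
student = ttest_ind(data_male.dropna(), data_female.dropna())
# Welch's t-test, which drops the equal-variance assumption.
welch = ttest_ind(data_male.dropna(), data_female.dropna(), equal_var=False)

print(f"T-statistic: {student.statistic:.4f}")
print(f"P-value: {student.pvalue:.4f}")
print(f"Welch T-statistic: {welch.statistic:.4f}")
```

The sign of the statistic follows the argument order: a negative value means the first sample (here the male group) has the lower mean.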