You are on page 1of 40

INDEX

S. No. Detailed Statement


1 Create 2 numpy arrays with 5 elements using arange() and linspace() and display the
implement the concept of slicing on them.
2 Create two 2-d arrays and perform addition, subtraction and multiplication on these
arrays. Print the value of bordered elements of both matrices.
3 Create two pandas series from a dictionary of values and an ndarray and display the
values from 2nd to 5th index. Print the index, minimum and maximum values in the
first series.
4 Create a Series and print all the elements that are above 75th percentile. Print
minimum ,maximum and sum of series using aggregate()
5 Series objects Temp1, temp2, temp3, temp 4 stores the temperature of
days of week 1, week 2, week 3, week 4. Write a script to:-
a. Print average temperature per week
b. Print average temperature of entire month
6 Create a series containing 2 NaN values. Perform check for null values and then
replace the null values with 0.perform any 5 statistical functions on the series.
7 Two Series object, Population stores the details of four metro cities of
India and another object AvgIncome store the total average income
reported in four years in these cities. Calculate income per capita for each
of these metro cities.
8 Given two series S1 and S2.
S1 S2
A 39 A 10
B 41 B 10
C 42 D 10
D 44 F 10
Find the output for following python pandas statements?
a. S1[ : 2]*100
b. S1 * S2
c. S2[ : : -1]*10
9 Write a program to create a Series having 10 random numbers in the range of 10 and
20
10 Consider a series object s10 that stores the number of students in each section of
class 12 as shown below. First two sections have been given task for selling tickets @
Rs.100/- per ticket as a part of social experiment. Write code to create the series and
display how much section A and B have collected. A-39, B- 31, C- 32, D- 34, E- 35
11 Write a program to create a DataFrame to store weight, age and name of
three people. Print the DataFrame and its transpose. Add 5 rows in the dataframe
through code. Rename the Weight column as Wgt. And then print the index names,
column names and total amount of data. Sort the dataframe on the basis of age.
12 Create a DataFrame having age, name, weight of five students. Print the dataframe
using head(). Modify the weight of student in first and 4th row. Display only the
weight of first and fourth rows before and after modification.
13 Create a DataFrame based on E-Commerce data and generate mean, mode, median.
14 Write a Program to create a CSV file with student data containing 10 rows. create its
DataFrame and use describe() to display its statistics. Write all the steps and
definition of all the statistical functions displayed.
15 Consider the DataFrame QtrSales where each row contains the item category, item
name and expenditure and group the rows by category, and print the average
expenditure per category
16 Write a program to implement pivot() and pivot-table() on a DataFrame
17 Write a program to find mean absolute deviation on a DataFrame.
18 Create a DataFrame based on employee data and generate quartile and variance.
19 Program to implement Skewness on Random data.
20 Create a DateFrame on any Data and compute statistical function of
Kurtosis.
21 CASE STUDY ON :CRIME_BY_STATE_RT.CSV
Que 1 Create 2 numpy arrays with 5 elements using arange() and linspace() and display the implement the concept
of slicing on them.
import numpy as np
n1=np.arange(5)
n2=np.linspace(3,5,5)
print(n1)
print(n2)
print()
print(n1[1:3])
print(n2[1:3])
Que 2 Create two 2-d arrays and perform addition, subtraction and multiplication on these arrays. Print the value
of bordered elements of both matrices.

m1=[[11,12,13],
[14,15,16],
[17,18,19]]
m2=[[1,2,3],
[4,5,6],
[7,8,9]]
add=[[0,0,0],
[0,0,0],
[0,0,0]]
sub=[[0,0,0],
[0,0,0],
[0,0,0]]
mul=[[0,0,0],
[0,0,0],
[0,0,0]]
print('Addition :- ')
for i in range(len(m1)):
for j in range(len(m1[0])):
add[i][j] = m1[i][j] + m2[i][j]
print(add)
print('Subtraction :- ')
for i in range(len(m1)):
for j in range(len(m1[0])):
sub[i][j] = m1[i][j] - m2[i][j]
print(sub)
print('Multiplication :- ')
for i in range(len(m1)):
for j in range(len(m2[0])):
for k in range(len(m2)):
mul[i][j] = m1[i][k] - m2[k][j]
print(mul)
Que 3 Create two pandas series from a dictionary of values and an ndarray and display the values from
2nd to 5th index. Print the index, minimum and maximum values in the first series.

import pandas as pd
import numpy as np
dict1 = {'Devaansh': 100,'Harshil': 200,'Pryank': 300,'Raghav': 400,'Keshav': 500,'Madhav':600}
arr1 = np.array([300,400,500, 600, 700, 800])
Seriesdict1 = pd.Series(dict1)
Seriesarr1 = pd.Series(arr1)
print("Series from Dictionary:")
print(Seriesdict1[2:6])
print("\nSeries from ndarray:")
print(Seriesarr1[2:6])
print(Seriesdict1.index)
print(Seriesdict1.min())
print(Seriesdict1.max())
Que 4 Create a Series and print all the elements that are above 75th percentile. Print
minimum ,maximum and sum of series using aggregate()

import pandas as pd
import numpy as np
list1 = [10, 25, 55, 75, 90, 115, 135]
Se = pd.Series(list1)
print(Se)
print("Series Above 75 Percentile \n")
PERCNT = np.percentile(Se, 75)
Series75 = Se[Se > PERCNT]
print(Series75)
Que 5 Series objects Temp1, temp2, temp3, temp 4 stores the temperature of
days of week 1, week 2, week 3, week 4. Write a script to:-
a. Print average temperature per week
b. Print average temperature of entire month

import pandas as pd
avgtem = [28,31,26,28,31,34,37,
28,33,25.0,35,35.4,29,27,
24,30,29,39,37,39,40,
27,36,34,30,22,25,27,
28,30,25]
T1 = pd.Series(avgtem[:7])
T2 = pd.Series(avgtem[8:15])
T3 = pd.Series(avgtem[16:23])
T4 = pd.Series(avgtem[23:30])
#A
print("---------AVERAGE TEMPERATURE OF WEEKS---------\n\n")
print('Average Temp of Week 1 : ', T1.sum()/7, "\n")
print('Average Temp of Week 2 : ', T2.sum()/7, "\n")
print('Average Temp of Week 3 : ', T3.sum()/7, "\n")
print('Average Temp of Week 4 : ', T4.sum()/7, "\n")
print("\n")
#B
Sum = T1.sum() + T2.sum() + T3.sum() + T4.sum()
print("---------AVERAGE TEMPERATURE OF MONTH---------\n\n")
print('Average temp of entire month : ', Sum/len(avgtem))
Que 6 Create a series containing 2 NaN values. Perform check for null values and then replace the null
values with 0.perform any 5 statistical functions on the series.

import pandas as pd
import numpy as np

s1 = pd.Series([1, 2, np.nan, 4, np.nan, 6])

null_check = s1.isnull()
print("Null values check:\n", null_check)

s1 = s1.fillna(0)
print("\nSeries with null values replaced:\n", s1)

print("\nStatistical functions:")
print("Mean:", s1.mean())
print("Median:", s1.median())
print("Standard Deviation:", s1.std())
print("Minimum:", s1.min())
print("Maximum:", s1.max())
Que 7 Two Series object, Population stores the details of four metro cities of India and another object AvgIncome
store the total average income reported in four years in these cities. Calculate income per capita for each
of these metro cities.

import pandas as pd
Population = {'Delhi': 20000000, 'Mumbai': 22000000, 'Kolkata':
15000000, 'Chennai': 10000000}
Series_Population = pd.Series(Population)
Income = {'Delhi': 1000000, 'Mumbai': 1200000, 'Kolkata': 900000,
'Chennai': 800000}
Series_Income = pd.Series(Income)
Per_Capita = Series_Income / Series_Population
print("-----Income per capita:-----\n")
print(Per_Capita)
Que 8 Given two series S1 and S2.
S1 S2
A 39 A 10
B 41 B 10
C 42 D 10
D 44 F 10
Find the output for following python pandas statements?
a. S1[ : 2]*100
b. S1 * S2
c. S2[ : : -1]*10

import pandas as pd

S1 = pd.Series([39, 41, 42, 44], index=['A', 'B', 'C', 'D'])


S2 = pd.Series([10, 10, 10,10], index=['A', 'B', 'D','F'])
print(S1[ : 2]*100 )
print(S1 * S2 )
print(S2[ : : -1]*10)
Que 9 Write a program to create a Series having 10 random numbers in the range of 10 and 20

import pandas as pd
import numpy as np

rn= np.random.randint(10, 21, size=10)

series = pd.Series(rn)

print(series)
Que 10 Consider a series object s10 that stores the number of students in each section of class 12 as
shown below. First two sections have been given task for selling tickets @ Rs.100/- per ticket as a part of
social experiment. Write code to create the series and display how much section A and B have collected.
A-39, B- 31, C- 32, D- 34, E- 35

import pandas as pd

s10 = pd.Series([39, 31, 32, 34, 35], index=['A', 'B', 'C', 'D', 'E'])

amount_A = s10['A'] * 100


amount_B = s10['B'] * 100

print("Amount collected by Section A: Rs.", amount_A)


print("Amount collected by Section B: Rs.", amount_B)
Que 11 Write a program to create a DataFrame to store weight, age and name of
three people. Print the DataFrame and its transpose. Add 5 rows in the dataframe through code. Rename
the Weight column as Wgt. And then print the index names, column names and total amount of data.
Sort the dataframe on the basis of age.

import pandas as pd

data = {'Weight': [70, 65, 75],


'Age': [25, 26, 28],
'Name': ['ram', 'Madhav', 'Shyam']}

df = pd.DataFrame(data)

print("DataFrame:")
print(df)

print("\nTranspose of DataFrame:")
print(df.transpose())

new_data = {'Weight': [72, 68, 80, 67, 74],


'Age': [26, 31, 27, 29, 33],
'Name': ['ram', 'krishna', 'kahna', 'radha', 'rani']}

df = df.append(pd.DataFrame(new_data), ignore_index=True)

df.rename(columns={'Weight': 'Wgt'}, inplace=True)

print(df.index.tolist())

print(df.columns.tolist())

print("\nTotal amount of data:", df.size)

df_sorted = df.sort_values('Age')
print(df_sorted)
Que 12 Create a DataFrame having age, name, weight of five students. Print the dataframe using head().
Modify the weight of student in first and 4th row. Display only the weight of first and fourth rows before
and after modification.

import pandas as pd

student_data = {

'Age': [20, 21, 19, 22, 20],

'Name': ['ram', 'krishna', 'kahna', 'radha', 'rani'],

'Weight': [60, 65, 55, 70, 62]

df = pd.DataFrame(student_data)

selected_weights = df.loc[[0, 3], 'Weight']

print(selected_weights)
Que 13 Create a DataFrame based on E-Commerce data and generate mean, mode, median.

import pandas as pd

DICT1 = { 'Student roll': [11, 12, 13, 14, 15],

'Gender': ['M', 'F', 'F', 'M', 'M'],

'Marks': [97.5, 91.5, 87.5, 93.5, 77.5] }

DF = pd.DataFrame(DICT1)

Me = DF['Marks'].mean()

Mo = DF['Marks'].mode().values[0]

Med = DF['Marks'].median()

print("-----------Statistical Measures-----------\n")

print("Mean : ", Me)

print("Mode : ", Mo)

print("Median : ", Med)


Que 14 Write a Program to create a CSV file with student data containing 10 rows. create its DataFrame
and use describe() to display its statistics. Write all the steps and definition of all the statistical functions
displayed.

from google.colab import drive

drive.mount('/content/drive')

import pandas as pd

df = pd.read_csv("/content/drive/MyDrive/Book1.csv")

print(df)

print(df.describe())
Que 15 Consider the DataFrame QtrSales where each row contains the item category, item name and
expenditure and group the rows by category, and print the average expenditure per category
import pandas as pd

QtrSales = pd.DataFrame({

'Category': ['Electronics', 'Clothing', 'Electronics', 'Grocery',

'Clothing', 'Grocery'],

'Item Name': ['Laptop', 'T-Shirt', 'Headphones', 'Fruits', 'Jeans',

'Snacks'],

'Expenditure': [1500, 30, 100, 50, 45, 20]})

average_expenditure =QtrSales.groupby('Category')['Expenditure'].mean()

print("Average Expenditure per Category:")

print(average_expenditure)
Que 16 Write a program to implement pivot() and pivot-table() on a DataFrame

import pandas as pd

df = pd.DataFrame({'Product' : ['Almonds', 'Pistachio', 'Cashew', 'Kishmish','Khurbani', 'cashew', 'Almonds',


'Pistachio'],

'Quantity' : [800, 500, 300, 400, 500, 900, 1100, 800],

'Amount' : [27000, 239000, 617000, 384000, 626000, 610000, 621000, 901000]})

print(df)

print(df.pivot(index ='Product', columns ='Quantity', values =['Amount', 'Product']))

print(df.pivot_table(index =['Quantity'], values =['Amount'], aggfunc ='sum'))


Que 17 Write a program to find mean absolute deviation on a DataFrame.

import pandas as pd

df = pd.DataFrame({"ram":[25,27,26,28,29],

"krishna":[23,20,24,26,22],

"kahna":[28,26,29,31,36],

"radha":[31,35,29,35,30]})

print(df)

print(df.mad(axis = 0))
Que 18 Create a DataFrame based on employee data and generate quartile and variance.

import pandas as pd

import numpy

emp=pd.DataFrame({"Name":['ram','krishna','kahna','radha','rani'],

"Age":[24,26,28,30,32],

"Salary":[2400000,2800000,3200000,3600000,2400000],

"Experience":['3Y','5Y','7Y','6Y','8Y']})

print(emp)

print(emp['Salary'].quantile([0.25, 0.5, 0.75]))

print(emp['Salary'].var())
Que 19 Program to implement Skewness on Random data.

From scipy.stats import skew

x = [55, 78, 65, 98, 97, 60, 67, 65, 83, 65]

print(skew(x, axis=0, bias=True))


Que 20 Create a DateFrame on any Data and compute statistical function of Kurtosis.

from scipy.stats import kurtosis


import pandas as pd
x = [55, 78, 65, 98, 97, 60, 67, 65, 83, 65]
print()
print(kurtosis(x, fisher=False))
Que 21 CASE STUDY ON :CRIME_BY_STATE_RT.CSV
crime_by_state_rt.zip
21.1: Find the 75% percentile for data
21.2: Apply aggregate function to find count, min , max and sum .
21.3: Perform group by on State and Year
21.4: Calculate the number of entries year wise
21.5: Create a pie chart for all states
21.6: Plot the no. of murders state-wise
21.7: Check the data type of all column
21.8: Check the robbery column for NULL values.
if exist replace with 0
21.9: Select the data set where robbery greater than 10 and year= 2021
21.10: describe () the dataset
21.11: Perform sorting on Arson and Robbery
21.12: Perform renaming on two columns

from google.colab import drive


drive.mount('/content/drive')
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv("/content/drive/MyDrive/crime_by_state_rt.csv")
#21.1
percentile_75 = df['Murder'].quantile(0.75)
print("75th percentile:", percentile_75)
#21.2
aggregate_result = df['Murder'].agg(['count', 'min', 'max', 'sum'])
print("Aggregate result:", aggregate_result)
#21.3
grouped_data = df.groupby(['STATE/UT', 'Year']).size().reset_index(name='Count')
print("Grouped data:",grouped_data)
#21.4
entries_yearwise = df.groupby('Year').size().reset_index(name='Count')
print("Number of entries year-wise:",entries_yearwise)
#21.5
fig = plt.figure(figsize =(7, 7))
state_counts = df['STATE/UT'].value_counts()
plt.pie(state_counts, labels=state_counts.index, autopct='%1.1f%%')
plt.title("Crime Distribution by State")
plt.axis('equal')
plt.show()
#21.6
print()
print()
fig = plt.figure(figsize =(7, 7))
plt.bar(df['STATE/UT'], df['Murder'])
plt.xlabel('STATE/UT')
plt.ylabel('Number of Murders')
plt.title('Number of Murders by State')
plt.xticks(rotation=90)
plt.show()
#21.7
print("Data types of all columns:")
print(df.dtypes)
#21.8
print(df['Robbery'].fillna(0, inplace=True))
#21.9
selected_data = df[(df['Robbery'] > 10) & (df['Year'] == 2012)]
print("Selected dataset:",selected_data)
#21.10
dataset_description = df.describe()
print("Dataset description:",dataset_description)
#21.11
sorted_data = df.sort_values(by=['Arson', 'Robbery'])
print("Sorted data:",sorted_data)
#21.12
df.rename(columns={'Year': 'Year_Crime', 'Crime_Type': 'Type_of_Crime'}, inplace=True)
print("Renamed DataFrame:",df)
Que 22 CASE STUDY ON: investigating clinical data
investigating clinical data.csv
22.1: describe () the dataset
22.2: Perform renaming on two columns
22.3: Is there any relation between the age of the patient and having a delay on the date of
appointment.
22.4: Is there any relation between gender and delaying appointments? (Show by plotting bar graph)
22.5: Do people delay their appointments when they have no scholarship? (Show by plotting)
22.6: Is there any relationship between gender having scholarships and delaying the appointment?
(show by plotting)
22.7: Which gender has more appointments in which illness? (Show by plotting)
22.8: Does a person suffering from alcoholism tend to delay appointments? (Show by plotting)
22.9: Perform group by and count on neighborhood
22.10: Perform agg() function on dataset.

from google.colab import drive


drive.mount('/content/drive')
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv("/content/drive/MyDrive/investigating clinical data.csv")
#22.1
print( df.describe())
#22.2
df2=df.rename(columns ={'Age':'patient_age','AppointmentID':'Appointment'},inplace = False)
print(df2)
#22.3
df['ScheduledDay'] = pd.to_datetime(df['ScheduledDay'])
df['AppointmentDay'] = pd.to_datetime(df['AppointmentDay'])
df['Delay'] = (df['AppointmentDay'] - df['ScheduledDay']).dt.days
mean_delay_by_age = df.groupby('Age')['Delay'].mean()
plt.plot(mean_delay_by_age.index, mean_delay_by_age.values)
plt.xlabel('Age')
plt.ylabel('Mean Delay (Days)')
plt.title('Mean Delay by Age')
plt.show()
#22.4
df['ScheduledDay'] = pd.to_datetime(df['ScheduledDay'])
df['AppointmentDay'] = pd.to_datetime(df['AppointmentDay'])
df['Delay'] = (df['AppointmentDay'] - df['ScheduledDay']).dt.days
delay_counts_by_gender = df[df['Delay'] > 0].groupby('Gender').size()
delay_counts_by_gender.plot(kind='bar')
plt.xlabel('Gender')
plt.ylabel('Number of Delayed Appointments')
plt.title('Delayed Appointments by Gender')
plt.show()
#22.5
df['ScheduledDay'] = pd.to_datetime(df['ScheduledDay'])
df['AppointmentDay'] = pd.to_datetime(df['AppointmentDay'])
df['Delay'] = (df['AppointmentDay'] - df['ScheduledDay']).dt.days
delay_counts_by_scholarship = df[df['Scholarship'] == 0].groupby('Scholarship').size()
delay_counts_by_scholarship.plot(kind='bar')
plt.xticks([0], ['No Scholarship'])
plt.xlabel('Scholarship')
plt.ylabel('Number of Delayed Appointments')
plt.title('Delayed Appointments for People with No Scholarship')
plt.show()
#22.6
df['ScheduledDay'] = pd.to_datetime(df['ScheduledDay'])
df['AppointmentDay'] = pd.to_datetime(df['AppointmentDay'])
df['Delay'] = (df['AppointmentDay'] - df['ScheduledDay']).dt.days
delay_counts_by_gender_scholarship = df[df['Delay'] > 0].groupby(['Gender', 'Scholarship']).size().unstack()
delay_counts_by_gender_scholarship.plot(kind='bar', stacked=True)
plt.xlabel('Gender')
plt.ylabel('Number of Delayed Appointments')
plt.title('Delayed Appointments by Gender and Scholarship')
plt.legend(title='Scholarship', loc='upper right')
plt.show()
#22.7
appointments_by_gender_illness = df.groupby(['Gender', 'Hipertension', 'Diabetes', 'Alcoholism',
'Handcap']).size().unstack()
appointments_by_gender_illness.plot(kind='bar', stacked=True)
plt.xlabel('Gender')
plt.ylabel('Number of Appointments')
plt.title('Appointments by Gender and Illness')
plt.legend(title='Illness', loc='upper right')
plt.show()
#22.8
df['ScheduledDay'] = pd.to_datetime(df['ScheduledDay'])
df['AppointmentDay'] = pd.to_datetime(df['AppointmentDay'])
df['Delay'] = (df['AppointmentDay'] - df['ScheduledDay']).dt.days
mean_delay_by_alcoholism = df.groupby('Alcoholism')['Delay'].mean()
mean_delay_by_alcoholism.plot(kind='bar')
plt.xticks([0, 1], ['No Alcoholism', 'Alcoholism'])
plt.xlabel('Alcoholism')
plt.ylabel('Mean Delay (Days)')
plt.title('Mean Delay by Alcoholism Status')
plt.show()
#22.9
neighborhood_counts = df.groupby('Neighbourhood').size()
print(neighborhood_counts)
#22.10
result = df.agg(["sum","min","max","count"])
print(result)
Que 23 CASE STUDY ON: Credit card transaction
Credit card transactions - India - Simple.csv
23.1: Show the head and visualize the basic state for all columns.
23.2: Display pivot table for create summary for total amount spread
a) month b) city
23.3: Analyze the impact of gender to study consumer behavior
23.4: Check for NULL values in columns .If exist replace them with appropriate values.
23.5: Analyze the relationship type between expense type and amount.
23.6: Analyze the spending habits by city and gender.
23.7: implement Skewness and kurtosis on data
Draw plots along with appropriate question

from google.colab import drive


drive.mount('/content/drive')
import pandas as pd
import matplotlib.pyplot as plt
from scipy.stats import skew
from scipy.stats import kurtosis
df = pd.read_csv("/content/drive/MyDrive/Credit card transactions - India - Simple.csv")
#23.1
print(df.head(5))
print(df.dtypes)
basic = df.dtypes.value_counts()
l=['object','int64']
plt.pie(basic, labels=l, autopct='%1.1f%%')
plt.title("Basic State")
plt.axis('equal')
plt.show()
#23.2
month=df.pivot_table(index =['Date'], values =['Amount'], aggfunc ='sum')
print(month)
city=df.pivot_table(index =['City'], values =['Amount'], aggfunc ='sum')
print(city)
#23.3
gender_counts = df['Gender'].value_counts()
print(gender_counts)
average_purchase_by_gender = df.groupby('Gender')['Amount'].mean()
print(average_purchase_by_gender)
df.groupby('Gender')['Amount'].mean().plot(kind='bar')
plt.xlabel('Gender')
plt.ylabel('Average Purchase Amount')
plt.title('Average Purchase Amount by Gender')
plt.show()
#23.4
print(df.fillna(0, inplace=True))
#23.5
total_amount_by_expense = df.groupby('Exp Type')['Amount'].sum()
print(total_amount_by_expense)
df.groupby('Exp Type')['Amount'].sum().plot(kind='bar')
plt.xlabel('Expense Type')
plt.ylabel('Total Amount')
plt.title('Total Amount by Expense Type')
plt.show()
expense_stats = df.groupby('Exp Type')['Amount'].describe()
print(expense_stats)
# 23.6
total_spending_by_city_gender = df.groupby(['City', 'Gender'])['Amount'].sum()
print(total_spending_by_city_gender)
df.groupby(['City', 'Gender'])['Amount'].sum().unstack().plot(kind='bar')
plt.xlabel('City')
plt.ylabel('Total Spending')
plt.title('Total Spending by City and Gender')
plt.legend(title='Gender', loc='upper right')
plt.show()
spending_stats = df.groupby(['City', 'Gender'])['Amount'].describe()
print(spending_stats)
#23.7
kurtosis_result = df.kurtosis()
print("Kurtosis of the Data:")
print(kurtosis_result)
skewness_result = df.skew()
print("skewness of the Data:")
print(skewness_result)
Que 24 CASE STUDY ON PROJECT TOPIC

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Mount Google Drive


from google.colab import drive
drive.mount('/content/drive')

# Read the CSV file


df = pd.read_csv('/content/drive/MyDrive/Animelover.csv', encoding='cp1252')

# 24.1
print(df)

# 24.2
print(df.describe())

# 24.3
df_renamed = df.rename(columns={'id': 'anime_ID', 'member': 'men'})
print(df_renamed)

# 24.4
subset = df.iloc[2:5, 1:4]
print(subset)

# 24.5
# Convert non-numeric values in 'episodes' column to NaN
df['episodes'] = pd.to_numeric(df['episodes'], errors='coerce')

# Average number of episodes by type


avg_episodes_by_type = df.groupby('type')['episodes'].mean()

# Bar plot: Average number of episodes by type


fig = plt.figure(figsize=(10, 6))
plt.bar(avg_episodes_by_type.index, avg_episodes_by_type)
plt.xlabel('Type of Anime')
plt.ylabel('Average No. of Episodes')
plt.title('Average No. of Episodes by Type of Anime')
plt.show()

# Pie chart: Type of anime distribution


type_counts = df['type'].value_counts()
plt.pie(type_counts, labels=type_counts.index, autopct='%1.1f%%')
plt.title('Distribution of Anime Types')
plt.show()

# Histogram: Distribution of anime names by type


names = df['name'].astype(str).tolist()
plt.hist(names, bins=20)
plt.xlabel('Anime Name')
plt.ylabel('Frequency')
plt.title('Distribution of Anime Names by Type')
plt.show()

You might also like