Name: Ram Ganesh P
Reg. No: 913121205075
Ex. No: 03
Date:
FEATURE EXTRACTION WITH CORRELATION (BI-VARIATE) ANALYSIS AND CATEGORIZATION USING PYTHON/R
Aim:
To perform feature extraction with correlation (bivariate) analysis and categorization using
Python/R.
Steps:
1. DATA PREPARATION:
• Import the necessary libraries ('pandas', 'seaborn', and 'matplotlib').
• Load your data from a CSV file into a Pandas DataFrame.
2. CORRELATION MATRIX:
• Create a correlation matrix for specific columns related to air quality indices.
3. VISUALIZE THE CORRELATION MATRIX:
• Create a heatmap of the correlation matrix, enhancing visual understanding.
• Customize the heatmap with annotations, color palette, and grid lines.
4. DISPLAY THE CORRELATION HEATMAP:
• Show the heatmap with correlations between the air quality indices, helping identify
relationships.
5. FEATURE SELECTION BASED ON CORRELATION:
• Calculate correlations between all columns and the "PM2.5 AQI Value."
• Sort and select features with correlations greater than 0.25 in absolute value.
• Print and display the selected features, helping identify which variables correlate significantly with the
target variable.
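Step 5 can be sketched as follows. This is a minimal illustration rather than the code used in this experiment: the DataFrame is hypothetical toy data, and every column name other than "PM2.5 AQI Value" is an assumption made for the example.

```python
import pandas as pd

# Hypothetical toy data; values and most column names are illustrative only.
data = pd.DataFrame({
    "PM2.5 AQI Value": [10, 20, 30, 40, 50, 60],
    "AQI Value": [12, 22, 29, 41, 52, 58],
    "Ozone AQI Value": [3, 1, 2, 2, 1, 3],
    "NO2 AQI Value": [60, 50, 40, 30, 20, 10],
})

target = "PM2.5 AQI Value"
threshold = 0.25

# Correlation of every numeric column with the target, excluding the target itself
target_corr = data.select_dtypes(include="number").corr()[target].drop(target)

# Keep features whose absolute correlation exceeds the threshold
selected = target_corr[target_corr.abs() > threshold]

# Order the selected features from strongest to weakest relationship
selected = selected.reindex(selected.abs().sort_values(ascending=False).index)

print("Features with |correlation| > 0.25 against the target:")
print(selected)
```

On this toy data the two columns that move with the target are kept, and the column that is unrelated to it is dropped.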
Dataset: indian_rda_based_diet_recommendation_system
Python Code:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Load data from a CSV file
data = pd.read_csv('/content/Indian_food.csv')
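# Calculate the correlation matrix over the numeric columns only
# (text columns cannot be correlated and raise an error in recent pandas versions)
correlation_matrix = data.select_dtypes(include='number').corr()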
# Plot the heatmap
plt.figure(figsize=(10, 8))
heatmap = sns.heatmap(correlation_matrix,
                      annot=True, cmap='coolwarm', fmt=".2f",
                      linewidths=0.5, annot_kws={"size": 10}, cbar=True,
                      square=True, mask=None, vmin=-1, vmax=1, center=0,
                      cbar_kws={"shrink": 0.8}, robust=False, linecolor='black')
plt.title('Correlation Matrix Heatmap', fontsize=16)
plt.xticks(rotation=45)
plt.yticks(rotation=45)
plt.tight_layout()
plt.show()
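# Keep only correlations greater than 0.25 in absolute value; weaker entries become NaN
high_correlation_features = correlation_matrix[correlation_matrix.abs() > 0.25]
# Drop rows and columns that contain no valid correlation.
# Each feature's self-correlation (1.0) on the diagonal passes the threshold,
# so this step only removes features whose correlations are all NaN.
high_correlation_features = (
    high_correlation_features
    .dropna(axis=1, how='all')
    .dropna(axis=0, how='all')
)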
# Print the selected features
print("Features with correlations greater than 0.25 in absolute value:")
print(high_correlation_features)
# Plot the heatmap for selected features
plt.figure(figsize=(10, 8))
heatmap = sns.heatmap(high_correlation_features,
                      annot=True, cmap='coolwarm', fmt=".2f", linewidths=0.5,
                      annot_kws={"size": 10}, cbar=True, square=True, mask=None,
                      vmin=-1, vmax=1, center=0, cbar_kws={"shrink": 0.8},
                      robust=False, linecolor='black')
plt.title('High Correlation Features Heatmap', fontsize=16)
plt.xticks(rotation=45)
plt.yticks(rotation=45)
plt.tight_layout()
plt.show()
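Because every feature correlates perfectly with itself, the thresholded matrix can be easier to read as a list of feature pairs with the diagonal left out. The sketch below is one possible way to do this, not part of the original experiment; the nutrient column names and values are hypothetical and are not taken from the dataset.

```python
from itertools import combinations

import pandas as pd

# Hypothetical toy data; column names and values are illustrative only.
data = pd.DataFrame({
    "calories": [100, 200, 300, 400, 500, 600],
    "fat": [12, 22, 29, 41, 52, 58],
    "fibre": [3, 1, 2, 2, 1, 3],
    "water": [60, 50, 40, 30, 20, 10],
})

corr = data.corr()

# Visit each unordered pair of distinct features once (this skips the diagonal)
rows = [(a, b, corr.loc[a, b]) for a, b in combinations(corr.columns, 2)]
pairs = pd.DataFrame(rows, columns=["feature_a", "feature_b", "correlation"])

# Keep pairs above the 0.25 threshold, strongest relationship first
strong = pairs[pairs["correlation"].abs() > 0.25]
strong = strong.sort_values("correlation", key=lambda s: s.abs(), ascending=False)

print(strong.to_string(index=False))
```

Listing pairs this way avoids the duplicate entries of the symmetric matrix and makes it clear when a feature has no strong relationship with any other feature.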
Output:
Result:
In this experiment, feature extraction with correlation (bivariate) analysis and categorization was
implemented using Python, and the output was verified successfully.
21PCS02 – Exploratory Data Analysis Laboratory