Name: Ram Ganesh P

Reg. No: 913121205075

Ex. No: 03

Date:

FEATURE EXTRACTION WITH CORRELATION (BI-VARIATE) ANALYSIS AND CATEGORIZATION USING PYTHON/R

Aim:
To perform feature extraction with correlation (bivariate) analysis and categorisation using
Python/R.

Steps:
1. DATA PREPARATION:
• Import the necessary libraries ('pandas', 'seaborn', and 'matplotlib').
• Load your data from a CSV file into a Pandas DataFrame.
2. CORRELATION MATRIX:
• Create a correlation matrix for specific columns related to air quality indices.
3. VISUALIZE THE CORRELATION MATRIX:
• Create a heatmap of the correlation matrix, enhancing visual understanding.
• Customize the heatmap with annotations, color palette, and grid lines.
4. DISPLAY THE CORRELATION HEATMAP:
• Show the heatmap with correlations between the air quality indices, helping identify
relationships.
5. FEATURE SELECTION BASED ON CORRELATION:
• Calculate correlations between all columns and the "PM2.5 AQI Value."
• Sort and select features with correlations greater than 0.25 in absolute value.
• Print and display the selected features, helping identify which variables correlate significantly with the
target variable.
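
Step 5 can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in this experiment: the helper name `select_features_by_target` and the demo columns (`target`, `strong`, `inverse`, `unrelated`, `label`) are hypothetical, and the data is synthetic. It computes the correlation of every numeric column with one target column and keeps those whose absolute value exceeds 0.25.

```python
import numpy as np
import pandas as pd


def select_features_by_target(df, target, threshold=0.25):
    """Return correlations with `target` whose absolute value exceeds `threshold`."""
    # Correlation is only defined for numeric columns, so text columns are skipped.
    numeric = df.select_dtypes(include="number")
    # Take the target's column of the correlation matrix and drop its self-correlation.
    corr_with_target = numeric.corr()[target].drop(target)
    # Keep both strong positive and strong negative relationships.
    selected = corr_with_target[corr_with_target.abs() > threshold]
    # Order by strength, strongest first.
    return selected.sort_values(key=lambda s: s.abs(), ascending=False)


# Synthetic demo data (hypothetical column names, not taken from the dataset).
x = np.arange(100, dtype=float)
demo = pd.DataFrame({
    "target": x,
    "strong": 2 * x + 1,            # rises linearly with the target
    "inverse": -x,                  # falls linearly as the target rises
    "unrelated": (x - 49.5) ** 2,   # symmetric about the mean: no linear relationship
    "label": ["a", "b"] * 50,       # non-numeric column, ignored
})

selected = select_features_by_target(demo, "target")
print(selected)
```

Filtering on the absolute value keeps negatively correlated features as well, which matches the "greater than 0.25 in absolute value" rule in the step above.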

Dataset: indian_rda_based_diet_recommendation_system

Python Code:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Load data from a CSV file

data = pd.read_csv('/content/Indian_food.csv')

# Calculate the correlation matrix

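# numeric_only=True skips text columns; without it, pandas 2.0 and later
# raises an error if the file contains any non-numeric column
correlation_matrix = data.corr(numeric_only=True)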

# Plot the heatmap

plt.figure(figsize=(10, 8))
heatmap = sns.heatmap(correlation_matrix,
annot=True, cmap='coolwarm', fmt=".2f", linewidths=0.5,
annot_kws={"size": 10}, cbar=True, square=True, mask=None,
vmin=-1, vmax=1, center=0, cbar_kws={"shrink": 0.8},
robust=False, linecolor='black')
plt.title('Correlation Matrix Heatmap', fontsize=16)
plt.xticks(rotation=45)
plt.yticks(rotation=45)
plt.tight_layout()
plt.show()

# Select features with correlations greater than 0.25 in absolute value

high_correlation_features = correlation_matrix[abs(correlation_matrix) > 0.25]

# Drop NA values to keep only valid correlations

high_correlation_features = high_correlation_features.dropna(axis=1, how='all').dropna(axis=0, how='all')

# Print the selected features

print("Features with correlations greater than 0.25 in absolute value:")
print(high_correlation_features)

# Plot the heatmap for selected features

plt.figure(figsize=(10, 8))
heatmap = sns.heatmap(high_correlation_features,
annot=True, cmap='coolwarm', fmt=".2f", linewidths=0.5,
annot_kws={"size": 10}, cbar=True, square=True, mask=None,
vmin=-1, vmax=1, center=0, cbar_kws={"shrink": 0.8},
robust=False, linecolor='black')
plt.title('High Correlation Features Heatmap', fontsize=16)
plt.xticks(rotation=45)
plt.yticks(rotation=45)
plt.tight_layout()
plt.show()
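
Every variable has a correlation of 1.0 with itself, so each row and column of the thresholded matrix keeps at least its diagonal entry and the `dropna(..., how='all')` step removes nothing. One way to see only the genuinely related variables is to list the off-diagonal pairs. The sketch below is an illustration under stated assumptions, not part of the original experiment: the helper name `high_correlation_pairs` and the demo columns `a`, `b`, `c` are hypothetical, and the data is synthetic.

```python
import numpy as np
import pandas as pd


def high_correlation_pairs(corr, threshold=0.25):
    """List each off-diagonal pair whose |correlation| exceeds `threshold`."""
    # The strict upper triangle excludes the diagonal and the mirrored duplicates.
    upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    # Turn the remaining cells into a Series indexed by (row, column).
    pairs = corr.where(upper).stack()
    # NaN cells (masked out above) never pass this comparison.
    pairs = pairs[pairs.abs() > threshold]
    return pairs.sort_values(key=lambda s: s.abs(), ascending=False)


# Synthetic demo data (hypothetical column names).
x = np.arange(100, dtype=float)
demo = pd.DataFrame({
    "a": x,
    "b": 3 * x + 2,          # linearly related to "a"
    "c": (x - 49.5) ** 2,    # no linear relationship with "a" or "b"
})

pairs = high_correlation_pairs(demo.corr())
print(pairs)
```

Each pair appears once, so the printed list can be read directly as the set of related variable pairs.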

21PCS02 – Exploratory Data Analysis Laboratory

Output:


Result:
In this experiment, statistical analysis with exploratory graphs for the given data was implemented
using Python/R, and the output was verified successfully.
