Ram Exp 3

Name: Ram Ganesh P

Reg. No: 913121205075

Ex. No: 03

Date:

FEATURE EXTRACTION WITH CORRELATION (BI-VARIATE) ANALYSIS AND CATEGORIZATION USING PYTHON/R

Aim:
To perform feature extraction with correlation (bivariate) analysis and categorisation using Python/R.

Steps:
1. DATA PREPARATION:
• Import the necessary libraries ('pandas', 'seaborn', and 'matplotlib').
• Load the data from a CSV file into a pandas DataFrame.
2. CORRELATION MATRIX:
• Create a correlation matrix for the numeric columns of the dataset.
3. VISUALIZE THE CORRELATION MATRIX:
• Create a heatmap of the correlation matrix to make the relationships easier to read.
• Customize the heatmap with annotations, a color palette, and grid lines.
4. DISPLAY THE CORRELATION HEATMAP:
• Show the heatmap of correlations between the columns to help identify relationships.
5. FEATURE SELECTION BASED ON CORRELATION:
• Keep the entries of the correlation matrix that are greater than 0.25 in absolute value.
• Drop rows and columns that have no remaining entries.
• Print and display the selected features to identify which variables correlate noticeably with one another.
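
Correlation-based selection is often run against a single target column rather than against the whole matrix. The sketch below illustrates that variant on synthetic data. The helper `select_by_target`, the column names (`calories`, `fat`, `protein`, `noise`), and the generated values are all hypothetical and are not taken from the experiment's dataset.

```python
import numpy as np
import pandas as pd


def select_by_target(df, target, threshold=0.25):
    """Return features whose absolute Pearson correlation with `target`
    exceeds `threshold`, strongest first (hypothetical helper)."""
    # Correlate numeric columns only, then drop the target's self-correlation.
    corr = df.select_dtypes(include="number").corr()[target].drop(target)
    selected = corr[corr.abs() > threshold]
    # Order by strength of the relationship, ignoring sign.
    order = selected.abs().sort_values(ascending=False).index
    return selected.loc[order]


# Synthetic data: `calories` depends on `fat` and `protein`, not on `noise`.
rng = np.random.default_rng(0)
n = 500
fat = rng.normal(size=n)
protein = rng.normal(size=n)
noise = rng.normal(size=n)
calories = 3.0 * fat + 1.5 * protein + rng.normal(scale=0.5, size=n)

demo = pd.DataFrame(
    {"calories": calories, "fat": fat, "protein": protein, "noise": noise}
)

selected = select_by_target(demo, "calories", threshold=0.25)
print(selected)
```

Dropping the target before thresholding matters because a column always correlates perfectly with itself and would otherwise be selected as its own best feature.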

Dataset: indian_rda_based_diet_recommendation_system

Python Code:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

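# Load data from a CSV file
data = pd.read_csv('/content/Indian_food.csv')

# Calculate the correlation matrix over the numeric columns only
# (numeric_only needs pandas 1.5+; without it, text columns raise an error in pandas 2.x)
correlation_matrix = data.corr(numeric_only=True)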

# Plot the heatmap
plt.figure(figsize=(10, 8))
heatmap = sns.heatmap(correlation_matrix,
    annot=True, cmap='coolwarm', fmt=".2f", linewidths=0.5,
    annot_kws={"size": 10}, cbar=True, square=True, mask=None,
    vmin=-1, vmax=1, center=0, cbar_kws={"shrink": 0.8},
    robust=False, linecolor='black')
plt.title('Correlation Matrix Heatmap', fontsize=16)
plt.xticks(rotation=45)
plt.yticks(rotation=45)
plt.tight_layout()
plt.show()

# Keep correlations greater than 0.25 in absolute value; all other entries become NaN
# (the diagonal is always 1.0, so it always passes this filter)
high_correlation_features = correlation_matrix[correlation_matrix.abs() > 0.25]

# Drop rows and columns that contain no valid correlations
high_correlation_features = high_correlation_features.dropna(axis=1, how='all').dropna(axis=0, how='all')

# Print the selected features
print("Features with correlations greater than 0.25 in absolute value:")
print(high_correlation_features)

# Plot the heatmap for selected features
plt.figure(figsize=(10, 8))
heatmap = sns.heatmap(high_correlation_features,
    annot=True, cmap='coolwarm', fmt=".2f", linewidths=0.5,
    annot_kws={"size": 10}, cbar=True, square=True, mask=None,
    vmin=-1, vmax=1, center=0, cbar_kws={"shrink": 0.8},
    robust=False, linecolor='black')
plt.title('High Correlation Features Heatmap', fontsize=16)
plt.xticks(rotation=45)
plt.yticks(rotation=45)
plt.tight_layout()
plt.show()
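
One caveat when thresholding the full matrix: every variable correlates perfectly with itself, so each row and column keeps at least its diagonal entry and `dropna(how='all')` removes nothing. A minimal sketch of one way around this, assuming synthetic data and hypothetical column names (`a`, `b`, `c`), hides the diagonal before filtering:

```python
import numpy as np
import pandas as pd

# Synthetic data: `b` is built from `a`, while `c` is independent of both.
rng = np.random.default_rng(1)
n = 500
a = rng.normal(size=n)
demo = pd.DataFrame({
    "a": a,
    "b": a + rng.normal(scale=0.5, size=n),
    "c": rng.normal(size=n),
})

corr = demo.corr()

# Replace the diagonal (self-correlations of 1.0) with NaN.
off_diagonal = corr.mask(np.eye(len(corr), dtype=bool))

# Keep only entries above the threshold in absolute value.
strong = off_diagonal[off_diagonal.abs() > 0.25]

# Rows and columns with no strong partner are now entirely NaN and drop out.
strong = strong.dropna(axis=0, how="all").dropna(axis=1, how="all")
print(strong)
```

With the diagonal masked, a column survives only if it has at least one strong relationship with a different column, which is usually what a feature filter is meant to express.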

Output:

Result:
In this experiment, feature extraction with correlation (bivariate) analysis and categorisation was implemented for the given data using Python, and the output was verified successfully.

21PCS02 – Exploratory Data Analysis Laboratory
