Name: Vidya Janani V

Register Number: 913121205090

Ex. No: 03

Date: 04.03.2024

Feature Extraction with Correlation (Bivariate) Analysis and Categorization Using Python/R

Aim:

To perform feature extraction with correlation (bivariate) analysis and categorization using
Python/R.

Steps:

1. Data Preparation:

• Import necessary libraries ('pandas', 'seaborn', and 'matplotlib').
• Load your data from a CSV file into a Pandas DataFrame.

2. Correlation Matrix:

• Create a correlation matrix for specific columns related to air quality indices.

3. Visualize the correlation matrix:

• Create a heatmap of the correlation matrix, enhancing visual understanding.
• Customize the heatmap with annotations, color palette, and grid lines.

4. Display the correlation heatmap:

• Show the heatmap with correlations between the air quality indices, helping identify
relationships.

5. Feature selection based on the correlation matrix:

• Calculate correlations between all columns and the "PM2.5 AQI Value" column.
• Sort and select features with correlations greater than 0.25 in absolute value.
• Print and display the selected features, helping identify which variables correlate significantly
with the target variable.
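
Step 5 as described above (correlate every column with a target column, then keep the features whose absolute correlation exceeds 0.25) could be sketched roughly as follows. This is only an illustration under stated assumptions: the readings are made up, and every column name other than "PM2.5 AQI Value" is hypothetical.

```python
import pandas as pd

# Made-up air quality readings, used only for illustration.
# All column names except "PM2.5 AQI Value" are hypothetical.
data = pd.DataFrame({
    "AQI Value": [51, 41, 66, 34, 22, 54, 62, 64],
    "CO AQI Value": [1, 1, 1, 1, 0, 1, 1, 1],
    "Ozone AQI Value": [36, 5, 39, 34, 22, 14, 35, 29],
    "NO2 AQI Value": [0, 1, 2, 0, 0, 11, 3, 7],
    "PM2.5 AQI Value": [51, 41, 66, 20, 6, 54, 62, 64],
})

target = "PM2.5 AQI Value"
threshold = 0.25

# Correlation of every column with the target, excluding the target itself.
target_corr = data.corr()[target].drop(target)

# Keep features whose absolute correlation exceeds the threshold.
selected = target_corr[target_corr.abs() > threshold]

# Order the selected features from strongest to weakest absolute correlation.
selected = selected.reindex(selected.abs().sort_values(ascending=False).index)

print("Selected features:")
print(selected)
```

Sorting by absolute value keeps strong negative correlations near the top of the list instead of pushing them to the bottom.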

Importing the dataset and forming the correlation matrix

Python Code

import pandas as pd

# Load the dataset into a DataFrame
# Replace 'job_placement.csv' with the actual path to your dataset
job_placement_df = pd.read_csv('job_placement.csv')

# Display the first few rows of the DataFrame to understand its structure
print(job_placement_df.head())

# Compute the correlation matrix over the numeric columns only
# (pandas 2.x raises an error if text columns are included)
corr_matrix = job_placement_df.corr(numeric_only=True)

21PCS02 – Exploratory Data Analysis Laboratory Dept of IT



# Display the correlation matrix
print("Correlation Matrix:")
print(corr_matrix)
Output:
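
For reference, DataFrame.corr() computes Pearson correlation coefficients by default. The following minimal sketch, which uses made-up numbers rather than the job placement data, shows how a single entry of such a matrix can be computed by hand and checked against pandas. The column names "x" and "y" are hypothetical.

```python
import numpy as np
import pandas as pd

# Made-up values for illustration only; not taken from job_placement.csv.
df = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0, 5.0],
    "y": [2.0, 4.0, 5.0, 4.0, 5.0],
})

# Deviations of each column from its mean.
x_dev = df["x"] - df["x"].mean()
y_dev = df["y"] - df["y"].mean()

# Pearson r = sum of products of deviations /
#             sqrt(sum of squared x deviations * sum of squared y deviations)
manual_r = (x_dev * y_dev).sum() / np.sqrt((x_dev ** 2).sum() * (y_dev ** 2).sum())

# The same entry as reported by pandas.
pandas_r = df.corr().loc["x", "y"]

print("Manual:", manual_r)
print("pandas:", pandas_r)
```

Both values should agree up to floating-point rounding. The result always lies between -1 and 1.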

Visualizing the correlation matrix

Python Code

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Load the dataset into a DataFrame
# Replace 'job_placement.csv' with the actual path to your dataset
job_placement_df = pd.read_csv('job_placement.csv')

# Compute the correlation matrix over the numeric columns only
# (pandas 2.x raises an error if text columns are included)
corr_matrix = job_placement_df.corr(numeric_only=True)

# Plotting the correlation matrix using seaborn heatmap

plt.figure(figsize=(10, 8))
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', fmt=".2f", linewidths=.5)
plt.title('Correlation Matrix of Job Placement Dataset')
plt.show()

Output

Displaying the correlation heatmap:

Python Code

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Load the dataset into a DataFrame
# Replace 'job_placement.csv' with the actual path to your dataset
job_placement_df = pd.read_csv('job_placement.csv')

# Compute the correlation matrix over the numeric columns only
# (pandas 2.x raises an error if text columns are included)
corr_matrix = job_placement_df.corr(numeric_only=True)

# Plotting the correlation heatmap


plt.figure(figsize=(10, 8))
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', fmt=".2f", linewidths=.5)
plt.title('Correlation Heatmap of Job Placement Dataset')
plt.show()

Output

Feature selection based on the correlation matrix:

Python Code

import pandas as pd

# Load the dataset into a DataFrame
# Replace 'job_placement.csv' with the actual path to your dataset
job_placement_df = pd.read_csv('job_placement.csv')

# Compute the correlation matrix over the numeric columns only
# (pandas 2.x raises an error if text columns are included)
corr_matrix = job_placement_df.corr(numeric_only=True)

# Display the correlation matrix
print("Correlation Matrix:")
print(corr_matrix)

# Setting a threshold for correlation


# You can adjust this threshold based on your requirements
threshold = 0.5

# Collect features that are highly correlated with an earlier feature
# For each pair above the threshold, the later column is flagged as a candidate for removal
correlated_features = set()
for i in range(len(corr_matrix.columns)):
    for j in range(i):
        if abs(corr_matrix.iloc[i, j]) > threshold:
            colname = corr_matrix.columns[i]
            correlated_features.add(colname)

print("Correlated Features:")
print(correlated_features)

Output
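
The loop above only collects the names of highly correlated columns. A minimal sketch of one possible follow-up, dropping those columns from the DataFrame, is given below. It assumes made-up numeric data with hypothetical column names "a", "b", and "c", and it is not the job placement dataset.

```python
import pandas as pd

# Made-up numeric data for illustration; column names are hypothetical.
df = pd.DataFrame({
    "a": [1, 2, 3, 4, 5, 6],
    "b": [2, 4, 6, 8, 10, 13],   # nearly a multiple of "a"
    "c": [5, 3, 6, 2, 7, 1],     # only weakly related to "a" and "b"
})

threshold = 0.5
corr_matrix = df.corr()

# Flag the later column of every pair whose absolute correlation exceeds the threshold.
correlated_features = set()
for i in range(len(corr_matrix.columns)):
    for j in range(i):
        if abs(corr_matrix.iloc[i, j]) > threshold:
            correlated_features.add(corr_matrix.columns[i])

# Drop the flagged columns from the DataFrame.
reduced_df = df.drop(columns=sorted(correlated_features))

print("Dropped:", sorted(correlated_features))
print("Remaining:", list(reduced_df.columns))
```

Because only the lower triangle of the matrix is scanned, each pair is checked once and the first column of a pair is never flagged on account of that pair.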


21PCS02 – Exploratory Data Analysis Laboratory    Marks
Observation (20)
Record (5)
Total (25)

Result:
In this experiment, feature extraction with correlation (bivariate) analysis and
categorization using Python/R was implemented, and the output was verified successfully.
