This document provides instructions for Lab 3 of a course on Data Mining and Predictive Modelling. It includes 3 experiments on feature selection using various techniques:
1. The first experiment uses categorical data to create dummy variables and reduce features using dictionary vectorization and PCA.
2. The second experiment reduces the features of the Balance dataset to 2 dimensions using PCA, to 1 dimension using LDA, and to 2 dimensions using LLE.
3. The third experiment joins employee and transaction data on employee ID, identifies any duplicate rows, and finds partial duplicates based on employee ID and age.
School of Computer Science Engineering and Technology
Course- BTech Type- Specialization Core II
Course Code- CSET228 Course Name- Data Mining and Predictive Modelling
Year- 2022-23 Semester- Even
Date- Batch- IV Semester (All)
Lab 3
Exp No   Name                CO Mapping: CO1   CO2   CO3
1        Feature Selection
a) Use the Categorial file to do the following operations:
  i. Show the categorical information of the object and drop the emp_Id column. Create dummy variables from the categorical features.
  ii. Use DictVectorizer to display the feature names from the given Categorial file, and find the vector form V of the output.
b) Reduce the features of the given dataset, named Balance:
  i. Import PCA from sklearn and reduce the features to 2 dimensions. Plot the result using the scatterplot function.
  ii. Import LDA, reduce the dimensions to 1, and visualize the result.
  iii. Import LLE, reduce the dimensions to 2, and visualize the result.
c) Use trans and emp data for the following operations:
  i. Join them on empId using the merge function and show the result.
  ii. If any rows are duplicates, show them; if there are partial duplicates, find them based on (empId, age).
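Part c) can be sketched with pandas along these lines. The trans and emp files are not reproduced here, so the two tables below are hypothetical stand-ins; every column other than `empId` and `age` is an assumption, as is the choice of an inner join. A full duplicate is taken to mean a row that repeats in every column, and a partial duplicate one that repeats only in `empId` and `age`.

```python
import pandas as pd

# Hypothetical stand-ins for the emp and trans data (values are made up).
emp = pd.DataFrame({
    "empId": [1, 2, 3, 3, 4, 4],
    "name": ["Asha", "Ben", "Chen", "Chen", "Dev", "Dev"],
    "age": [30, 41, 25, 25, 38, 38],
    "city": ["Pune", "Delhi", "Noida", "Noida", "Agra", "Kota"],
})
trans = pd.DataFrame({
    "empId": [1, 2, 3, 4],
    "amount": [250.0, 90.5, 120.0, 75.0],
})

# i. Join the two tables on empId (inner join is an assumption).
merged = emp.merge(trans, on="empId", how="inner")
print(merged)

# ii. Full duplicates: rows identical in every column.
#     keep=False marks every member of a duplicate group, not just repeats.
full_dups = merged[merged.duplicated(keep=False)]
print(full_dups)

# Partial duplicates: rows that share (empId, age), whatever the other columns.
partial_dups = merged[merged.duplicated(subset=["empId", "age"], keep=False)]

# Rows that are partial duplicates without being full duplicates.
partial_only = partial_dups.drop(full_dups.index)
print(partial_only)
```

In this stand-in data, the two rows for empId 3 agree in every column, while the two rows for empId 4 agree on `empId` and `age` but differ in `city`, so only the latter appear in `partial_only`.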