
Department of Mathematics

II Semester 2019-2020 MTL 782 Data Mining

Major Examination Date: 23.8.2020 Time: 4 PM – 6 PM Weightage: 30%

Q1. Calculate the cosine, correlation, Jaccard, and Extended Jaccard similarity/distance for the
vectors x = (1, 1, 0, 1, 0, 1) and y = (1, 1, 1, 0, 0, 1). [2]
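The four measures can be verified numerically. A minimal NumPy sketch (treating the vectors as binary for the Jaccard computation):

```python
import numpy as np

x = np.array([1, 1, 0, 1, 0, 1], dtype=float)
y = np.array([1, 1, 1, 0, 0, 1], dtype=float)

# Cosine similarity: dot product over the product of norms.
cosine = (x @ y) / (np.linalg.norm(x) * np.linalg.norm(y))   # 3/4 = 0.75

# Correlation: cosine similarity of the mean-centred vectors.
corr = np.corrcoef(x, y)[0, 1]                               # 0.25

# Jaccard (binary attributes): |both 1| / |at least one 1|.
jaccard = ((x == 1) & (y == 1)).sum() / ((x == 1) | (y == 1)).sum()  # 3/5 = 0.6

# Extended Jaccard (Tanimoto): x·y / (|x|^2 + |y|^2 - x·y).
ext_jaccard = (x @ y) / (x @ x + y @ y - x @ y)              # 3/5 = 0.6
```

The corresponding distances follow as 1 minus each similarity.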

Q2. (a) How does an ordinal feature differ from a nominal feature? Explain briefly. [1]

(b) In what situation is semi-supervised learning most desirable? [1]

Q3. We will use the dataset below to learn a decision tree that predicts whether people pass machine
learning (Yes or No), based on their previous GPA (High, Medium, or Low) and whether or not they
studied.

(a) What is the entropy H(Passed) ? [0.5]


(b) What is the entropy H(Passed | GPA)? [0.5]
(c) What is the entropy H(Passed | Studied)? [0.5]
(d) Draw the full decision tree that would be learned for this dataset. Show the calculations. [1.5]
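The entropy quantities follow the standard definitions, which can be sketched in a few lines of Python. The label columns below are hypothetical, for illustration only (the exam's own table is given in the paper):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy H (base 2) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def conditional_entropy(features, labels):
    """H(labels | features): per-group entropies weighted by group size."""
    n = len(labels)
    return sum(
        (cnt / n) * entropy([l for f, l in zip(features, labels) if f == v])
        for v, cnt in Counter(features).items()
    )

# Hypothetical columns (not the exam's dataset):
passed  = ["Yes", "Yes", "No", "Yes", "No", "No", "Yes", "Yes"]
studied = ["Yes", "Yes", "No", "Yes", "No", "No", "No", "Yes"]
h_passed = entropy(passed)                            # H(Passed)
h_given  = conditional_entropy(studied, passed)       # H(Passed | Studied)
```

The decision tree in (d) splits on the attribute with the largest information gain, H(Passed) − H(Passed | attribute).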

Q4. You are given the transaction data shown in the Table below. There are 9 distinct transactions
(order:1 – order:9) and each transaction involves between 2 and 4 items.

There are a total of 5 items that are involved in the transactions. [2 + 2 + 2 + 2]

a. Apply the Apriori algorithm to the dataset of transactions and identify all frequent k-itemsets.
Show all of your work.
b. Find all strong association rules of the form X ∧ Y → Z and note their confidence values.
c. Construct the FP-tree corresponding to the set of transactions. Show all steps involved.
d. Mine the FP-tree according to the FP-growth algorithm. Show all steps. The results should
include the set of frequent patterns generated through the different steps in the analysis.
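The level-wise structure of Apriori can be sketched compactly. The transactions below are hypothetical (the exam's Table is not reproduced here), but the procedure is the one the question asks for: count support of candidate k-itemsets, keep the frequent ones, and join them to form (k+1)-itemset candidates:

```python
def apriori(transactions, min_support):
    """Level-wise frequent-itemset mining; returns {itemset: support count}."""
    frequent = {}
    k_sets = [frozenset([i]) for i in {i for t in transactions for i in t}]
    k = 1
    while k_sets:
        counts = {s: sum(s <= t for t in transactions) for s in k_sets}
        level = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(level)
        # Candidate generation: join pairs of frequent k-itemsets into (k+1)-itemsets.
        prev = list(level)
        k += 1
        k_sets = list({a | b for a in prev for b in prev if len(a | b) == k})
    return frequent

# Hypothetical transactions over items A-E (not the exam's Table):
T = [{"A", "B", "E"}, {"B", "D"}, {"B", "C"}, {"A", "B", "D"}, {"A", "C"},
     {"B", "C"}, {"A", "C"}, {"A", "B", "C", "E"}, {"A", "B", "C"}]
freq = apriori(T, min_support=2)

# Confidence of a rule X -> Z is support(X ∪ Z) / support(X), e.g. A ∧ B -> C:
conf = freq[frozenset("ABC")] / freq[frozenset("AB")]
```

For part (b), a rule is strong when its antecedent's itemset is frequent and the confidence clears the given threshold.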
Q5. Assume that you have to explore a large data set of high dimensionality. You know nothing
about the distribution of the data. In no more than one page, discuss the following.

(a) How can k-means and DBSCAN be used to find the number of clusters in that data? [1]
(b) Explain how PCA can help find the dimensions where clusters separate. [1]
(c) Explain why PCA might fail to reveal cluster separation in some dimensions. [1]
(d) Can k-means or DBSCAN be applied in a way that would help you find the dimensions in
which the clusters separate? [1]
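Point (c) can be demonstrated concretely: when a high-variance but uninformative dimension dominates, the first principal component aligns with it and ignores the dimension along which the clusters actually separate. A small NumPy sketch on synthetic data (illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Dimension 0: large within-cluster spread, no cluster structure.
noise = rng.normal(0.0, 10.0, size=n)
# Dimension 1: two clusters at -1 and +1 with small spread.
labels = rng.integers(0, 2, size=n)
sep = np.where(labels == 0, -1.0, 1.0) + rng.normal(0.0, 0.2, size=n)
X = np.column_stack([noise, sep])

# PCA via the eigen-decomposition of the covariance matrix.
Xc = X - X.mean(axis=0)
cov = Xc.T @ Xc / (n - 1)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
top_pc = eigvecs[:, -1]                  # direction of largest variance
# top_pc points almost entirely along the noisy axis 0,
# even though the clusters only separate along axis 1.
```

This is the failure mode of (c): PCA ranks directions by variance, and separation that lives in a low-variance dimension is pushed into the trailing components.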

Q6. (a) State one advantage and one disadvantage of the DBSCAN clustering algorithm, with a brief
explanation. [1]

(b) Suppose you are already given an optimal value for the 'Minimum Points' parameter of the DBSCAN
clustering algorithm. Explain how it can be used to find an optimal value for the distance 'Eps' used
in the algorithm. [1]

(c) Describe the unsupervised cluster-evaluation method that uses correlation and the similarity
matrix, as discussed in class. How can that method be used to validate the clusters obtained? What is
the intuition behind this evaluation measure? [1]
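One common heuristic for (b), assuming MinPts is fixed: compute each point's distance to its MinPts-th nearest neighbour, plot the values in sorted order, and read Eps off the 'knee' where the curve bends sharply. A sketch of the k-distance computation (brute-force distances, fine for small data):

```python
import numpy as np

def k_distance_curve(X, min_pts):
    """Distance from each point to its min_pts-th nearest neighbour,
    sorted in descending order (the curve whose knee suggests Eps)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                  # exclude each point itself
    kth = np.sort(d, axis=1)[:, min_pts - 1]
    return np.sort(kth)[::-1]

# Two tight clusters plus one far-away point: the outlier appears as the
# single large value at the head of the curve, well above the knee.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (20, 2)),
               rng.normal(5.0, 0.1, (20, 2)),
               [[50.0, 50.0]]])
curve = k_distance_curve(X, min_pts=4)
```

Points to the left of the knee (large k-distance) would be noise for the chosen Eps; points to the right would be density-reachable.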
Q7. Given the correlation matrix [1.5 + 1 + 1.5]

    R = | 1    0.9 |
        | 0.9  1   |
(a) Compute the eigenvalues λ1 and λ2 of R and the corresponding eigenvectors γ1 and γ2.
(b) Compute the weights of the principal components C1 and C2 that set the scales of the
components and ensure that they are orthogonal.
(c) What proportion of the total variance in the data does the first principal component account
for?
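The decomposition can be checked by hand (for a 2×2 correlation matrix with off-diagonal ρ, the eigenvalues are 1 + ρ and 1 − ρ, with eigenvectors (1, 1)/√2 and (1, −1)/√2) or numerically:

```python
import numpy as np

R = np.array([[1.0, 0.9],
              [0.9, 1.0]])

eigvals, eigvecs = np.linalg.eigh(R)   # ascending order: 0.1, 1.9
lam1, lam2 = eigvals[-1], eigvals[0]   # λ1 = 1 + 0.9 = 1.9, λ2 = 1 - 0.9 = 0.1
gamma1 = eigvecs[:, -1]                # ∝ (1, 1)/√2
gamma2 = eigvecs[:, 0]                 # ∝ (1, -1)/√2

# Proportion of total variance carried by the first principal component:
prop_first = lam1 / eigvals.sum()      # 1.9 / 2 = 0.95
```

So the first principal component accounts for 95% of the total variance, as expected for two variables with correlation 0.9.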

Q8. (a) Discuss the differences between dimensionality reduction based on aggregation and dimensionality
reduction based on PCA. [1]

(b) Many statistical tests for outliers were developed in an environment in which a few
hundred observations was a large data set. We explore the limitations of such
approaches.
(1) For a set of 1,000,000 values, how likely are we to have outliers according to the
test that says a value is an outlier if it is more than three standard deviations from
the average? (Assume a normal distribution.) [1.5]
(2) Does the approach that states an outlier is an object of unusually low probability
need to be adjusted when dealing with large data sets? If so, how? [1.5]
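Part (1) reduces to a normal tail probability: P(|Z| > 3) ≈ 0.0027, so roughly 2,700 of the 1,000,000 values would be flagged as outliers even if nothing is anomalous. A quick check using the standard normal CDF via the error function:

```python
import math

def phi(z):
    """Standard normal CDF, expressed through the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

p_outlier = 2.0 * (1.0 - phi(3.0))    # two-sided tail beyond 3 standard deviations
expected = 1_000_000 * p_outlier      # expected number of flagged values
```

This is the limitation the question points at: with large n, a fixed small tail probability still produces thousands of flagged points, so "low probability" thresholds must be tightened (or corrected for multiple comparisons) as the data set grows.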
