
MANIPAL UNIVERSITY JAIPUR

School of Computing and Information Technology


Department of Information Technology
Course Hand-out
Data Mining and Warehousing | IT 3240 | 3 Credits | 3 0 0 3
Session: Jan - May 2023 | Faculty: Dr. Shalini Puri (Sec F), Dr. Sudhir Sharma (Sec A), Mr. Deevesh Choudhary
(Sec B), Dr. Sumit Dhariwal (Sec C), Dr. Aprna Tripathi (Sec D), Dr. Shweta Sharma (Sec E)

A. INTRODUCTION: This course discusses concepts and terminology from statistics, database
systems, and machine learning as they relate to data mining. It also discusses the pseudocode
and data structures, including multidimensional arrays, used for data mining tasks.

B. COURSE OUTCOMES: At the end of the course, students will be able to


[3240.1]. Interpret the contribution of data warehousing and data mining to the decision-
support level of organizations.
[3240.2]. Categorize and carefully differentiate between situations for applying different data-
mining techniques: frequent pattern mining, association, correlation, classification,
prediction, and cluster and outlier analysis.
[3240.3]. Design and implement systems for data mining.
[3240.4]. Evaluate the performance of different data mining algorithms.
[3240.5]. Propose data-mining solutions for different applications.

C. PROGRAM OUTCOMES AND PROGRAM-SPECIFIC OUTCOMES

[PO.1]. Engineering knowledge: Apply the knowledge of mathematics, science, engineering
fundamentals, and an engineering specialization to the solution of complex engineering
problems.

[PO.2]. Problem analysis: Identify, formulate, research literature, and analyse complex engineering
problems reaching substantiated conclusions using the first principles of mathematics,
natural sciences, and engineering sciences.

[PO.3]. Design/development of solutions: Design solutions for complex engineering problems and
design system components or processes that meet the specified needs with appropriate
consideration for the public health and safety, and the cultural, societal, and environmental
considerations.

[PO.4]. Conduct investigations of complex problems: Use research-based knowledge and research
methods including design of experiments, analysis and interpretation of data, and synthesis
of the information to provide valid conclusions.

[PO.5]. Modern tool usage: Create, select, and apply appropriate techniques, resources, and modern
engineering and IT tools including prediction and modelling to complex engineering
activities with an understanding of the limitations.
[PO.6]. The engineer and society: Apply reasoning informed by the contextual knowledge to assess
societal, health, safety, legal, and cultural issues, and the consequent responsibilities
relevant to the professional engineering practice.

[PO.7]. Environment and sustainability: Understand the impact of the professional engineering
solutions in societal and environmental contexts, and demonstrate the knowledge of, and
need for sustainable development.

[PO.8]. Ethics: Apply ethical principles and commit to professional ethics and responsibilities and
norms of the engineering practice.

[PO.9]. Individual and teamwork: Function effectively as an individual, and as a member or leader in
diverse teams, and in multidisciplinary settings.

[PO.10]. Communication: Communicate effectively on complex engineering activities with the
engineering community and with society at large, such as being able to comprehend and
write effective reports and design documentation, make effective presentations, and give
and receive clear instructions.

[PO.11]. Project management and finance: Demonstrate knowledge and understanding of the
engineering and management principles and apply these to one’s own work, as a member
and leader in a team, to manage projects and in multidisciplinary environments.

[PO.12]. Life-long learning: Recognize the need for and have the preparation and ability to engage in
independent and life-long learning in the broadest context of technological change.

D. PROGRAM SPECIFIC OBJECTIVES (PSOs)


[PSO.1]. To apply creativity in support of the design, simulation, implementation, and inference of
existing and advanced technologies.
[PSO.2]. To participate & succeed in IT-oriented jobs/competitive examinations that offer
inspiring & gratifying careers.
[PSO.3]. To recognize the importance of professional developments by pursuing postgraduate
studies and positions.

E. ASSESSMENT PLAN:

Criteria | Description | Maximum Marks
Internal Assessment (Summative) | Mid Term-I | 20
Internal Assessment (Summative) | Mid Term-II | 20
Internal Assessment (Summative) | Quiz – 3 (Best 2), Assignments - 1 (Accumulated and Averaged) | 10 + 10
End Term Exam (Summative) | End Term Exam (Closed book) | 40
Total | | 100
Attendance (Formative) | A minimum of 75% Attendance is required to be maintained by a student to be qualified for taking up the End Semester examination. The allowance of 25% includes all types of leaves including medical leaves.
Make up Assignments (Formative) | As per department’s directives
Homework/ Home Assignment/ Activity Assignment (Formative) | As per course instructor’s wish

F. SYLLABUS:
Data warehousing Components: Introduction to Data Warehouse, Statistical Observation on
Data, Data Types, DBMS Schemas for Decision Support, Data Mart, Data Extraction,
Transformation and Load (ETL) Operations, Metadata; Online Analytical Processing (OLAP),
Online Transaction Processing (OLTP), ROLAP, MOLAP, HOLAP and their Operations, Bitmap
Indexing, Join Indexing, Attribute Selection Measure, Method, Data Cubing, Slicing, Dicing.

Data Mining: Introduction to Data Mining & Applications, Types of Data, Pre-Processing;
Association Rule Mining (ARM): Mining Frequent Patterns, K-Frequent Item Set Mining,
Apriori Algorithm, Associations and Correlations Mining, Correlation Analysis, Constraint-Based
Association Mining; Classification and Prediction: Basic Concepts, Entropy, Decision Tree,
Naïve Bayes Algorithm, Neural Networks, Back Propagation, Support Vector Machines,
Associative Classification, Lazy Learners; Clustering: Basic Concepts, Cluster Analysis,
K-Means, Partitioning Methods, Hierarchical Clustering, Expectation Maximization, Density-Based
Clustering; Web Mining, Text Mining, Spatial Mining; Case Study: Case Studies on Various
Data Mining Techniques with Varying Data Sets.

G. TEXT BOOKS:
1. A. Berson and S. J. Smith, “Data Warehousing, Data Mining & OLAP”, Tata McGraw-Hill
Edition, Tenth Reprint 2007.
2. J. Han and M. Kamber, “Data Mining: Concepts and Techniques”, Second Edition, Elsevier,
2007.

H. REFERENCE BOOKS:
1. P. N. Tan, M. Steinbach and V. Kumar, “Introduction to Data Mining”, Pearson Education,
2007.
2. K. P. Soman, S. Diwakar and V. Ajay, “Insight into Data Mining: Theory and Practice”,
Eastern Economy Edition, Prentice Hall of India, 2006.

I. OUTLINE OF LECTURE PLAN

Lec No | TOPICS | SESSION OUTCOMES | Mode of Delivery | Corresponding CO | Mode of Assessing CO
1 | Introduction and Course Hand-out briefing | To acquaint and clear teachers’ expectations and understand student expectations. | Lecture | IT3240.1 | Mid-Term I, Quiz-1 & End Term
2 | Data Warehouse, Statistical Description of Data | Describe the basic concepts of data warehousing, OLAP and its types, indexing, and selection measures. | Lecture, Practice Questions | IT3240.1 | Mid-Term I, Quiz-1 & End Term
3 | Data Pre-processing | Describe the basic concepts of data warehousing, OLAP and its types, indexing, and selection measures. | Lecture | IT3240.1 | Mid-Term I, Quiz-1 & End Term
4 | Data Mart, Data Extraction | Describe the basic concepts of data warehousing, OLAP and its types, indexing, and selection measures. | Lecture | IT3240.1 | Mid-Term I, Quiz-1 & End Term
5 | Transformation and Load (ETL) Operations | Describe the basic concepts of data warehousing, OLAP and its types, indexing, and selection measures. | Flipped Class | IT3240.1 | Mid-Term I, Quiz-1 & End Term
6 | Metadata, Data Cube – Star Schema, Snowflake Schema | Describe the basic concepts of data warehousing, OLAP and its types, indexing, and selection measures. | Lecture | IT3240.1 | Mid-Term I, Quiz-1 & End Term
7 | Dimensions, Measures & OLAP server Architecture | Describe the basic concepts of data warehousing, OLAP and its types, indexing, and selection measures. | Lecture, Practice Questions | IT3240.1 | Mid-Term I, Quiz-1 & End Term
8 | ROLAP, MOLAP, HOLAP and their Operations | Describe the basic concepts of data warehousing, OLAP and its types, indexing, and selection measures. | Lecture, Practice Questions | IT3240.1 | Mid-Term I, Quiz-1 & End Term
9 | Data Cube, Bitmap Indexing | Describe the basic concepts of data warehousing, OLAP and its types, indexing, and selection measures. | Lecture | IT3240.1 | Mid-Term I, Quiz-2 & End Term
10 | Join Indexing, Attribute Selection Measure | Describe the basic concepts of data warehousing, OLAP and its types, indexing, and selection measures. | Lecture | IT3240.1 | Mid-Term I, Quiz-2 & End Term
11 | Star Tree Construction | Describe the basic concepts of data warehousing, OLAP and its types, indexing, and selection measures. | Lecture, Practice Questions | IT3240.1 | Mid-Term I, Quiz-2 & End Term
12 | Introduction to Data Mining & Applications, Types of Data | Describe the data mining and its applications and understand the correlation analysis and association mining along with their algorithms. | Lecture | IT3240.2 | Mid-Term I, Quiz-2 & End Term
13 | Pre-Processing | Describe the data mining and its applications and understand the correlation analysis and association mining along with their algorithms. | Lecture | IT3240.2 | Mid-Term I, Quiz-2 & End Term
14 | Frequent Patterns, K-Frequent Item Set Mining | Describe the data mining and its applications and understand the correlation analysis and association mining along with their algorithms. | Lecture, Practice Questions | IT3240.2 | Mid-Term I, Quiz-2 & End Term
15 | Apriori Algorithm | Describe the data mining and its applications and understand the correlation analysis and association mining along with their algorithms. | Lecture, Practice Questions | IT3240.2 | Mid-Term I, Quiz-2 & End Term
16 | Generating Association Rules | Describe the data mining and its applications and understand the correlation analysis and association mining along with their algorithms. | Lecture, Practice Questions | IT3240.2 | Mid-Term II, Quiz-2 & End Term
17 | Associations and Correlations Mining | Describe the data mining and its applications and understand the correlation analysis and association mining along with their algorithms. | Flipped Class | IT3240.2 | Mid-Term II, Quiz-2 & End Term
18 | Correlation Analysis | Describe the data mining and its applications and understand the correlation analysis and association mining along with their algorithms. | Lecture, Practice Questions | IT3240.2 | Mid-Term II, Quiz-2 & End Term
19 | Constraint-Based Association Mining | Describe the data mining and its applications and understand the correlation analysis and association mining along with their algorithms. | Lecture | IT3240.2 | Mid-Term II, Quiz-2 & End Term
20 | Basic Concepts, Entropy | Explain the classification and prediction with their algorithms. | Lecture, Practice Questions | IT3240.3 | Mid-Term II, Quiz-2 & End Term
21 | Decision Tree Induction | Explain the classification and prediction with their algorithms. | Lecture, Practice Questions | IT3240.3 | Mid-Term II, Quiz-2 & End Term
22 | Bayes Classification Methods: Bayes Theorem | Explain the classification and prediction with their algorithms. | Lecture | IT3240.3 | Mid-Term II, Quiz-2 & End Term
23 | Techniques to Improve Classification Accuracy | Explain the classification and prediction with their algorithms. | Flipped Class | IT3240.3 | Mid-Term II, Quiz-3 & End Term
24 | Support Vector Machines | Explain the classification and prediction with their algorithms. | Lecture, Practice Questions | IT3240.3 | Mid-Term II, Quiz-3 & End Term
25 | Random Forests | Explain the classification and prediction with their algorithms. | Lecture, Practice Questions | IT3240.3 | Mid-Term II, Quiz-3 & End Term
26 | Neural Networks | Explain the classification and prediction with their algorithms. | Lecture, Practice Questions | IT3240.3 | Mid-Term II, Quiz-3 & End Term
27 | Bayesian Belief Network | Explain the classification and prediction with their algorithms. | Lecture, Practice Questions | IT3240.4 | Mid-Term II, Quiz-3 & End Term
28 | Classification by Back Propagation – I | Explain the classification and prediction with their algorithms. | Lecture, Practice Questions | IT3240.4 | Mid-Term II, Quiz-3 & End Term
29 | Classification by Back Propagation – II | Explain the classification and prediction with their algorithms. | Lecture | IT3240.4 | Mid-Term II, Quiz-3 & End Term
30 | Lazy Learners | Explain the classification and prediction with their algorithms. | Flipped Class | IT3240.4 | Mid-Term II, Quiz-3 & End Term
31 | Basic Concepts, Cluster Analysis | Specify the significance of clustering, its types, and algorithms for mining. | Lecture | IT3240.4 | Mid-Term II, Quiz-3 & End Term
32 | K-Means, Partitioning Methods | Specify the significance of clustering, its types, and algorithms for mining. | Lecture, Practice Questions | IT3240.4 | Mid-Term II, Quiz-3 & End Term
33 | Hierarchical Clustering | Specify the significance of clustering, its types, and algorithms for mining. | Lecture, Practice Questions | IT3240.4 | Mid-Term II, Quiz-3 & End Term
34 | Expectation Maximization | Specify the significance of clustering, its types, and algorithms for mining. | Lecture, Practice Questions | IT3240.5 | Mid-Term II, Quiz-3 & End Term
35 | Density-based Clustering, Web Mining | Specify the significance of clustering, its types, and algorithms for mining. | Lecture | IT3240.5 | Mid-Term II, Quiz-3 & End Term
36 | Text Mining, Spatial Mining | Specify the significance of clustering, its types, and algorithms for mining. | Lecture, Practice Questions | IT3240.5 | Mid-Term II, Quiz-3 & End Term
37 | Case Study: Case Studies on Various Data Mining Techniques with Varying Data Sets | Case studies | Lecture | IT3240.5 | End Term
38 | Case Study: Case Studies on Various Data Mining Techniques with Varying Data Sets | Case studies | Lecture | IT3240.5 | End Term

J. Course Articulation Matrix: (Mapping of COs with POs)


CO | STATEMENT | CORRELATION WITH PROGRAM OUTCOMES (PO 1 to PO 12) AND PROGRAM-SPECIFIC OUTCOMES (PSO 1 to PSO 3)
IT 3240.1 | Interpret the contribution of data warehousing and data mining to the decision-support level of organizations | 3 2 2 3
IT 3240.2 | Categorize and carefully differentiate between situations for applying different data-mining techniques: frequent pattern mining, association, correlation, classification, prediction, and cluster and outlier analysis | 3 2 1 1 2 2 2
IT 3240.3 | Design and implement systems for data mining | 3 1 2 1 2 1 3
IT 3240.4 | Evaluate the performance of different data-mining algorithms | 2 2 3 3 2
IT 3240.5 | Propose data-mining solutions for different applications | 2 1 2 1 2

1- Low Correlation; 2- Moderate Correlation; 3- Substantial Correlation
