
METU

DEPARTMENT OF COMPUTER ENGINEERING

1. Course Code and Title:
CENG 414 / Special Topics in CENG: Introduction to Data Mining

2. Credit hours: 3
Theoretical: 3
Applied/Laboratory: 0
Total: 3
ECTS: 6

3. Catalogue description:
Concepts of data mining; data preprocessing; data warehousing and OLAP for data
mining; association, correlation, and frequent pattern analysis; classification;
cluster and outlier analysis; mining time-series and sequence data; text mining and
web mining; visual data mining; industry efforts and social impacts; applications
of data mining.

Prerequisites: CENG 222; MATH 260

4. Course objectives/goals:
By the end of the course, students will understand data mining techniques and
will be able to apply them to real-life problems.

Related Program Educational Objectives are:

1. design, construct and operate software-intensive systems.
2. analyze problems from a computational viewpoint, propose algorithmic
solutions, and implement them correctly and efficiently.

5. Justification of the proposal:

The course will be offered as a technical elective. Although the department
already offers graduate courses on data mining, this course will complement them
by providing the foundations of data mining, and it will serve as an introductory
basis for the advanced data mining topics covered in the graduate courses.

The Data Mining Curriculum Proposal prepared by the ACM SIGKDD Curriculum
Committee states that ‘Data mining, the science of extracting useful knowledge
from such huge data repositories, has emerged as a young and interdisciplinary
field in computer science. Data mining techniques have been widely applied to
problems in industry, science, engineering and government, and it is widely
believed that data mining will have profound impact on our society. The growing
consensus that data mining can bring real value has led to an explosion in demand
for novel data mining technologies and for students who are trained in data
mining.’

Student Outcomes:
a. an ability to apply knowledge of mathematics, science, and engineering
b. an ability to design and conduct experiments, as well as to analyze and
interpret data

6. Faculty member submitting the proposal: Dr. Ayşe Nur Birtürk.

7. Relationships (including overlaps) with other undergraduate and graduate
courses in the Department, Faculty, and University:
Several items in the course schedule overlap with the following courses: CENG 514
Data Mining (50%), CENG 770 Advanced Data Mining (20%), CENG 562 Machine
Learning (15%), CENG 482 Introduction to Machine Learning (15%), and IE 4903
Introduction to Data Mining (40%).

8. Textbook(s) and reference material(s) as two separate lists:

TEXTBOOK (title, author(s), publisher, year, ISBN)

• Introduction to Data Mining, Pang-Ning Tan, Michael Steinbach, and Vipin Kumar,
  Pearson Addison Wesley, 2006, ISBN 0321321367.

REFERENCE MATERIAL (title, author(s), publisher, year, ISBN)

• Data Mining: Concepts and Techniques, Jiawei Han and Micheline Kamber,
  Morgan Kaufmann Publishers, 2012, ISBN 0123814790.
• Data Mining: Introductory and Advanced Topics, M. Dunham, Prentice Hall,
  2003, ISBN 0130888923.
• Data Mining: Practical Machine Learning Tools and Techniques, Witten and
  Frank, Morgan Kaufmann, 2005, ISBN 0120884070.
• Lecture notes.


9. Syllabus (subject to change):

Week 1: Introduction
  Basic concepts of data mining; applications of data mining.

Week 2: Data Preprocessing
  Descriptive data summarization; data cleaning methods; data integration and
  transformation methods; basic data reduction methods; discretization and
  concept hierarchy generation.

Week 3: Data Warehousing and OLAP for Data Mining
  Concept and architecture of data warehouses; the dimensional data model;
  OLAP operations.

Week 4: Association and Correlation
  Basic concepts; association rule mining; market basket analysis.

Week 5: Frequent Pattern Analysis
  Frequent pattern mining methods; mining various kinds of frequent patterns;
  applications of association rules.

Week 6: Classification
  Basic concepts; evaluation of classification; Bayesian classification;
  decision tree and decision rule induction; linear models for classification.

Week 7: Classification (continued)
  Basic concepts of nonlinear classification; classification by lazy evaluation;
  ensemble classifiers.

Week 8: Cluster Analysis
  Concept of cluster analysis; types of data and dissimilarity computation;
  a categorization of major clustering methods.

Week 9: Clustering Methods and Outlier Analysis
  Partition-based clustering; hierarchical clustering; density-based clustering;
  model-based clustering; outlier analysis.

Week 10: Mining Time-Series and Sequence Data
  Regression analysis; trend analysis; sequential pattern mining.

Week 11: Text Mining
  Mining text databases.

Week 12: Web Mining
  Mining the World Wide Web.

Week 13: Visual Data Mining
  Data visualization; visualization of data mining results; visual data mining.

Week 14: Industry Efforts and Social Impacts
  Social impact of data mining; data mining and privacy; standardization
  efforts; data mining system products.

10. Grading system (subject to change):

Homework*: 25%
Midterm 1: 20%
Midterm 2: 20%
Final: 35%
Attendance: bonus
*3 to 4 homework assignments (written assignments, programming assignments, and
exercises using free data mining tools/environments)

11. Maximum class size and student quota for students of other departments:
Maximum class size: 35
Quota for students of other departments: 5

12. Proposed semester for the course: Fall: Spring:


13. Communication: All announcements, including homework assignments, will be
made in the course newsgroup metu.ceng.course.414.

14. Policy:
Homework assignments designated as individual assignments must be completed
individually. Copying, whether from other students or from the internet, is
strictly forbidden; it constitutes grounds for failure, and the case will be
referred to the Student Discipline Committee.