
Assignment-1

Unit-I: Introduction to Data Mining

1. What is data mining? Describe the steps involved in data mining when viewed as a
process of knowledge discovery.
2. Describe the types of databases on which data mining can be performed. For each type,
identify the associated challenges and the data mining functionalities that can be applied.
3. Define each of the following data mining functionalities: characterization,
discrimination, association and correlation analysis, classification, prediction,
clustering, and evolution analysis. Give examples of each data mining functionality,
using a real-life database that you are familiar with.

Unit-II: Knowing Data and Data Preprocessing

1. Take a real-world scenario (healthcare, education, sales, etc.). Identify the process of
data collection from appropriate sources and write the data set description.
2. Data Cleaning:
(a) List the methods to handle missing values.
(b) Consider the dataset D={12,14,3,23,16,7,8,4,11,10,20,5}, and perform smoothing
using binning methods: (1) Smoothing by Bin Boundaries and (2) Smoothing by Bin
Means. Take bin size as 3.
(c) List other methods to handle noisy data.
3. Data Integration:
(a) With a suitable example, differentiate between Schema and Instance Integration.
(b) List the data value conflicts faced during data integration and resolve them.
4. Data Reduction and Transformation:
(a) Generate a concept hierarchy for home location.
(b) Perform data aggregation on an education dataset.
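
The two smoothing methods named in question 2(b) can be sketched in Python as follows. This is a minimal illustration, not a model answer. It assumes that "bin size 3" means equal-depth bins holding 3 values each (not 3 bins in total), and that a value equidistant from both bin boundaries is moved to the lower boundary. Both are assumptions; the assignment does not state either convention. The function names are hypothetical.

```python
def equal_depth_bins(values, depth):
    """Sort the values and split them into consecutive bins of `depth` values each."""
    ordered = sorted(values)
    return [ordered[i:i + depth] for i in range(0, len(ordered), depth)]


def smooth_by_means(bins):
    """Replace every value in a bin with the mean of that bin."""
    return [[sum(b) / len(b)] * len(b) for b in bins]


def smooth_by_boundaries(bins):
    """Replace every value with the nearer of its bin's minimum and maximum.

    Assumption: ties go to the lower boundary.
    """
    smoothed = []
    for b in bins:
        low, high = b[0], b[-1]  # bins are already sorted
        smoothed.append([low if v - low <= high - v else high for v in b])
    return smoothed


D = [12, 14, 3, 23, 16, 7, 8, 4, 11, 10, 20, 5]
bins = equal_depth_bins(D, 3)
by_means = smooth_by_means(bins)
by_bounds = smooth_by_boundaries(bins)

print("Bins:          ", bins)
print("By bin means:  ", by_means)
print("By boundaries: ", by_bounds)
```

If the intended reading is 3 bins in total, calling `equal_depth_bins(D, 4)` gives that partition instead.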

Unit-III: Data Warehousing and Data Cube Technology

1. Download any sample data set and identify various types of attributes in it.
2. For a typical data warehouse design process, choose:
a. A business process to model, e.g., orders, invoices, etc.
b. The grain (atomic level of data) of the business process
c. The dimensions that will apply to each fact table record
d. The measure that will populate each fact table record
3. Design a data cube choosing a fact, and its dimensions. Draw a star and snowflake
schema for the cube.
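
As a rough illustration of how a fact, its dimensions, and its measure relate in question 3, the sketch below rolls up a tiny fact table along chosen dimensions. Everything in it is hypothetical and invented for illustration: the `fact_sales` rows, the dimension names (`quarter`, `item`, `city`), the measure `units_sold`, and the helper `roll_up`. None of it comes from the assignment, and a real design would use the business process and grain chosen in question 2.

```python
from collections import defaultdict

# Hypothetical fact table: each row is one fact at the grain
# (quarter, item, city), with a single additive measure.
fact_sales = [
    {"quarter": "Q1", "item": "laptop", "city": "Pune", "units_sold": 10},
    {"quarter": "Q1", "item": "phone", "city": "Pune", "units_sold": 25},
    {"quarter": "Q1", "item": "laptop", "city": "Delhi", "units_sold": 7},
    {"quarter": "Q2", "item": "laptop", "city": "Pune", "units_sold": 12},
    {"quarter": "Q2", "item": "phone", "city": "Delhi", "units_sold": 30},
]


def roll_up(facts, dimensions, measure):
    """Sum `measure` over all facts, grouped by the given dimensions.

    Each distinct choice of `dimensions` corresponds to one cuboid of the cube;
    an empty list gives the apex cuboid (the grand total).
    """
    totals = defaultdict(int)
    for row in facts:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[measure]
    return dict(totals)


print(roll_up(fact_sales, ["quarter"], "units_sold"))
print(roll_up(fact_sales, ["quarter", "city"], "units_sold"))
print(roll_up(fact_sales, [], "units_sold"))
```

In a star schema, each dimension key in the fact rows would reference a single dimension table. In a snowflake schema, a dimension such as `city` would be normalized further, for example into separate city and country tables.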
