
21-09-2020

Government of Goa
Goa College Of Engineering
Farmagudi - Goa

DATA MINING
INTRODUCTION: Challenges, Origin, Architecture and Tasks

Prof. Sherica Lavinia Menezes


Assistant Professor
Computer Engineering Department
Goa College of Engineering

▪ Learning Objectives
▪ Motivating Challenges
▪ Origin of Data Mining
▪ Architecture of Data Mining


At the end of this session, students will be able to:
▪ Explain the architecture of data mining
▪ Discuss the challenges in data mining
▪ Explain the origin of data mining


CHALLENGES IN DATA MINING
▪ Scalability
▪ High dimensionality
▪ Heterogeneous and complex data
▪ Data ownership and distribution
▪ Non-traditional analysis


Scalability
▪ Data sets can be gigabytes, terabytes or even petabytes in size
▪ Data mining algorithms must therefore be scalable
▪ Scalability needs:
  ◦ Special search strategies
  ◦ Novel data structures for efficient access
  ◦ Parallel and distributed algorithms
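Below is a minimal sketch of one scalability tactic. It assumes the records arrive as a stream of item names, which is a hypothetical format. The data is processed in fixed-size chunks, so memory use is bounded by the chunk size rather than the data set size. Each chunk's partial count does not depend on any other chunk, so chunks could also be handed to parallel or distributed workers and merged afterwards. The function names `chunks`, `count_chunk` and `scalable_count` are illustrative, not from any library.

```python
from collections import Counter
from typing import Iterable, Iterator, List


def chunks(records: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield successive chunks of at most `size` records; only one chunk is held in memory."""
    batch: List[str] = []
    for record in records:
        batch.append(record)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly shorter, chunk
        yield batch


def count_chunk(batch: List[str]) -> Counter:
    """Count items in one chunk; chunks do not depend on each other."""
    return Counter(batch)


def scalable_count(records: Iterable[str], size: int = 1000) -> Counter:
    """Merge the partial counts of all chunks into one global count."""
    total: Counter = Counter()
    for batch in chunks(records, size):
        total.update(count_chunk(batch))  # Counter.update adds counts together
    return total


# A generator stands in for a data set too large to load at once
stream = (item for item in ["milk", "bread", "milk", "eggs", "milk", "bread"])
print(scalable_count(stream, size=2))
```

The chunk size trades memory for the number of merge steps. The final count does not depend on it.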


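High Dimensionality
▪ Data sets may have hundreds or thousands of attributes
▪ E.g.: genetic data, temporal or spatio-temporal data
▪ Measurements of temperature at various locations: if temperature is measured repeatedly at each location, the number of attributes grows in proportion to the number of measurements taken
  (Figure: temperature as a function of time and (x, y) location)
▪ Computational complexity increases rapidly as the dimensionality increases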
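The growth described above can be made concrete with two small calculations. The sketch makes three illustrative assumptions:
- each location is one record;
- each timed reading is one attribute;
- each attribute is split into a hypothetical 10 equal-width bins.

Under these assumptions, the number of cells in the attribute space is bins^d, so it grows exponentially with the dimensionality d.

```python
def attributes_per_location(days: int, readings_per_day: int = 24) -> int:
    """Assumed layout: one attribute per timed temperature reading at a location."""
    return days * readings_per_day


def grid_cells(num_attributes: int, bins_per_attribute: int = 10) -> int:
    """Cells in the attribute space when every attribute is split into equal-width bins."""
    return bins_per_attribute ** num_attributes


# Hourly readings for one year give 8760 attributes per location
print(attributes_per_location(365))

# Cells to examine grow exponentially: 10, 100, 1000, ..., 10**10
for d in (1, 2, 3, 10):
    print(d, grid_cells(d))
```

With the data spread over so many cells, most cells are empty. This sparsity is one reason many algorithms slow down or become less reliable as attributes are added.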

Heterogeneous and Complex Data
▪ Traditional data sets contain attributes of the same type: continuous or categorical
▪ Current data sets have heterogeneous attributes
▪ Complex data:
  ◦ Web pages with semi-structured text and hyperlinks
  ◦ DNA data with sequential and three-dimensional structure
  ◦ Climate data (time series of temperature, pressure, etc. at various locations)
▪ Mining techniques should take into account relationships in the data:
  ◦ Temporal and spatial correlation
  ◦ Graph connectivity
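One common way to compare records with heterogeneous attributes is Gower's dissimilarity. It scores each attribute on its own terms and then averages the scores:
- continuous attributes get a range-scaled absolute difference;
- categorical attributes score 0 for a match and 1 otherwise.

The slides do not prescribe this measure; it is shown only as an illustrative sketch. The record layout (temperature, weather type, humidity) and the attribute ranges are hypothetical.

```python
from typing import Dict, List, Union

Value = Union[float, str]


def gower_dissimilarity(a: List[Value], b: List[Value], ranges: Dict[int, float]) -> float:
    """Average per-attribute dissimilarity of two mixed-type records (0 means identical).

    `ranges` maps the index of every continuous attribute to its (max - min) over the data set;
    all other attributes are treated as categorical.
    """
    if len(a) != len(b):
        raise ValueError("records must have the same attributes")
    total = 0.0
    for i, (x, y) in enumerate(zip(a, b)):
        if i in ranges:
            # Continuous: absolute difference scaled by the attribute's range
            if ranges[i] > 0:
                total += abs(float(x) - float(y)) / ranges[i]
        else:
            # Categorical: 0 if the categories match, 1 otherwise
            total += 0.0 if x == y else 1.0
    return total / len(a)


day1 = [30.0, "sunny", 60.0]  # temperature, weather type, humidity
day2 = [40.0, "rainy", 60.0]
print(gower_dissimilarity(day1, day2, ranges={0: 20.0, 2: 50.0}))
```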


Data Ownership and Distribution
▪ The data needed for an analysis may not be stored in one place or owned by one organization
▪ Data may be:
  ◦ Geographically distributed
  ◦ Owned by multiple parties
▪ Distributed data mining must:
  ◦ Reduce the amount of communication needed for distributed computations
  ◦ Effectively consolidate data mining results from multiple sources
  ◦ Address data security issues
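Below is a minimal sketch of the first two requirements. It assumes each site can compute a local summary of a numeric attribute. Instead of shipping raw records, every site sends only its (sum, count) pair, and a coordinator consolidates these into the global mean. The names `LocalSummary`, `summarize_site` and `global_mean` are hypothetical. Sharing only aggregates reduces communication and keeps raw records with their owners. On its own, though, it does not guarantee privacy.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LocalSummary:
    """Only these two numbers leave a site; the raw records stay with their owner."""
    total: float
    count: int


def summarize_site(values: List[float]) -> LocalSummary:
    """Computed locally at each site."""
    return LocalSummary(total=sum(values), count=len(values))


def global_mean(summaries: List[LocalSummary]) -> float:
    """Consolidate the local summaries into one result at the coordinator."""
    grand_total = sum(s.total for s in summaries)
    grand_count = sum(s.count for s in summaries)
    return grand_total / grand_count


site_a = [10.0, 20.0, 30.0]
site_b = [40.0, 50.0]
print(global_mean([summarize_site(site_a), summarize_site(site_b)]))
```

Two numbers per site are sent, no matter how many records each site holds. The result equals the mean computed over the pooled data.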


Non-traditional Analysis
▪ Traditional approach: hypothesize-and-test
  Hypothesis proposed → Experiment designed → Data gathered → Data analyzed with respect to the hypothesis
▪ Current data analysis:
  ◦ Generates thousands of hypotheses
  ◦ Must automate the process of hypothesis generation and evaluation
▪ Data sets often represent opportunistic samples of data rather than random samples
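The sketch below automates a simple form of hypothesis generation and evaluation. For every pair of attributes it proposes the hypothesis "attributes A and B are linearly correlated", then keeps the pairs whose Pearson correlation passes a threshold. The attribute names, the values and the 0.9 threshold are all hypothetical. With thousands of candidate hypotheses, some will pass by chance, so flagged pairs still need validation.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length attribute columns."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant attribute cannot be correlated
    return cov / (sd_x * sd_y)


def screen_pairs(data: Dict[str, List[float]], threshold: float = 0.9) -> List[Tuple[str, str, float]]:
    """Generate one hypothesis per attribute pair and keep the strongly correlated pairs."""
    hits = []
    for a, b in combinations(sorted(data), 2):
        r = pearson(data[a], data[b])
        if abs(r) >= threshold:
            hits.append((a, b, round(r, 3)))
    return hits


data = {
    "temp": [20.0, 22.0, 24.0, 26.0, 28.0],
    "ice_cream_sales": [100.0, 120.0, 140.0, 160.0, 180.0],
    "rainfall": [5.0, 1.0, 4.0, 2.0, 3.0],
}
print(screen_pairs(data))
```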

Origin of Data Mining
Data mining draws on ideas from:
▪ Statistics: sampling, estimation, hypothesis testing
▪ A.I., machine learning and pattern recognition: search algorithms, modeling techniques, learning theories
▪ Database technology, parallel computing and distributed computing


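Architecture of Data Mining
(layers listed from top to bottom)
▪ User interface
▪ Pattern evaluation
▪ Data mining engine
▪ Database or data warehouse server
▪ Data cleaning, integration and selection
▪ Data sources: database, data warehouse, World Wide Web, other information repositories
▪ Knowledge base: domain knowledge used by the data mining engine and pattern evaluation to guide the search and judge how interesting the patterns are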
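The layers above can be sketched as a chain of functions, with data flowing upward from the sources to the user interface. Everything here is hypothetical and deliberately tiny:
- each record holds a single purchased item;
- the "mining engine" only computes item support (relative frequency);
- the knowledge base is reduced to one minimum-support threshold, which pattern evaluation uses to decide which patterns are interesting.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Hypothetical record format: one purchased item per record (None marks a missing value)
Record = Dict[str, Optional[str]]
Pattern = Tuple[str, float]  # (item, support)


def clean_integrate_select(sources: List[List[Record]]) -> List[Record]:
    """Integrate all sources, then drop records whose 'item' value is missing (cleaning)."""
    merged = [rec for source in sources for rec in source]
    return [rec for rec in merged if rec.get("item") is not None]


class WarehouseServer:
    """Stand-in for the database or data warehouse server holding the prepared data."""

    def __init__(self, records: List[Record]) -> None:
        self._records = records

    def fetch(self) -> List[Record]:
        return list(self._records)


def mining_engine(records: List[Record]) -> List[Pattern]:
    """Toy mining task: support (relative frequency) of each item."""
    if not records:
        return []
    counts = Counter(rec["item"] for rec in records)
    return [(item, c / len(records)) for item, c in counts.items()]


def pattern_evaluation(patterns: List[Pattern], knowledge_base: Dict[str, float]) -> List[Pattern]:
    """Keep patterns the knowledge base deems interesting, highest support first."""
    threshold = knowledge_base["min_support"]
    return sorted((p for p in patterns if p[1] >= threshold), key=lambda p: -p[1])


def user_interface(patterns: List[Pattern]) -> List[str]:
    """Format the surviving patterns for the user."""
    return [f"{item}: support {support:.2f}" for item, support in patterns]


sources = [
    [{"item": "milk"}, {"item": "bread"}, {"item": None}],  # e.g. a transactional database
    [{"item": "milk"}, {"item": "eggs"}],                   # e.g. a data warehouse extract
]
server = WarehouseServer(clean_integrate_select(sources))
knowledge_base = {"min_support": 0.5}
report = user_interface(pattern_evaluation(mining_engine(server.fetch()), knowledge_base))
for line in report:
    print(line)
```

Keeping each layer behind its own function means a stage can be replaced, for example with a real warehouse query or a different mining task, without touching the others.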


Summary
▪ Take a minute to think over what we covered today:
  ◦ Definition
  ◦ Applications
  ◦ Evolution
  ◦ KDD
  ◦ Origin
  ◦ Architecture




Quiz Assignment
▪ On completion of this video, students are required to answer the quiz assignment posted on Google Classroom.

