Introduction to DM and Data Warehousing
Overview of the Course

Unstructured data: data that does not fit a certain data structure (text, a list of numeric measurements)
Structured data: data that fits a certain data structure (table, tree, graph/network, etc.)
• “Data mining is the process of discovering meaningful new correlations, patterns and trends by sifting through large amounts of data stored in repositories, using pattern recognition technologies as well as statistical and mathematical techniques.” (The Gartner Group, www.gartner.com)
• “Data mining is the analysis of (often large) observational data sets to find unsuspected relationships and to summarize the data in novel ways that are both understandable and useful to the data owner.” (David Hand, Heikki Mannila, and Padhraic Smyth, Principles of Data Mining, MIT Press, Cambridge, MA, 2001.)
• Process Mining is the task of converting event data into process models.
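As a rough illustration of turning event data into a process model, the sketch below builds a directly-follows graph, one of the simplest process models. The slides do not name a specific technique, so this choice is an assumption. The event log, its column layout (case id, activity, timestamp), and the function name `directly_follows` are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical event log: (case_id, activity, timestamp) rows, invented for illustration.
event_log = [
    ("c1", "register", 1), ("c1", "check", 2), ("c1", "approve", 3),
    ("c2", "register", 1), ("c2", "check", 2), ("c2", "reject", 3),
    ("c3", "register", 1), ("c3", "check", 2), ("c3", "approve", 3),
]

def directly_follows(log):
    """Count how often activity a is immediately followed by activity b within a case."""
    traces = defaultdict(list)
    # Order events by case and then by timestamp to recover each case's trace.
    for case_id, activity, _ts in sorted(log, key=lambda e: (e[0], e[2])):
        traces[case_id].append(activity)
    edges = Counter()
    for trace in traces.values():
        for a, b in zip(trace, trace[1:]):
            edges[(a, b)] += 1
    return edges

dfg = directly_follows(event_log)
for (a, b), n in sorted(dfg.items()):
    print(f"{a} -> {b}: {n}")
```

Each edge of the resulting graph is a candidate transition in the process model, and the counts show which paths dominate.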
• Knowledge Discovery in Data is the non-trivial process of identifying valid, novel, potentially useful and ultimately understandable patterns in data.
• Process Mining is the task of converting event data into process models.
• The DM process must address:
  • Enormity of data
  • High dimensionality of data
  • Heterogeneous and distributed nature of data.
5/25/2020
How much data do we generate?
Managing Organizations
• Strategic Givens: Vision; Mission; Values, Purpose, Structure, Politics, Environment, etc.
• Direction: Policies, Goals, and Objectives
• Decision Making: What should be done? (Analytics, Decision Making)
• Implementation: When and how? (Project Management)
• Action
[Figure: Data Base (DBMS, Enterprise Data Models) and Model Base (MBMS, Application Models) supporting the Intelligence, Design, and Choice phases. Labels: Structuring Relationships; Problem Representation; Variables (Measures and Estimates); Generation of Alternatives; Probabilities and Estimates; Data Warehousing; On-Line Analytical Processing; Spreadsheet Models for managing complex relationships and detail; Decision Analysis and Influence Diagrams for Visualizing Models and Choices; Business Reporting.]
Goals/Strategy
• Marketing (Pricing, Promotion, Loyalty): Demand, Consumers
• Production (Capacity, Labor, Materials): Quantity, Suppliers
• Finance (Cash flow, Debt/Equity, Investments): Revenues, Investors
Why DM?
• Data explosion
• "We are drowning in data, but starving for knowledge!"
• Data → Information → Knowledge
• Knowledge Discovery
• Interpretation
• Machine Learning
• Understanding
• Learning
• Data Mining
• Acting
• Descriptive data mining: clustering, pattern mining, etc.
• Predictive data mining: classification, prediction, etc.
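The contrast between descriptive and predictive data mining can be sketched with a toy example. The descriptive half groups unlabeled values with a one-dimensional k-means (Lloyd's algorithm), and the predictive half learns a nearest-centroid classifier from labeled values. Both techniques, the spend figures, the labels, and all function names are assumptions chosen for illustration; the slides only name the general categories.

```python
from statistics import mean

def kmeans_1d(values, centers, iterations=10):
    """Descriptive: group unlabeled values around k centers (Lloyd's algorithm in 1-D)."""
    centers = list(centers)
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Keep the old center if a cluster ends up empty.
        centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters

def nearest_centroid_fit(labeled):
    """Predictive: learn one centroid per class label from labeled examples."""
    by_label = {}
    for value, label in labeled:
        by_label.setdefault(label, []).append(value)
    return {label: mean(vals) for label, vals in by_label.items()}

def nearest_centroid_predict(model, value):
    """Assign the label whose centroid is closest to the new value."""
    return min(model, key=lambda label: abs(value - model[label]))

# Hypothetical monthly spend figures, invented for illustration.
spend = [10, 12, 11, 95, 100, 105]
centers, clusters = kmeans_1d(spend, centers=[0, 50])

model = nearest_centroid_fit([(10, "low"), (12, "low"), (95, "high"), (105, "high")])
prediction = nearest_centroid_predict(model, 80)
print(centers, clusters, prediction)
```

The clustering step needs no labels and only describes structure in the data, while the classifier uses known labels to predict the class of an unseen value.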
Data Warehouse
Benefits of a Data Mart (contd…)
Operational Source Systems and Data Staging Area
• Operational Source Systems
  • capture the transactions of the business
  • queries against source systems are narrow
  • stovepipe application
• Data selection, where data relevant to the analysis task are retrieved from the
database.
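Data selection can be sketched as a parameterized query that pulls only the rows and columns the analysis needs. The example below uses Python's standard `sqlite3` module with an in-memory database. The `sales` table, its columns, and the helper `select_relevant` are hypothetical and stand in for whatever database the analysis actually targets.

```python
import sqlite3

# In-memory database with a hypothetical "sales" table, invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales ("
    "id INTEGER PRIMARY KEY, region TEXT, year INTEGER, amount REAL, clerk_note TEXT)"
)
conn.executemany(
    "INSERT INTO sales (region, year, amount, clerk_note) VALUES (?, ?, ?, ?)",
    [
        ("north", 2019, 120.0, "ok"),
        ("south", 2019, 80.0, "late"),
        ("north", 2018, 95.5, "ok"),
        ("north", 2019, 60.0, ""),
    ],
)

def select_relevant(conn, region, year):
    """Retrieve only the rows and columns relevant to the analysis task."""
    cur = conn.execute(
        "SELECT region, year, amount FROM sales "
        "WHERE region = ? AND year = ? ORDER BY id",
        (region, year),
    )
    return cur.fetchall()

selected = select_relevant(conn, "north", 2019)
print(selected)
```

The `WHERE` clause drops irrelevant rows and the column list drops the free-text `clerk_note` field, so later steps see only task-relevant data.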
• Data quality:
  • Accuracy
  • Completeness
  • Consistency (uniformity)
  • Validity
  • Timeliness
Data cleaning, data cleansing, data scrubbing,
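Several of the data quality dimensions listed above can be checked mechanically. The sketch below flags completeness, validity, consistency, and timeliness problems in a small record set. Accuracy is left out because it needs a trusted reference to compare against. The records, the age range, the upper-case country rule, and the one-year staleness threshold are all assumptions made for illustration.

```python
from datetime import date

# Hypothetical customer records, invented for illustration.
records = [
    {"id": 1, "age": 34, "country": "US", "updated": date(2020, 5, 1)},
    {"id": 2, "age": None, "country": "us", "updated": date(2020, 5, 20)},
    {"id": 3, "age": 210, "country": "DE", "updated": date(2018, 1, 15)},
]

def quality_report(rows, today, max_age_days=365):
    """Return (record id, issue) pairs for the checks described in the comments below."""
    issues = []
    for r in rows:
        if r["age"] is None:
            # Completeness: a required field is missing.
            issues.append((r["id"], "completeness: age is missing"))
        elif not 0 <= r["age"] <= 120:
            # Validity: the value falls outside the assumed allowed range.
            issues.append((r["id"], "validity: age out of range"))
        if r["country"] != r["country"].upper():
            # Consistency: country codes are assumed to be upper-case everywhere.
            issues.append((r["id"], "consistency: country code not upper-case"))
        if (today - r["updated"]).days > max_age_days:
            # Timeliness: the record has not been refreshed recently enough.
            issues.append((r["id"], "timeliness: record is stale"))
    return issues

report = quality_report(records, today=date(2020, 5, 25))
for record_id, issue in report:
    print(record_id, issue)
```

A report like this only detects problems. Deciding whether to correct, impute, or drop the flagged records is the cleaning step itself.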