
DATA MINING & APPLICATIONS
Instructor: Dr. NGUYỄN HOÀNG TÚ ANH

LESSON 1
OVERVIEW

CONTENTS
1. Why do we need data mining?
2. What is data mining (DM)?
3. The knowledge discovery process (KDD)
4. Main tasks of data mining
5. Data mining techniques
6. Challenges of data mining

THE NEED FOR DATA MINING
Commercial perspective
  Large volumes of data are collected and stored
    Web data, e-commerce
    Purchase receipts at supermarkets / shopping centers
    Bank / credit card transactions
  Computers are more powerful and cheaper
  Competitive pressure is very strong
    Provide diverse, high-quality services (CRM: Customer Relationship Management)

THE NEED FOR DATA MINING
Scientific perspective
  Data is collected and stored at high speed (GB/h)
    Remote sensors on satellites
    Telescopes scanning the sky
    Microarrays generating gene expression data
    Scientific experiments generating terabytes of data
  Traditional techniques cannot cope with the raw data
  Data mining can help scientists
    Classify and segment data
    Formulate hypotheses

THE EMERGENCE OF DATA MINING
Data mining emerged in a context of being DATA RICH but KNOWLEDGE POOR
"We are drowning in data, but starving for knowledge!"
Data mining is a solution that helps analyze mountains of data automatically and supports decision making.

THE NEED FOR DATA MINING
Data contains a great deal of valuable information that benefits decision making
Data cannot be analyzed by hand
  People need weeks to discover useful information
  Most data is never analyzed at all
"The gap between the ability to generate data and the ability to use it" (Usama Fayyad)
10^6-10^12 bytes: a data set of this size can never be viewed in full, nor loaded into a computer's memory

Evolution of Sciences:
New Data Science Era

Before 1600: Empirical science

1600-1950s: Theoretical science

  Each discipline has grown a theoretical component. Theoretical models often motivate experiments and generalize our understanding.

1950s-1990s: Computational science
  Over the last 50 years, most disciplines have grown a third, computational branch (e.g. empirical, theoretical, and computational ecology, or physics, or linguistics.)
  Computational Science traditionally meant simulation. It grew out of our inability to find closed-form solutions for complex mathematical models.

1990-now: Data science
  The flood of data from new scientific instruments and simulations
  The ability to economically store and manage petabytes of data online
  The Internet and computing Grid that makes all these archives universally accessible
  Scientific info. management, acquisition, organization, query, and visualization tasks scale almost linearly with data volumes
  Data mining is a major new challenge!

Jim Gray and Alex Szalay, The World Wide Telescope: An Archetype for Online Science, Comm. ACM, 45(11): 50-54, Nov. 2002

THE NEED FOR DATA MINING
[Figure: The data gap. Amount of data collected (terabytes) since 1995 compared with the amount of data analyzed, 1995-1999; vertical axis from 0 to 4,000,000.]

WHEN SHOULD DATA MINING BE USED?
  There is too much data
    Large data (in dimensionality and size)
    Image data (size)
    Gene data (number of dimensions)
  There is little knowledge about the data

APPLICATION AREAS OF DATA MINING
Commercial information
  - Market and sales analysis
  - Investment analysis
  - Loan approval
  - Fraud detection
Manufacturing information
  - Control and planning
  - Network management
  - Analysis of experimental results
Personal information
Scientific information
  - Astronomy
  - Biological databases
  - Earth science: earthquake detectors

Customer Relationship Management (CRM)
To build relationships with customers, a company needs to:
1. Notice what its customers are doing
2. Remember what it and its customers have done over time
3. Learn from what it has remembered
4. Act on what it has learned to make customers more profitable
Based on transaction data
Discovering and maintaining relationships is the key to success


WHAT IS DATA MINING?
"Data mining is the nontrivial process of identifying valid, novel, useful, and ultimately understandable latent patterns in databases" (U. Fayyad, 1996)
  Nontrivial process: involves multiple processing steps
  Valid: the correctness of the pattern / model can be justified
  Novel: not previously known
  Useful: can be put to use
  Understandable: by humans and machines

DATA MINING
What is a latent pattern?
It is a relationship in the data, for example:
  People who buy trousers often also buy shirts.
  People with good credit ratings tend to have fewer accidents.
  Male, 37+, income 50K-75K -> spends about $25-$50 per catalog order.

DATA MINING ...
What is not Data Mining?
  Looking up a phone number in a phone directory.
  Searching for information about "Amazon" on a search engine.
What is Data Mining?
  Finding names that are prevalent in certain areas of the US (O'Brien, O'Rurke, O'Reilly in the Boston area).
  Grouping similar documents returned by a search engine according to their content (e.g. Amazon rainforest, Amazon.com).


THE KNOWLEDGE DISCOVERY PROCESS
Data mining: an important step in the KDD (knowledge discovery in databases) process
Databases -> Data Cleaning and Data Integration -> Data Warehouse -> Selection -> Task-relevant Data -> Data Mining -> Pattern Evaluation

THE KDD PROCESS
  Data warehousing: data organized by function
  Create/select the target database
  Select a representative sampling technique and sample data
  Supply missing values
  Eliminate noisy data
  Normalize values
  Transform values
  Create derived attributes
  Find important attributes and value ranges
  Select the DM task
  Select the DM method
  Extract knowledge
  Test knowledge
  Refine knowledge
  Transform to a different representation
  Generate queries and reports
  Advanced methods: aggregation and sequences
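
Three of the preprocessing steps named on this slide (supplying missing values, normalizing values, creating derived attributes) can be sketched in a few lines. This is only an illustration under assumptions: the customer records, the field names, the choice of mean imputation, and min-max scaling are all hypothetical and are not prescribed by the lecture.

```python
from statistics import mean

# Hypothetical customer records; None marks a missing value.
records = [
    {"age": 25, "income": 30000.0, "purchases": 3},
    {"age": None, "income": 52000.0, "purchases": 7},
    {"age": 41, "income": None, "purchases": 5},
    {"age": 33, "income": 48000.0, "purchases": 0},
]

def fill_missing(rows, field):
    """Supply missing values: replace None with the mean of the observed values."""
    observed = [r[field] for r in rows if r[field] is not None]
    fill = mean(observed)
    return [{**r, field: fill if r[field] is None else r[field]} for r in rows]

def min_max_normalize(rows, field):
    """Normalize values: rescale `field` linearly to the range [0, 1]."""
    values = [r[field] for r in rows]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid division by zero for a constant column
    return [{**r, field: (r[field] - lo) / span} for r in rows]

def add_derived(rows):
    """Create a derived attribute: flag customers with at least one purchase."""
    return [{**r, "is_buyer": r["purchases"] > 0} for r in rows]

cleaned = fill_missing(fill_missing(records, "age"), "income")
prepared = add_derived(
    min_max_normalize(min_max_normalize(cleaned, "age"), "income")
)

for row in prepared:
    print(row)
```

Each function returns new rows instead of modifying its input, so the steps can be reordered or dropped without side effects.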

Example: A Web Mining Framework

Web mining usually involves

Data cleaning

Data integration from multiple sources

Warehousing the data

Data cube construction

Data selection for data mining

Data mining

Presentation of the mining results

Patterns and knowledge to be used or stored into knowledge-base

TYPICAL ARCHITECTURE OF A DATA MINING SYSTEM
(layers from top to bottom)
  Graphical user interface
  Pattern evaluation
  Data mining engine
  Knowledge-base
  Database or data warehouse server
  Data cleaning & data integration, filtering
  Databases, Data Warehouse

Data Mining in Business Intelligence
(pyramid, listed from top to bottom; the potential to support business decisions increases toward the top)
  Decision Making
  Data Presentation: Visualization Techniques
  Data Mining: Information Discovery
  Data Exploration: Statistical Summary, Querying, and Reporting
  Data Preprocessing/Integration, Data Warehouses
  Data Sources: Paper, Files, Web documents, Scientific experiments, Database Systems
Roles, from top to bottom: End User, Business Analyst, Data Analyst, DBA

KDD Process: A Typical View from ML and Statistics
Input Data -> Data Pre-Processing -> Data Mining -> Post-Processing
  Data Pre-Processing: data integration, normalization, feature selection, dimension reduction
  Data Mining: pattern discovery, association & correlation, classification, clustering, outlier analysis
  Post-Processing: pattern evaluation, pattern selection, pattern interpretation, pattern visualization
This is a view from typical machine learning and statistics communities


MAIN TASKS/FUNCTIONS OF DATA MINING

Main Tasks/Functions
Predictive:
  Use some variables to predict unknown or future values of other variables
    Classification
    Regression
    Change/deviation detection
Descriptive:
  Find human-interpretable patterns that describe the data
    Clustering
    Summarization
    Dependency modeling

Main Tasks/Functions
  Classification: discover descriptions of a number of predefined classes and assign each data item to one of those classes.
  Clustering: find a finite set of groups or clusters that describe the data.
  Regression: map a data item to a real-valued prediction variable.
  Dependency modeling: discover a model that describes the most significant dependencies between variables.
  Change/deviation detection: discover the most significant changes in the data.
  Summarization: discover a compact description for a subset of the data.

Time and Ordering: Sequential Pattern, Trend and Evolution Analysis
Sequence, trend and evolution analysis
  Trend, time-series, and deviation analysis: e.g., regression and value prediction
  Sequential pattern mining
    e.g., first buy digital camera, then buy large SD memory cards
  Periodicity analysis
  Motifs and biological sequence analysis
    Approximate and consecutive motifs
  Similarity-based analysis
Mining data streams
  Ordered, time-varying, potentially infinite, data streams
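
The sequential pattern from this slide (first a digital camera, then an SD card) can be made concrete by counting how many customer histories contain the items in that order. The sketch below is a minimal illustration, not a full sequential pattern mining algorithm; the purchase histories and the function names are hypothetical.

```python
def contains_in_order(sequence, pattern):
    """True if the items of `pattern` appear in `sequence` in the same order
    (other items may occur in between)."""
    remaining = iter(sequence)
    # `item in remaining` advances the iterator past the first match,
    # so later pattern items must be found after earlier ones.
    return all(item in remaining for item in pattern)

def sequence_support(sequences, pattern):
    """Fraction of sequences that contain `pattern` as an ordered subsequence."""
    hits = sum(contains_in_order(s, pattern) for s in sequences)
    return hits / len(sequences)

# Hypothetical purchase histories, one list per customer, in time order.
histories = [
    ["digital camera", "tripod", "SD card"],
    ["SD card", "digital camera"],
    ["digital camera", "SD card"],
    ["laptop", "mouse"],
]

pattern = ["digital camera", "SD card"]
print(sequence_support(histories, pattern))
```

The second history contains both items but in the wrong order, so it does not count toward the support; this ordering constraint is what separates a sequential pattern from a plain itemset.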

Structure and Network Analysis
Graph mining
  Finding frequent subgraphs (e.g., chemical compounds), trees (XML), substructures (web fragments)
Information network analysis
  Social networks: actors (objects, nodes) and relationships (edges)
    e.g., author networks in CS, terrorist networks
  Multiple heterogeneous networks
    A person could be in multiple information networks: friends, family, classmates, ...
  Links carry a lot of semantic information: Link mining
Web mining
  Web is a big information network: from PageRank to Google
  Analysis of Web information networks
    Web community discovery, opinion mining, usage mining, ...

Evaluation of Knowledge
Is all mined knowledge interesting?
  One can mine a tremendous number of patterns and knowledge
  Some may fit only certain dimension space (time, location, ...)
  Some may not be representative, may be transient, ...
Evaluation of mined knowledge: can we directly mine only interesting knowledge?
  Descriptive vs. predictive
  Coverage
  Typicality vs. novelty
  Accuracy
  Timeliness

CLASSIFICATION EXAMPLE
Verizon Wireless:
  The largest provider of wireless devices and services in the US. www.verizonwireless.com
  Number of customers: 65.7 million (end of 2007)
  Annual revenue: $43.9 billion
Problem:
  High customer churn rate: 2%/month (1,300,000 customers leaving per month)
  Replacement cost: hundreds of millions of dollars per year
  Average cost of acquiring each new customer: $320

CLASSIFICATION EXAMPLE
Conventional solution:
  Make offers and promotions to all customers before their contracts expire
  Far too costly and wasteful.
Data mining solution:
  Build a predictive model
  Use the predictive model to identify customers who are likely to leave.
  Then:
    Make offers and promotions (e.g. a new phone) to the customers most likely to leave.
    Develop new plans that meet customers' needs.
  Result: customer churn reduced to below 1.5%/month.
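
The figures on the two Verizon slides can be checked with a little arithmetic. The sketch below is a back-of-the-envelope illustration only: the customer count, the churn rates, and the $320 acquisition cost come from the slides, while the assumption that each retained customer avoids exactly one acquisition is ours and not the lecture's.

```python
# Figures taken from the slides.
customers = 65_700_000        # customers at the end of 2007
acquisition_cost = 320        # average cost in dollars of one new customer

# Churn rates in tenths of a percent, so the arithmetic stays in integers.
churn_before = 20             # 2.0% per month
churn_after = 15              # 1.5% per month (upper bound reported after mining)

lost_before = customers * churn_before // 1000
lost_after = customers * churn_after // 1000
retained = lost_before - lost_after

# Assumption (not from the slides): each retained customer avoids one acquisition.
monthly_saving = retained * acquisition_cost

print(f"lost per month at 2.0%: {lost_before:,}")
print(f"lost per month at 1.5%: {lost_after:,}")
print(f"customers retained per month: {retained:,}")
print(f"acquisition cost avoided per month: ${monthly_saving:,}")
```

The first result, 1,314,000 customers per month, agrees with the rounded figure of 1,300,000 quoted on the slide.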

CLASSIFICATION EXAMPLE
Training data (customer characteristics & cell phone usage behavior) -> Model/Pattern
The model is used to infer the probability a customer would leave
Consumer i -> Model -> Probability customer would terminate contract
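
One way to obtain a model that outputs the probability that a customer terminates the contract is logistic regression. The slide does not say which model was actually used, so the following is a hedged sketch under assumptions: the two features, the six training rows, the learning rate, and the function names are all hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(weights, features):
    """Probability of churn for one customer; weights[0] is the bias term."""
    z = weights[0] + sum(w * x for w, x in zip(weights[1:], features))
    return sigmoid(z)

def train_logistic(X, y, lr=0.1, epochs=2000):
    """Fit the weights by batch gradient descent on the log loss."""
    weights = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for features, label in zip(X, y):
            error = predict_proba(weights, features) - label
            grad[0] += error
            for j, value in enumerate(features):
                grad[j + 1] += error * value
        weights = [w - lr * g / len(X) for w, g in zip(weights, grad)]
    return weights

# Hypothetical training data: [years as a customer, support calls last month]
X = [[5, 0], [4, 1], [6, 0], [1, 4], [0.5, 5], [2, 3]]
y = [0, 0, 0, 1, 1, 1]  # 1 = the customer terminated the contract

weights = train_logistic(X, y)
p_new_unhappy = predict_proba(weights, [0.5, 6])
p_long_loyal = predict_proba(weights, [7, 0])
print(f"short tenure, many calls: {p_new_unhappy:.3f}")
print(f"long tenure, no calls:    {p_long_loyal:.3f}")
```

Customers can then be ranked by predicted probability, so that retention offers go only to those most likely to leave.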

CLASSIFICATION: APPLICATION 1
Fraud detection:
  Goal: predict fraudulent cases in credit card transactions.
  Approach:
    Use credit card transactions and cardholder information as attributes.
      What the customer buys, when, and how often the card is used.
    Label past transactions as fraudulent or legitimate; this label forms the class attribute.
    Build a model for the classes of transactions.
    Use the model to detect fraud in credit card transactions.

CLASSIFICATION: APPLICATION 2
Advertising:
  Goal: reduce mailing costs by targeting the group of customers most likely to buy a new mobile phone product.
  Approach:
    Use the data for a similar product introduced before.
    Use the decision {buy, don't buy} as the class attribute.
    Collect personal, lifestyle, and relationship information about all customers.
    Use this information as input data to build a classification model.

DATA CLUSTERING
Clustering/grouping based on Euclidean distance in 3-D space
  Intracluster distances are minimized
  Intercluster distances are maximized
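
The two criteria on this slide can be measured directly once points have been assigned to clusters. The sketch below assumes two hypothetical clusters of 3-D points and compares the average Euclidean distance within each cluster to the average distance between them; it checks a given grouping and does not itself perform clustering.

```python
from itertools import combinations, product
from math import dist  # Euclidean distance between two points (Python 3.8+)
from statistics import mean

# Hypothetical 3-D points, already assigned to two clusters.
cluster_a = [(1.0, 1.0, 1.0), (1.5, 1.0, 0.5), (1.0, 2.0, 1.0)]
cluster_b = [(8.0, 8.0, 8.0), (9.0, 8.5, 8.0), (8.0, 9.0, 9.0)]

def mean_intracluster_distance(cluster):
    """Average distance over all pairs of points inside one cluster."""
    return mean(dist(p, q) for p, q in combinations(cluster, 2))

def mean_intercluster_distance(first, second):
    """Average distance over all pairs with one point from each cluster."""
    return mean(dist(p, q) for p, q in product(first, second))

intra_a = mean_intracluster_distance(cluster_a)
intra_b = mean_intracluster_distance(cluster_b)
inter_ab = mean_intercluster_distance(cluster_a, cluster_b)

print(f"intracluster A: {intra_a:.2f}")
print(f"intracluster B: {intra_b:.2f}")
print(f"intercluster A-B: {inter_ab:.2f}")
```

A good grouping shows small intracluster values and a large intercluster value, which is the case for these well-separated points.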

CLUSTERING: APPLICATION 1
Customer segmentation:
  Goal: divide customers into distinct groups/clusters so that different advertising measures can be applied to each.
  Approach:
    Collect personal and lifestyle information about all customers.
    Identify clusters/groups of similar customers.
    Check the quality of the clusters by comparing the buying patterns of customers in the same cluster with those of customers in other clusters.

CLUSTERING: APPLICATION 2
Document clustering:
  Goal: find groups of similar documents based on the important terms.
  Approach:
    Determine the frequency of each term in each document. Build a similarity measure based on term frequencies and use it to cluster.
  Benefit: in information retrieval (IR), the clusters can be used to relate a new document to the documents already clustered.
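
A similarity measure built on term frequencies can be sketched with cosine similarity. The slide does not name a specific measure, so cosine similarity is an assumption here, as are the three tiny documents and the whitespace tokenizer.

```python
from collections import Counter
from math import sqrt

def term_frequencies(text):
    """Count how often each lowercase, whitespace-separated term occurs."""
    return Counter(text.lower().split())

def cosine_similarity(tf_a, tf_b):
    """Cosine of the angle between two term-frequency vectors."""
    shared = tf_a.keys() & tf_b.keys()
    dot = sum(tf_a[t] * tf_b[t] for t in shared)
    norm_a = sqrt(sum(c * c for c in tf_a.values()))
    norm_b = sqrt(sum(c * c for c in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Hypothetical documents, echoing the two senses of "Amazon".
docs = {
    "d1": "amazon rainforest trees river",
    "d2": "rainforest river wildlife trees",
    "d3": "amazon online store books",
}
tf = {name: term_frequencies(text) for name, text in docs.items()}

sim_d1_d2 = cosine_similarity(tf["d1"], tf["d2"])
sim_d1_d3 = cosine_similarity(tf["d1"], tf["d3"])
sim_d2_d3 = cosine_similarity(tf["d2"], tf["d3"])
print(sim_d1_d2, sim_d1_d3, sim_d2_d3)
```

Here d1 is much closer to d2 than to d3 even though d1 and d3 share the word "amazon", so a clustering step based on this measure would group the two rainforest documents together.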

S&P 500 STOCK DATA CLUSTERING
Observe daily stock price movements
  Data: stock events {UP/DOWN}
  Similarity measure: two events are similar if they frequently occur together on the same day

Discovered clusters and their industry groups
  1 (Technology1-DOWN): Applied-Matl-DOWN, Bay-Network-Down, 3-COM-DOWN, Cabletron-Sys-DOWN, CISCO-DOWN, HP-DOWN, DSC-Comm-DOWN, INTEL-DOWN, LSI-Logic-DOWN, Micron-Tech-DOWN, Texas-Inst-Down, Tellabs-Inc-Down, Natl-Semiconduct-DOWN, Oracl-DOWN, SGI-DOWN, Sun-DOWN
  2 (Technology2-DOWN): Apple-Comp-DOWN, Autodesk-DOWN, DEC-DOWN, ADV-Micro-Device-DOWN, Andrew-Corp-DOWN, Computer-Assoc-DOWN, Circuit-City-DOWN, Compaq-DOWN, EMC-Corp-DOWN, Gen-Inst-DOWN, Motorola-DOWN, Microsoft-DOWN, Scientific-Atl-DOWN
  3 (Financial-DOWN): Fannie-Mae-DOWN, Fed-Home-Loan-DOWN, MBNA-Corp-DOWN, Morgan-Stanley-DOWN
  4 (Oil-UP): Baker-Hughes-UP, Dresser-Inds-UP, Halliburton-HLD-UP, Louisiana-Land-UP, Phillips-Petro-UP, Unocal-UP, Schlumberger-UP
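
The similarity measure on this slide is described only in words. One possible way to quantify "events that frequently occur on the same day" is the Jaccard coefficient over the sets of days on which each event happened. This choice of formula is an assumption, and the day numbers below are invented for illustration.

```python
# Hypothetical data: the trading days on which each stock closed DOWN.
down_days = {
    "INTEL": {1, 2, 4, 7, 9},
    "CISCO": {1, 2, 4, 7, 8},
    "Unocal": {3, 5, 6},
}

def jaccard(days_a, days_b):
    """Days on which both events occurred, divided by days with either event."""
    union = days_a | days_b
    return len(days_a & days_b) / len(union) if union else 0.0

sim_tech = jaccard(down_days["INTEL"], down_days["CISCO"])
sim_mixed = jaccard(down_days["INTEL"], down_days["Unocal"])
print(f"INTEL-DOWN vs CISCO-DOWN:  {sim_tech:.2f}")
print(f"INTEL-DOWN vs Unocal-DOWN: {sim_mixed:.2f}")
```

Events with high pairwise similarity would end up in the same cluster, much like the technology stocks in the table.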

ASSOCIATION RULE MINING

Transaction-id   Items bought
10               A, B, C
20               A, C
30               A, D
40               B, E, F

Itemset X = {x1, ..., xk}
Find relationships between attributes that frequently occur together (support, confidence):
  A -> C (50%, 66.7%)
  C -> A (50%, 100%)
Example: buy diapers on Friday night, then buy beer
[Figure: Venn diagram of customers who buy diapers, customers who buy beer, and customers who buy both.]
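
The two percentages attached to each rule are its support and confidence, and they can be recomputed from the four transactions in the table. The sketch below uses the table as given; the function names are ours.

```python
# The four transactions from the table on the slide.
transactions = {
    10: {"A", "B", "C"},
    20: {"A", "C"},
    30: {"A", "D"},
    40: {"B", "E", "F"},
}

def support(itemset):
    """Fraction of transactions that contain every item of `itemset`."""
    hits = sum(1 for items in transactions.values() if itemset <= items)
    return hits / len(transactions)

def confidence(antecedent, consequent):
    """Of the transactions containing the antecedent, the fraction that
    also contain the consequent."""
    return support(antecedent | consequent) / support(antecedent)

for lhs, rhs in [({"A"}, {"C"}), ({"C"}, {"A"})]:
    print(
        f"{sorted(lhs)} -> {sorted(rhs)}: "
        f"support {support(lhs | rhs):.1%}, "
        f"confidence {confidence(lhs, rhs):.1%}"
    )
```

Both rules share the same support because it depends only on the combined itemset {A, C}; the confidences differ because A appears in three transactions and C in only two.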

ASSOCIATION RULE MINING: APPLICATION 1
Supermarket shelf management:
  Goal: identify items that are bought together by many customers
  Approach:
    Process sales data to find relationships between items
    A classic rule: if a customer buys diapers and milk, then they are likely to buy beer.

ASSOCIATION RULE MINING: APPLICATION 2
Inventory management:
  Goal: a consumer appliance repair company wants to anticipate the causes of repairs to consumer products and stock its service vehicles with the necessary parts, to reduce the number of visits to customers' homes
  Approach:
    Process data on the tools and parts required in previous repairs to find co-occurrence patterns.

REGRESSION
Predict the value of a variable based on the values of other variables
Examples:
  Predicting the sales volume of a new product based on advertising expenditure.
  Predicting wind speed as a function of temperature, humidity, air pressure, ...
  Predicting stock market indices.
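
The first example (sales volume as a function of advertising expenditure) can be sketched with a straight line fitted by ordinary least squares. The slide does not prescribe a regression method, so the linear model is an assumption, and the five data points are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical history: advertising spend and units sold (both in thousands).
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
units_sold = [12.1, 13.9, 16.2, 17.8, 20.0]

slope, intercept = fit_line(ad_spend, units_sold)
forecast = intercept + slope * 6.0  # predicted sales for a spend of 6

print(f"slope: {slope:.2f}, intercept: {intercept:.2f}")
print(f"forecast for a spend of 6: {forecast:.2f}")
```

The slope estimates how many additional units are sold per extra unit of advertising spend, under the assumption that the relationship is roughly linear over the observed range.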

DEVIATION/ANOMALY DETECTION
Detect significant deviations from normal behavior
Applications:
  Credit card fraud detection
  Network intrusion detection
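
A very simple way to flag a significant deviation from normal behavior is a z-score test: values far from the mean, measured in standard deviations, are reported. This is only one basic technique and is not claimed to be what production fraud detection systems use; the transaction amounts and the threshold of 2.0 are assumptions.

```python
from statistics import mean, pstdev

def find_outliers(values, threshold=2.0):
    """Return the values whose z-score exceeds `threshold` in absolute value."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values identical, nothing deviates
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical card transaction amounts in dollars for one customer.
amounts = [20, 25, 22, 19, 24, 21, 23, 500]

print(find_outliers(amounts))
```

Note that the outlier itself inflates both the mean and the standard deviation, which is why a modest threshold is used for such a small sample; robust statistics such as the median are a common alternative.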

Applications of Data Mining

Web page analysis: from web page classification, clustering to PageRank & HITS algorithms
Collaborative analysis & recommender systems
Basket data analysis to targeted marketing
Biological and medical data analysis: classification, cluster analysis (microarray data analysis), biological sequence analysis, biological network analysis
Data mining and software engineering (e.g., IEEE Computer, Aug. 2009 issue)
From major dedicated data mining systems/tools (e.g., SAS, MS SQL-Server Analysis Manager, Oracle Data Mining Tools) to invisible data mining


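DATA MINING TECHNIQUES
Data mining draws ideas from fields such as machine learning, statistics, pattern recognition, and database systems
Traditional techniques may be unsuitable because of:
  The large size of the data
  The high dimensionality of the data
  The heterogeneous nature of the data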

DATA MINING COMBINES METHODS FROM MANY FIELDS
[Figure: Data Mining at the center, drawing on Machine Learning, Pattern Recognition, Statistics, Database Technology, Visualization, High-Performance Computing, Algorithm, and Applications.]


CHALLENGES OF DATA MINING
Source: ICDM, 2005-2006, http://www.cs.uvm.edu/~icdm/10Problems/index.shtml
  Developing a Unifying Theory of Data Mining
  Scaling Up for High Dimensional Data and High Speed Data Streams
  Mining Sequence Data and Time Series Data
  Mining Complex Knowledge from Complex Data
  Data Mining in a Network Setting
  Distributed Data Mining and Mining Multi-agent Data
  Data Mining for Biological and Environmental Problems
  Data-Mining-Process Related Problems
  Security, Privacy and Data Integrity
  Dealing with Non-static, Unbalanced and Cost-sensitive Data

CHALLENGES OF DATA MINING
According to J. Han (2013):
  Mining social and information networks
  Mining spatiotemporal data, moving object data & cyber-physical systems
  Mining multimedia, social media, text and Web
  Data software engineering and computer system data
  Multidimensional online analytical analysis
  Pattern mining, pattern usage, and pattern understanding
  Biological data mining
  Stream data mining

WHY STUDY DATA MINING?
Groups discuss and come up with their own answers.

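SUMMARY
Discovering useful, previously unknown patterns from large volumes of data
The knowledge discovery process (KDD)
  Data collection and preprocessing -> Data mining -> Pattern evaluation -> Knowledge presentation
Mining can be performed on many kinds of data and information
Kinds of patterns to be mined
  Association rules, sequential patterns, classification, clustering, rare patterns, special patterns, deviations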

THE DEVELOPMENT OF DATA MINING
1989 IJCAI Workshop on Knowledge Discovery in Databases
  Knowledge Discovery in Databases (G. Piatetsky-Shapiro and W. Frawley, 1991)
1991-1994 Workshops on Knowledge Discovery in Databases
  Advances in Knowledge Discovery and Data Mining (U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, 1996)
1995-1998 International Conferences on Knowledge Discovery in Databases and Data Mining (KDD'95-98)
  Journal of Data Mining and Knowledge Discovery (1997)
ACM SIGKDD conferences since 1998 and SIGKDD Explorations
Many other conferences on data mining
  PAKDD (1997), PKDD (1997), SIAM-Data Mining (2001), (IEEE) ICDM (2001), WSDM (2008), ...
ACM Transactions on KDD since 2007

Conferences and Journals on Data Mining
KDD Conferences
  ACM SIGKDD Int. Conf. on Knowledge Discovery in Databases and Data Mining (KDD)
  SIAM Data Mining Conf. (SDM)
  (IEEE) Int. Conf. on Data Mining (ICDM)
  European Conf. on Machine Learning and Principles and Practices of Knowledge Discovery and Data Mining (ECML-PKDD)
  Pacific-Asia Conf. on Knowledge Discovery and Data Mining (PAKDD)
  Int. Conf. on Web Search and Data Mining (WSDM)
Other related conferences
  DB conferences: ACM SIGMOD, VLDB, ICDE, EDBT, ICDT, ...
  Web and IR conferences: WWW, SIGIR, WSDM
  ML conferences: ICML, NIPS
  PR conferences: CVPR, ...
Journals
  Data Mining and Knowledge Discovery (DAMI or DMKD)
  IEEE Trans. on Knowledge and Data Eng. (TKDE)
  KDD Explorations
  ACM Trans. on KDD

WHERE TO FIND REFERENCES?
DBLP, CiteSeer, Google
Data mining and KDD (SIGKDD: CDROM)
  Conferences: ACM-SIGKDD, IEEE-ICDM, SIAM-DM, PKDD, PAKDD, etc.
  Journals: Data Mining and Knowledge Discovery, KDD Explorations, ACM TKDD
Database systems (SIGMOD: ACM SIGMOD Anthology CD ROM)
  Conferences: ACM-SIGMOD, ACM-PODS, VLDB, IEEE-ICDE, EDBT, ICDT, DASFAA
  Journals: IEEE-TKDE, ACM-TODS/TOIS, JIIS, J. ACM, VLDB J., Info. Sys., etc.
AI & Machine Learning
  Conferences: Machine learning (ML), AAAI, IJCAI, COLT (Learning Theory), CVPR, NIPS, etc.
  Journals: Machine Learning, Artificial Intelligence, Knowledge and Information Systems, IEEE-PAMI, etc.
Web and IR
  Conferences: SIGIR, WWW, CIKM, etc.
  Journals: WWW: Internet and Web Information Systems, ...
Statistics
  Conferences: Joint Stat. Meeting, etc.
  Journals: Annals of Statistics, etc.
Visualization
  Conference proceedings: CHI, ACM-SIGGraph, etc.
  Journals: IEEE Trans. visualization and computer graphics, etc.

GROUP EXERCISE 1
Discussion time: 15 minutes
Discuss a data mining scenario within the group; one representative presents for the group.
Presentation time: at most 3 minutes.
  Present the scenario
  Approach and benefits
Scenario 1: Retail market (e.g. sales revenue needs to increase)
Hints:
  What kind of data is collected? Which data mining task is used?
  What information do we need to know about customers?
  Do we need to know which items customers buy?
  Do we need to classify customers? ...

GROUP EXERCISE 1
Time: 15 minutes
Discuss a data mining scenario within the group; one representative presents for the group.
Presentation time: at most 3 minutes
  Present the scenario
  Approach and benefits
Scenario 2: Product advertising (e.g. choosing the advertising format and target audience to reduce costs and increase profit)
Hints:
  What data needs to be collected? Which data mining task is used?
  Is it necessary to send product flyers to all customers, or only to a selected group?
  Can the likely customer response be estimated against the cost of sending the advertisements?

TO DO
Submit your group's results for group exercise 1 via the link on the Moodle page within one week from today.
Content to present:
  Goal and specific requirements (a specific product)
  Data required, e.g. data on products, customers, competitors, and earlier results related to the requirements of the problem.
  Which data mining function is used: descriptive, predictive, or a combination, ...
  Expected results.

REFERENCES
G. Piatetsky-Shapiro, U. Fayyad, and P. Smyth. From data mining to knowledge discovery: An overview. In U.M. Fayyad, et al. (eds.), Advances in Knowledge Discovery and Data Mining, 1-35. AAAI/MIT Press, 1996
http://vi.wikipedia.org/wiki/Khai_ph%C3%A1_d%E1%BB%AF_li%E1%BB%87u: Wikipedia, the free encyclopedia
J. Han, M. Kamber, Chapter 1, Data Mining: Concepts and Techniques
P.-N. Tan, M. Steinbach, V. Kumar, Chapter 1, Introduction to Data Mining

EXERCISES
1. What is data mining? Give an illustrative example.
2. What kinds of data and information can be used in the KDD process?
3. Give a real-world example of data mining applied successfully in business (other than the examples in the lecture).
Hints: the problem of increasing revenue in the retail market; the problem of planning advertising and promotions.
  What kind of data is collected? Which data mining task is used? Could it be replaced by simple database querying or statistical analysis?
