
DM Assignment - 1

Part A:

1. Extracting knowledge from a large amount of data is called:
a) Warehousing b) Data Mining c) Database d) Clustering


2. Treating incorrect or missing data is called:
a) Selection b) Preprocessing c) Transformation d) Integration

3. ______ is the output of KDD.
a) Query b) Useful information c) Data d) Information
4. Data preprocessing includes data cleaning, integration, transformation and reduction methods.

5. A concept hierarchy maps lower-level concepts to higher-level concepts.

Part B:
1. a) Define Data Mining.

Data mining refers to the extraction of useful information from a bulk of data or a data warehouse. The result of data mining is the patterns and knowledge that we gain at the end of the extraction process. It is also known as knowledge discovery or knowledge extraction.

Data mining refers to the non-trivial extraction of implicit, previously unknown and potentially useful information from data in the database.
Data mining comprises 3 main phases:

1) Data Preprocessing: data cleaning, integration, selection and transformation take place.
2) Data Extraction: occurrence of the exact data mining.
3) Data Evaluation and Presentation: analysing and presenting the results.

b) List the methods of filling missing values.
1) Manual entry: entering the missing data by hand, which is not so easy; it is infeasible and very tedious.

2) Use attribute mean: the missing value is replaced by the average value of the rest of the values of the attribute.

3) Using the most probable value: the value can be found with the help of prediction by certain algorithms like decision trees and regression.

4) Using a global constant: the missing one is replaced by a constant in the form of "unknown" or infinity.

5) Ignore tuple: in cases like a huge dataset, it is preferred to simply ignore the entire tuple.
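The automatic methods listed above (attribute mean, global constant, ignore tuple) can be sketched in a few lines of Python. This is only an illustrative sketch: the use of None as the missing-value marker, the function names and the sample values are assumptions made for the example, not part of the assignment.

```python
from statistics import mean

MISSING = None  # assumed marker for a missing value


def fill_with_mean(values):
    """Replace each missing value by the mean of the known values."""
    known = [v for v in values if v is not MISSING]
    average = mean(known)
    return [average if v is MISSING else v for v in values]


def fill_with_constant(values, constant="unknown"):
    """Replace each missing value by a global constant."""
    return [constant if v is MISSING else v for v in values]


def ignore_tuples(rows):
    """Drop every tuple (row) that contains a missing value."""
    return [row for row in rows if MISSING not in row]


if __name__ == "__main__":
    incomes = [30, None, 50, 40]          # hypothetical attribute values
    print(fill_with_mean(incomes))
    print(fill_with_constant(["a", None]))
    print(ignore_tuples([(1, 2), (None, 3)]))
```

Ignoring tuples is the cheapest option but throws data away, which is why it is usually reserved for large datasets where a few lost tuples do not matter.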

Part B:
2. Discuss data mining as a step in the knowledge discovery process and the various challenges associated.
KDD is an iterative process and consists of an iterative sequence of steps. Data mining plays an essential role in the knowledge discovery process.

[Figure: the KDD process. Databases and flat files -> data integration and cleaning -> data warehouse -> data selection and transformation -> task-relevant data -> data mining -> pattern evaluation -> knowledge]


Data preprocessing:
Data Cleaning: also known as data cleansing or data scrubbing. Data cleaning is defined as the process of removing noisy and irrelevant data from the collection, where noise is a random error or variance. Data cleaning is also said to be the process of detecting and correcting corrupt or inaccurate records from the data.

Data Integration: the process of combining or merging heterogeneous data from multiple sources into a unified data store such as a data warehouse. A complete data integration solution delivers trusted data from a variety of sources. Data integration is done using data migration tools, data synchronisation tools and the Extract-Transform-Load (ETL) process.

Data Selection: in this step, the data relevant to the analysis task is decided and retrieved from the data collection or database.

Data Transformation: also known as data consolidation, in which the selected data is transformed into forms appropriate for the mining procedure, by performing summary or aggregation operations before mining.

It has 2 steps:
1) Data Mapping: assigning elements from the source base to the destination to capture transformations.
2) Code Generation: creation of the actual transformation program.
Data Mining: the step in which intelligent methods are applied to extract potentially useful data patterns. It transforms task-relevant data into patterns and decides the purpose of the model using classification or characterization.

To mine data or extract information, the functionalities used are like summarization, classification, prediction and clustering.
Pattern Evaluation: it is defined as identifying strictly increasing patterns representing knowledge based on given measures. Of the huge amount of patterns discovered, not all are interesting. Thus interestingness measures such as support, confidence, etc. are used for finding only the interesting patterns.
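Support and confidence, the two measures named above, can be computed directly from a list of transactions. The sketch below is a minimal illustration; the function names and the shopping-basket transactions are hypothetical and chosen only for the example.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item of the itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)


def is_interesting(transactions, antecedent, consequent,
                   min_support, min_confidence):
    """Keep a rule only if it meets both user-given thresholds."""
    joint = set(antecedent) | set(consequent)
    return (support(transactions, joint) >= min_support
            and confidence(transactions, antecedent, consequent) >= min_confidence)


if __name__ == "__main__":
    baskets = [                      # hypothetical transactions
        {"bread", "milk"},
        {"bread", "butter"},
        {"bread", "milk", "butter"},
        {"milk"},
    ]
    print(support(baskets, {"bread", "milk"}))       # 2 of 4 baskets
    print(confidence(baskets, {"bread"}, {"milk"}))  # 2 of the 3 bread baskets
```

A rule is reported only when both thresholds are met, which is how these measures filter the discovered patterns down to the interesting ones.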

Knowledge Presentation: the discovered knowledge is presented in the form of visualizations such as pie charts, bar charts, curves, plots, tables, cross tabs and others. This is an essential step which utilises visualization tools to represent the data mining results.

These are the major challenges which stand as issues in data mining:
1) Mining methodology and user interaction issues
2) Performance issues
3) Diverse data types issues
1) Mining Methodology and User Interaction Issues:

Mining different kinds of knowledge in databases: Different users may be interested in different kinds of knowledge. Therefore it is necessary for data mining to cover a broad range of knowledge discovery tasks.
Interactive mining of knowledge at multiple levels of abstraction: The mining process needs to be interactive because this allows users to focus the search for patterns, providing and refining data mining requests based on the returned results.
Incorporation of background knowledge: Background knowledge can be used to guide the discovery process and to express the discovered patterns. The background knowledge may be used to express the patterns not only in concise terms but also at multiple levels of abstraction.
Data mining query languages and ad hoc data mining: A Data Mining Query Language (DMQL) allows the user to describe ad hoc mining tasks. It should be integrated with a data warehouse query language and optimized for efficient and flexible data mining.

Presentation and visualization of DM results: Once the patterns are discovered, they need to be expressed in high-level languages and visual representations which are easily understandable.

Handling noisy or incomplete data: Data cleaning methods are required to handle the noise and incomplete objects while mining the data regularities.

Pattern evaluation: The patterns discovered should be interesting. A pattern is uninteresting if it merely represents common knowledge or lacks novelty.
2) Performance Issues:

Efficiency and scalability of DM algorithms: In order to effectively extract the information from the huge amount of data in databases, data mining algorithms must be efficient and scalable.

Parallel, distributed and incremental mining algorithms: These algorithms divide the data into partitions, which are then processed in a parallel fashion. Then the results from the partitions are merged. The incremental algorithms update databases without mining the data again from scratch.
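The partition-then-merge idea and the incremental update can be illustrated with simple item counting. The sketch below is an assumption-laden toy: it counts item frequencies rather than running a full mining algorithm, it processes the partitions one after another (each call is independent, so they could be handed to parallel workers), and all names and data are hypothetical.

```python
from collections import Counter


def count_partition(partition):
    """Mine one partition: here, just count how often each item occurs."""
    return Counter(item for transaction in partition for item in transaction)


def mine_in_partitions(transactions, n_partitions):
    """Split the data, mine each partition independently, merge the results."""
    size = max(1, -(-len(transactions) // n_partitions))  # ceiling division
    partitions = [transactions[i:i + size]
                  for i in range(0, len(transactions), size)]
    total = Counter()
    for partial in map(count_partition, partitions):
        total += partial  # merge step
    return total


def incremental_update(counts, new_transactions):
    """Fold new transactions into existing counts without re-mining old data."""
    return counts + count_partition(new_transactions)


if __name__ == "__main__":
    data = [["a", "b"], ["a"], ["b", "c"], ["a", "c"]]  # hypothetical transactions
    counts = mine_in_partitions(data, n_partitions=2)
    print(counts)
    print(incremental_update(counts, [["a"]]))
```

The merge step works here because counts are additive across partitions; algorithms whose results are not additive need a more careful merge.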

3) Diverse Data Types Issues:

Handling of relational and complex types of data: The database may contain complex data objects, multimedia data objects, spatial data, temporal data, etc. It is not possible for one system to mine all these kinds of data.

Mining information from heterogeneous databases and global information systems: The data is available at different data sources on a LAN or WAN. These data sources may be structured, semi-structured or unstructured. Therefore mining the knowledge from them adds challenges to data mining.

3. What is the need for data preprocessing? What are the methods to handle noisy data? Explain them.
In the real world, data is quite dirty. It has inconsistencies, noise and missing values because it comes from a large set of different data sources. Quality cannot be guaranteed for such data in terms of accuracy, completeness, consistency, believability and interpretability.
Low-quality data may lead to incorrect or even wrong statistics: no quality data, no quality mining results.
Data preprocessing includes several tasks employed in the process to make the data more relevant for use by the mining algorithms and tools.
Data preparation is the process of converting raw data into meaningful data using different techniques. It improves the quality of the data in the data warehouse, makes the mining process easier, and also removes noisy, inconsistent and incomplete data.
Noisy data is nothing but a random error or variance in a measured variable. Techniques to remove noise are:
1) Binning
2) Regression
3) Clustering

1) Binning
Binning: it is the technique which smooths sorted data values by consulting their neighbouring values.
Let comsidu o, a)11,19, 20, 125,
22)aa
Step 1: Sort the data:
a,1d, 3,8, 2 8 , 22,5, a3.
Step 2: Divide the data into bins (buckets) covering equal ranges of values.
1 iteneS 3 bins -bims 3
-

iems
BIN S 2 =
3
20 atott3/-1Lo
a5

aata5ta2b 5.
To remove noise:

Bin means: calculate the mean of each bin and replace every value in the bin by the mean value.

25 25 a5

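Bin medians: every value in the bin is replaced by the bin median. For odd n the median is the value at position (n + 1)/2; for even n it is the average of the values at positions n/2 and n/2 + 1. For a bin of 3 values, take the second value.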

Bin boundaries: the maximum and minimum values of the bin are taken as the boundaries, and each of the other values is replaced by the closest boundary value.

8
2
92 21
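Smoothing by bin medians and by bin boundaries can be sketched as follows. This is a hedged illustration only: the function names are invented for the example, the sample values are a commonly used textbook-style list rather than the values written in this answer, and ties between the two boundaries are resolved towards the lower boundary by assumption.

```python
from statistics import median


def make_bins(values, depth):
    """Sort the values and cut them into bins of `depth` values each."""
    ordered = sorted(values)
    return [ordered[i:i + depth] for i in range(0, len(ordered), depth)]


def smooth_by_medians(bins):
    """Replace every value in a bin by the median of that bin."""
    return [[median(b)] * len(b) for b in bins]


def smooth_by_boundaries(bins):
    """Replace every value by the closer of the bin's minimum and maximum."""
    smoothed = []
    for b in bins:
        low, high = b[0], b[-1]  # bins are already sorted
        smoothed.append([low if v - low <= high - v else high for v in b])
    return smoothed


if __name__ == "__main__":
    sample = [4, 8, 15, 21, 21, 24, 25, 28, 34]  # hypothetical values
    bins = make_bins(sample, depth=3)
    print(smooth_by_medians(bins))
    print(smooth_by_boundaries(bins))
```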

2) Regression: a DM technique which is used to fit an equation to a dataset. It is a data-fitting function used for the prediction of values, e.g. linear regression.
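Fitting an equation to a dataset can be illustrated with simple linear regression by least squares. The sketch below assumes a single predictor and a straight-line model y = slope * x + intercept; the function names and the sample points are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def smooth_with_line(xs, ys):
    """Replace each observed y by the value predicted by the fitted line."""
    slope, intercept = fit_line(xs, ys)
    return [slope * x + intercept for x in xs]


if __name__ == "__main__":
    xs = [1, 2, 3, 4]
    ys = [3.1, 4.9, 7.2, 8.8]  # hypothetical noisy measurements
    print(fit_line(xs, ys))
    print(smooth_with_line(xs, ys))
```

Replacing the observed values by the fitted ones is what removes the noise; the same fitted line can also be used to predict a value for an unseen x.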

3) Clustering: groups (clusters) are formed from the data having similar values. The values which are not included in any cluster are outliers.
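The idea that values falling outside every cluster are outliers can be shown with a very small one-dimensional sketch. The grouping rule used here (start a new cluster whenever the gap to the previous sorted value exceeds max_gap, and treat clusters smaller than min_size as outliers) is an assumption chosen for simplicity, not a method named in this answer; the names and data are hypothetical.

```python
def cluster_1d(values, max_gap):
    """Group sorted values; a gap larger than max_gap starts a new cluster."""
    if not values:
        return []
    ordered = sorted(values)
    clusters = [[ordered[0]]]
    for v in ordered[1:]:
        if v - clusters[-1][-1] <= max_gap:
            clusters[-1].append(v)
        else:
            clusters.append([v])
    return clusters


def find_outliers(values, max_gap, min_size=2):
    """Values that end up in clusters smaller than min_size are outliers."""
    return [v for cluster in cluster_1d(values, max_gap)
            if len(cluster) < min_size
            for v in cluster]


if __name__ == "__main__":
    ages = [21, 22, 24, 25, 70, 23]  # hypothetical values
    print(cluster_1d(ages, max_gap=3))
    print(find_outliers(ages, max_gap=3))
```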

Steps in the data cleaning process:
1) Detection of discrepancies: manual errors, inconsistent data.
2) Transformation of data: let's say the values B-Tech, B Tech and BTech appear; they are all transformed to BTech.

The following data (in increasing order) is given for the attribute age:
13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30, 33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70.
Use smoothing by bin means to smooth the above data using a bin depth of 3.

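Step 1: Sort the data. As the data is already sorted, we go for step 2.

Step 2: Divide the data into bins of depth 3.

Step 3: Smooth the above data. Here we use the bin means.

Bin 1: 13, 15, 16    (13 + 15 + 16)/3 = 14.67
Bin 2: 16, 19, 20    (16 + 19 + 20)/3 = 18.33
Bin 3: 20, 21, 22    (20 + 21 + 22)/3 = 21
Bin 4: 22, 25, 25    (22 + 25 + 25)/3 = 24
Bin 5: 25, 25, 30    (25 + 25 + 30)/3 = 26.67
Bin 6: 33, 33, 35    (33 + 33 + 35)/3 = 33.67
Bin 7: 35, 35, 35    (35 + 35 + 35)/3 = 35
Bin 8: 36, 40, 45    (36 + 40 + 45)/3 = 40.33
Bin 9: 46, 52, 70    (46 + 52 + 70)/3 = 56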

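Calculate the mean of each bin; then all the values in the bin are replaced by that mean. Therefore the values after smoothing by bin means are:

Bin 1: 14.67, 14.67, 14.67
Bin 2: 18.33, 18.33, 18.33
Bin 3: 21, 21, 21
Bin 4: 24, 24, 24
Bin 5: 26.67, 26.67, 26.67
Bin 6: 33.67, 33.67, 33.67
Bin 7: 35, 35, 35
Bin 8: 40.33, 40.33, 40.33
Bin 9: 56, 56, 56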
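The bin-means smoothing worked out above can be checked with a short Python sketch. It assumes the age values of the exercise are the 27 values listed in the code, uses equal-depth bins of 3, and rounds each mean to two decimals; the function name is hypothetical.

```python
def smooth_by_bin_means(values, depth):
    """Sort the values, cut them into bins of `depth`, and replace each
    value by the mean of its bin (rounded to two decimals)."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), depth):
        current_bin = ordered[start:start + depth]
        bin_mean = round(sum(current_bin) / len(current_bin), 2)
        smoothed.extend([bin_mean] * len(current_bin))
    return smoothed


if __name__ == "__main__":
    # Age values as read from the exercise (assumed transcription).
    ages = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25, 30,
            33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]
    result = smooth_by_bin_means(ages, depth=3)
    # Print one smoothed bin per line.
    for start in range(0, len(result), 3):
        print(result[start:start + 3])
```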
