
Sec 1. Classification

Unit 4 Algorithms: The Basic Methods

CLASSIFICATION

• Statistical Modeling

• Divide-and-Conquer: Constructing Decision Trees

• Covering algorithms: Constructing rules

• Mining association rules;

• Linear models;

• Instance-based learning;

• Clustering: Euclidean distance, Manhattan distance, nearest neighbour, farthest neighbour, kNN;

• Multi-instance learning.

Classification

• Also called supervised learning, inductive learning, or learning by examples.
• Given a set of data records, described by a set of attributes A = {A1, A2, …, An}. The data set also has a special target attribute C, called the class attribute. The objective of the classification task is
  – to relate values of the attributes in A to the classes in C.
• Classification can also be called prediction: the learned function can be used to predict the class values/labels of future data.
• The function is also called a classification model, a predictive model, or simply a classifier.

1. Inferring Rudimentary Rules

– We make rules that test a single attribute and branch accordingly (a minimal sketch of 1R follows this list).
  • Each branch corresponds to a different value of the attribute.
  • Assign each branch the class that occurs most often in the training data.
  • Count the errors that occur on the training data, that is, the number of instances that do not have the majority class.
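A minimal Python sketch of this procedure, assuming a list-of-dicts dataset; the attribute names in the commented call are illustrative, not taken from the slides:

from collections import Counter, defaultdict

def one_r(instances, attributes, class_attr):
    """Return the single-attribute rule set with the fewest training errors."""
    best = None
    for attr in attributes:
        # For each value of the attribute, record the class counts on that branch.
        by_value = defaultdict(Counter)
        for row in instances:
            by_value[row[attr]][row[class_attr]] += 1
        # Each branch predicts its majority class.
        rules = {val: counts.most_common(1)[0][0] for val, counts in by_value.items()}
        # Errors = instances whose class is not the majority class of their branch.
        errors = sum(n for val, counts in by_value.items()
                     for cls, n in counts.items() if cls != rules[val])
        if best is None or errors < best[2]:
            best = (attr, rules, errors)
    return best  # (attribute, {value: predicted class}, error count)

# Illustrative call on the weather data (attribute names assumed):
# attr, rules, errors = one_r(weather, ["outlook", "temperature", "humidity", "windy"], "play")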

INFERRING RUDIMENTARY RULES

1R: example

1R: dealing with missing values and numeric attributes

• Missing value:
  – treated as a possible value for the attribute
• Numeric value:
  – converted to nominal using discretization techniques

• Highly branching attributes do not perform well on test examples.
  – Example:
    • an ID attribute that pinpoints instances uniquely
    • each partition contains just one instance, giving a zero error rate
  – This phenomenon is known as overfitting.
    • Rules overfitted to the training set do not work well on the test set.
  – For 1R, overfitting is likely to occur whenever an attribute has a large number of possible values.
• Solution?
  – Apply a constraint, e.g. each partition must contain at least three instances of the majority class.
  – Whenever adjacent partitions have the same majority class, as do the first two partitions above, they can be merged together without affecting the meaning of the rule sets.

Example

• Weather data: the temperature attribute has numeric values.
• Discretization steps:
  – place breakpoints wherever the class changes
• Leading to a rule. A rough sketch of this discretization step follows.
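One possible reading of the procedure, in Python: sort by the numeric value, place breakpoints where the class changes, and only close a partition once it holds at least three instances of its majority class (the temperature values in the commented call are illustrative):

from collections import Counter

def discretize_1r(values, classes, min_majority=3):
    """Return breakpoints so that each bucket has >= min_majority instances
    of its majority class and ends at a class change."""
    pairs = sorted(zip(values, classes))
    breakpoints, counts = [], Counter()
    for i, (v, c) in enumerate(pairs):
        counts[c] += 1
        majority_cls, majority_n = counts.most_common(1)[0]
        # Close the bucket once the majority is strong enough and the next
        # instance has a different class (a natural breakpoint).
        if majority_n >= min_majority and i + 1 < len(pairs) and pairs[i + 1][1] != majority_cls:
            breakpoints.append((v + pairs[i + 1][0]) / 2)
            counts = Counter()
    return breakpoints

# Example with made-up temperatures and play labels:
# discretize_1r([64, 65, 68, 69, 70, 71, 72, 75, 80, 83, 85],
#               ["yes", "no", "yes", "yes", "yes", "no", "no", "yes", "no", "yes", "no"])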

2. Statistical modeling

• 1R uses one attribute as the basis of the decision.
• How about using all attributes as the basis of the decision?
  – All attributes contribute to the decision.
    • Attributes are equally important.
    • Attributes are independent of one another.
• A simple method based on probability can be used: Bayesian classification.

Bayesian Classification

Basic Concept (1)

• The task is to classify data into one of two classes: RED and GREEN.

Bayesian Classification

Basic Concept (2) – prior probability

• The number of GREEN items is twice the number of RED items, so it is rational to classify a new item as GREEN because its probability is higher. This is the prior probability.
• Prior probability of RED = number of RED items / total number of items
• Prior probability of RED = 20/60

Bayesian Classification

Basic Concept (3) – likelihood probability

• If most of the items in the vicinity of a new item belong to a particular class, then that item tends to be classified into that class.

Likelihood probability:
Likelihood of the item being classified as GREEN = number of GREEN items in the vicinity / total number of GREEN items
Likelihood of the item being classified as RED = number of RED items in the vicinity / total number of RED items
Likelihood of the item being classified as RED = 3/20

Bayesian Classification

Basic Concept (4) – posterior probability

• The Bayesian method combines these two sources of information into what is called the posterior probability, which determines the final classification.
• Posterior probability of being classified as RED = 2/6 x 3/20 = 1/20
• Conclusion:
  The new item above is classified as RED because it has the larger posterior probability value.

Implementation of Bayesian classification concept

• Prior probability:
  – Prior prob of yes = 9/14
  – Prior prob of no = 5/14
• Likelihood probability:
  – Likelihood prob of yes = 2/9 x 3/9 x 3/9 x 3/9 ≈ 0.0082
  – Likelihood prob of no = 3/5 x 1/5 x 4/5 x 3/5 ≈ 0.0576

Implementation of Bayesian classification concept

• Posterior probability (likelihood x prior):
  – Posterior prob of yes ≈ 0.0082 x 9/14 ≈ 0.0053
  – Posterior prob of no ≈ 0.0576 x 5/14 ≈ 0.0206, so the prediction is no.

Mathematical Formulation

• Naïve Bayes classifier: choose the class with the largest P(H|X), i.e. the largest posterior probability, for the data being classified.
• If the data is described by attributes X1, X2, … Xn and there are m classes C1, C2, … Cm, then Bayes' rule applies (see below).
• Since P(X) is constant for all classes, only the numerator needs to be compared.

Mathematical Formulation

• In Naïve Bayes, the attributes are assumed to be mutually independent.
• The final mathematical formulation, with that independence assumption, is given below.
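The formulas on these slides did not survive extraction; they are presumably the standard Naïve Bayes equations. For a data item X = (x1, x2, …, xn):

  P(C_i \mid X) = \frac{P(X \mid C_i)\, P(C_i)}{P(X)}

Since P(X) is the same for every class, it suffices to choose the class C_i that maximizes P(X | C_i) P(C_i). With the attribute-independence assumption,

  P(X \mid C_i) = \prod_{k=1}^{n} P(x_k \mid C_i)

where each P(x_k | C_i) is estimated from the training data as a relative frequency.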

Example

• Dataset for determining whether a person will buy a computer, based on the attributes age, income, student, and credit_rating.
• Classify the following data:
  age <= 30, income = medium, student = yes, credit_rating = fair, buys_computer = ?

age       income   student   credit_rating   buys_computer
<=30      High     No        Fair            No
<=30      High     No        Excellent       No
31 … 40   High     No        Fair            Yes
>40       Medium   No        Fair            Yes
>40       Low      Yes       Fair            Yes
>40       Low      Yes       Excellent       No
31 … 40   Low      Yes       Excellent       Yes
<=30      Medium   No        Fair            No
<=30      Low      Yes       Fair            Yes
>40       Medium   Yes       Fair            Yes
<=30      Medium   Yes       Excellent       Yes
31 … 40   Medium   No        Excellent       Yes
31 … 40   High     Yes       Fair            Yes
>40       Medium   No        Excellent       No

Example: implementation

• Suppose:
  – C1: buys_computer = yes
  – C2: buys_computer = no
• Dataset: the buys_computer table above.
• Compute P(Ci), the prior probabilities:
  – P(buys_computer = yes) = 9/14 = 0.643
  – P(buys_computer = no) = 5/14 = 0.357

age income student credit_rating buys_computer

<=30 High No Fair No

<=30 High No Excellent No

31 … 40 High No Fair Yes

>40 Medium No Fair Yes

>40 Low Yes Fair Yes

>40 Low Yes Excellent No

31 … 40 Low Yes Excellent Yes

<=30 Medium No Fair No

<=30 Low Yes Fair Yes

>40 Medium Yes Fair Yes

<=30 Medium Yes Excellent Yes

31 … 40 Medium No Excellent Yes

31 … 40 High Yes Fair Yes

>40 Medium No Excellent No

•P(age = “<=30” | buys_computer = “yes”) = 2/9 = 0.222

•P(age = “<= 30” | buys_computer = “no”) = 3/5 = 0.6

•P(income = “medium” | buys_computer = “yes”) = 4/9 = 0.444

•P(income = “medium” | buys_computer = “no”) = 2/5 = 0.4

•P(student = “yes” | buys_computer = “yes”) = 6/9 = 0.667

•P(student = “yes” | buys_computer = “no”) = 1/5 = 0.2

•P(credit_rating = “fair” | buys_computer = “yes”) = 6/9 = 0.667

•P(credit_rating = “fair” | buys_computer = “no”) = 2/5 = 0.4

• The likelihood probabilities are then:

P(X|buys_computer = “yes”) = 0.222 x 0.444 x 0.667 x 0.667 = 0.044

P(X|buys_computer = “no”) = 0.6 x 0.4 x 0.2 x 0.4 = 0.019

P(X|buys_computer = “yes”) * P(buys_computer = “yes”) = 0.028

P(X|buys_computer = “no”) * P(buys_computer = “no”) = 0.007

Therefore X is classified as buys_computer = “yes”, because its posterior probability value is larger.
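A short Python sketch of the same computation on the dataset above (illustrative code, not the tool used in the course):

from collections import Counter, defaultdict

# (age, income, student, credit_rating, buys_computer) rows from the table above
data = [
    ("<=30", "High", "No", "Fair", "No"),          ("<=30", "High", "No", "Excellent", "No"),
    ("31..40", "High", "No", "Fair", "Yes"),       (">40", "Medium", "No", "Fair", "Yes"),
    (">40", "Low", "Yes", "Fair", "Yes"),          (">40", "Low", "Yes", "Excellent", "No"),
    ("31..40", "Low", "Yes", "Excellent", "Yes"),  ("<=30", "Medium", "No", "Fair", "No"),
    ("<=30", "Low", "Yes", "Fair", "Yes"),         (">40", "Medium", "Yes", "Fair", "Yes"),
    ("<=30", "Medium", "Yes", "Excellent", "Yes"), ("31..40", "Medium", "No", "Excellent", "Yes"),
    ("31..40", "High", "Yes", "Fair", "Yes"),      (">40", "Medium", "No", "Excellent", "No"),
]

priors = Counter(row[-1] for row in data)          # class counts: Yes = 9, No = 5
cond = defaultdict(Counter)                        # (attribute index, value) -> class counts
for row in data:
    for i, value in enumerate(row[:-1]):
        cond[(i, value)][row[-1]] += 1

def posterior(x, cls):
    """Unnormalized posterior: P(x | cls) * P(cls)."""
    p = priors[cls] / len(data)
    for i, value in enumerate(x):
        p *= cond[(i, value)][cls] / priors[cls]
    return p

x = ("<=30", "Medium", "Yes", "Fair")
print(posterior(x, "Yes"), posterior(x, "No"))     # ~0.028 vs ~0.007 -> predict Yes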

Avoiding Zero Probabilities

• Classification or prediction with Naïve Bayes requires every condition to have a non-zero probability.

Example: a dataset with 1000 instances, in which one of the attributes, income, has 3 values (low, medium, and high) with the following counts per value:

  Income = low: 0
  Income = medium: 980
  Income = high: 20

With the Laplacian correction, each count is incremented by 1, so the probabilities become:

  Prob(income = low) = 1/1003
  Prob(income = medium) = 981/1003
  Prob(income = high) = 21/1003
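A minimal Python sketch of the Laplacian correction just described:

def laplace_probs(counts, k=1):
    """Add-k (Laplacian) correction: add k to every value's count."""
    total = sum(counts.values()) + k * len(counts)
    return {value: (n + k) / total for value, n in counts.items()}

income_counts = {"low": 0, "medium": 980, "high": 20}
print(laplace_probs(income_counts))
# {'low': 1/1003 ~ 0.001, 'medium': 981/1003 ~ 0.978, 'high': 21/1003 ~ 0.021}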

Avoiding Zero Probabilities

• The Laplacian correction works well in practice; however, we could instead choose a small constant μ, so that

  Income = low: 0
  Income = medium: 980
  Income = high: 20

becomes:

  Prob(income = low) = (0 + μ/3) / (1000 + μ)
  Prob(income = medium) = (980 + μ/3) / (1000 + μ)
  Prob(income = high) = (20 + μ/3) / (1000 + μ)

The choice of μ determines how influential the a priori values are.

Avoiding Zero Probabilities

• Finally, there is no particular reason for dividing μ into three equal parts in the numerators; instead we could use:
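The general form referred to here is presumably:

  P(\text{income} = v_j) = \frac{n_j + \mu\, p_j}{N + \mu}, \qquad \sum_j p_j = 1

where n_j is the count for value v_j, N is the total number of instances (1000 in the example), and the p_j are chosen a priori probabilities; using μ/3 in each numerator corresponds to p_j = 1/3.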

• Effectively, these three numbers are a priori probabilities of the values of the income attribute being low, medium, and high, respectively.

Dealing with..

• Missing values
  – No problem at all; the calculation would simply omit this attribute.

• Handling categorical

Example

• Thus,

– Posterior prob of yes = 2/9 x 0.0340 x 0.0221 x 3/9 x 9/14 = 0.000036

– Posterior prob of no = 3/5 x 0.0279 x 0.0381 x 3/5 x 5/14 = 0.000137
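The factors 0.0340, 0.0221, 0.0279, and 0.0381 are presumably probability density values: for numeric attributes, Naïve Bayes commonly replaces the conditional probability with the density of the observed value under a normal distribution fitted per class,

  f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)

where μ and σ are the mean and standard deviation of that attribute over the training instances of the class in question.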

Solution for example 15

Advantages and Disadvantages of Naïve Bayes

Advantages:
• Easy to implement
• Good results are obtained in most cases

Disadvantages:
• It relies on the assumption that there is no relationship between one attribute and another, whereas in practice attributes are sometimes related. This problem is addressed by an extension of Naïve Bayes called Bayesian Belief Networks.

However, dependencies may exist..

• Bayesian (belief) network
  – Graphical model of causal relationships
  – A trained Bayesian network can be used for classification.
  – Two components:
    • A directed acyclic graph
    • A set of conditional probability tables (CPTs); each variable has one CPT.
• Each node represents a random variable, which may correspond to
  – attributes of D
  – hidden variables believed to form a relationship
• The CPT for a variable Y specifies the conditional distribution P(Y | Parents(Y)). An illustrative CPT is sketched below.
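Purely as an illustration (the variable names and probabilities below are invented for the sketch, not taken from the slides), a CPT can be stored as a mapping from parent-value combinations to a distribution over Y:

# Hypothetical CPT for Y = LungCancer with parents (FamilyHistory, Smoker).
# Each row gives P(Y = yes | parents) and P(Y = no | parents); rows sum to 1.
cpt_lung_cancer = {
    ("yes", "yes"): {"yes": 0.8, "no": 0.2},
    ("yes", "no"):  {"yes": 0.5, "no": 0.5},
    ("no",  "yes"): {"yes": 0.7, "no": 0.3},
    ("no",  "no"):  {"yes": 0.1, "no": 0.9},
}

def p(y, family_history, smoker):
    """Look up P(Y = y | FamilyHistory, Smoker) in the table."""
    return cpt_lung_cancer[(family_history, smoker)][y]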

Training Bayesian Network

• Network topology (layout of nodes and arcs)
  – Given observable variables, several algorithms exist for learning the network topology.
  – A human expert in the field of analysis may help in network design.
  – Training: computing the CPT entries
  – Gradient descent strategy: the Adaptive Probabilistic Network algorithm

3. Divide and conquer

Constructing Decision Tree

• Constructing a decision tree can be expressed recursively (a minimal sketch follows this list).
  – First, select an attribute to place at the root node, and make one branch for each possible value.
    • This splits up the example set into subsets, one for every value of the attribute.
  – Now the process can be repeated recursively for each branch,
    • using only those instances that actually reach the branch.
  – If at any time all instances at a node have the same classification, stop developing that part of the tree.
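A minimal recursive sketch of this divide-and-conquer procedure in Python; best_attribute stands in for the attribute-selection step (e.g. information gain) discussed next:

from collections import Counter

def build_tree(instances, attributes, class_attr, best_attribute):
    classes = [row[class_attr] for row in instances]
    # Stop when all instances at this node share one classification.
    if len(set(classes)) == 1:
        return classes[0]
    if not attributes:                       # no attribute left: return the majority class
        return Counter(classes).most_common(1)[0][0]
    attr = best_attribute(instances, attributes, class_attr)
    remaining = [a for a in attributes if a != attr]
    tree = {}
    # One branch per value, built from only the instances that reach it.
    for value in {row[attr] for row in instances}:
        subset = [row for row in instances if row[attr] == value]
        tree[value] = build_tree(subset, remaining, class_attr, best_attribute)
    return (attr, tree)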

Which attribute to split on?

• We seek small trees, so we would like this to happen as soon as possible.
• We could choose the attribute that produces the purest daughter nodes.
  – The measure of purity is called information (the units are called bits).
  – Information is associated with each node of the tree:
    • it represents the expected amount of information that would be needed to specify whether a new instance should be classified yes or no, given that the example reached that node.

• The numbers of yes and no classes at the leaf nodes are [2, 3], [4, 0], and [3, 2], respectively.
• The information values of these nodes are:
  info([2, 3]) = 0.971 bits, info([4, 0]) = 0.0 bits, info([3, 2]) = 0.971 bits
• The root comprises nine yes and five no instances, corresponding to an information value of
  info([9, 5]) = 0.940 bits
• Thus, Fig. 4.2(a) is responsible for an information gain of
  gain(outlook) = info([9, 5]) − info([2, 3], [4, 0], [3, 2]) = 0.940 − 0.693 = 0.247 bits
• which can be interpreted as the informational value of creating a branch on the outlook attribute.
• Calculate the information gain for each attribute and split on the one that gains the most information:
  o gain(outlook) = 0.247 bits
  o gain(temperature) = 0.029 bits
  o gain(humidity) = 0.152 bits
  o gain(windy) = 0.048 bits
• Therefore, we select outlook as the splitting attribute at the root of the tree (the calculation is sketched below).
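These numbers can be reproduced with a short calculation; a minimal Python sketch:

from math import log2

def info(counts):
    """Entropy in bits of a class distribution given as a list of counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

def gain(parent_counts, child_counts):
    """Information gain = info(parent) minus the weighted average info of the children."""
    total = sum(parent_counts)
    after = sum(sum(c) / total * info(c) for c in child_counts)
    return info(parent_counts) - after

print(info([9, 5]))                               # 0.940
print(gain([9, 5], [[2, 3], [4, 0], [3, 2]]))     # 0.247 -> gain(outlook)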

• Therefore, we select humidity as the splitting attribute at this point. There is no need to split these nodes any further, so this branch is finished.
• The decision tree for the weather data:

How to calculate information?

• The best splitting attribute is the one that most nearly gives each partition a pure result.
  – The splitting attribute is chosen using an impurity function.
  – The most popular impurity functions used for decision tree learning are
    • information gain and
    • information gain ratio.
  – The C4.5 algorithm uses information gain and information gain ratio.

Information Gain (1)

• The information gain measure is based on the entropy (information value) function from information theory.

Information Gain (2)

Information Gain (3)

• We see the trend: as the data becomes purer, the entropy value becomes smaller.
• Thus, entropy measures the amount of impurity.

Information Gain (4)

• Then, we want to know which attribute can reduce the impurity most if it is used to partition D.
• To find out, every attribute is evaluated. Let the number of possible values of the attribute Ai be v. If we are going to use Ai to partition the data D, we will divide D into v disjoint subsets D1, D2, …, Dv. The entropy after the partition is:

Information Gain (4)

• The information gain of attribute Ai is computed with:
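The formulas referenced here did not survive extraction; they are presumably the standard definitions:

  entropy(D) = - \sum_{j=1}^{|C|} P(c_j) \log_2 P(c_j)

  entropy_{A_i}(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|}\, entropy(D_j)

  gain(D, A_i) = entropy(D) - entropy_{A_i}(D)

where P(c_j) is the fraction of instances in D with class c_j.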

Answer of example 7.

Choosing splitting attribute

Answer of example 7.

Choosing splitting attribute

Gain Ratio (1)

• Consider a case where an attribute has many possible values (an extreme example: an ID attribute).
  – What happens to the entropy value?
  – What happens to the information gain?
• Gain ratio remedies this bias by normalizing the gain using the entropy of the data with respect to the values of the attribute. Our previous entropy computations were done with respect to the class attribute:

Gain Ratio (2)

– Dj is the subset of the data that has the jth value of Ai.
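The gain ratio formula referred to here is presumably the standard one:

  gainRatio(D, A_i) = \frac{gain(D, A_i)}{-\sum_{j=1}^{v} \frac{|D_j|}{|D|} \log_2 \frac{|D_j|}{|D|}}

where the denominator is the entropy of D with respect to the values of A_i rather than with respect to the class.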

Extend the tree

Quiz:

Define the splitting attribute using a) information gain and b) gain ratio.

Evaluation (1)

• There are many ways and many measures to evaluate classifiers:
  – Accuracy
  – Error rate = 1 − accuracy

Evaluation (2)

• Several methods to evaluate classifiers
  – Holdout set
    • The available data is split into a disjoint training set and test set.
    • The test set is also called the holdout set.
    • Mainly used when the data set D is large.
  – Multiple random sampling
    • Used when the available data set is small.
    • Perform random sampling n times.
      – Each time a different training set and a different test set are produced.
      – This produces n accuracies.
      – The final estimated accuracy on the data is the average of the n accuracies.

Evaluation (3)

• Several methods to evaluate classifiers
  – Cross-validation
    • When the data set is small, the n-fold cross-validation method is very commonly used.
    • The available data is partitioned into n equal-size disjoint subsets.
      – Each subset is then used as the test set, and the remaining n−1 subsets are combined as the training set to learn a classifier.
      – This procedure is then run n times, which gives n accuracies.
      – The final estimated accuracy of learning from this data set is the average of the n accuracies.
    • 10-fold and 5-fold cross-validation are often used (a plain sketch follows).
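A plain-Python sketch of n-fold cross-validation as described; train_and_score is a placeholder for any train-then-evaluate step:

def cross_validate(data, n, train_and_score):
    """Split data into n disjoint folds; each fold serves as the test set once."""
    folds = [data[i::n] for i in range(n)]          # simple round-robin split
    accuracies = []
    for i in range(n):
        test = folds[i]
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        accuracies.append(train_and_score(train, test))
    return sum(accuracies) / n                      # final estimated accuracy

# e.g. cross_validate(dataset, 10, my_train_and_score)   # 10-fold cross-validation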

Evaluation (4)

• In some applications, we are interested in only one class.
  – The class that the user is interested in is commonly called the positive class, and the rest are negative classes (the negative classes may be combined into one negative class).
  – Accuracy may not be a good measure (see the intrusion example).
• Example:
  99% of the cases in an intrusion detection data set are normal. A classifier can then achieve 99% accuracy without doing anything, simply by classifying every test case as "not intrusion". This is, however, useless.

Evaluation (4)

• Precision and recall
  – measure how precise and how complete the classification is on the positive class.
• Confusion matrix:
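The matrix itself did not survive extraction; the standard two-class layout and the measures derived from it are:

                     Classified positive     Classified negative
  Actual positive    TP (true positives)     FN (false negatives)
  Actual negative    FP (false positives)    TN (true negatives)

  precision p = TP / (TP + FP)        recall r = TP / (TP + FN)

Accuracy, by contrast, is (TP + TN) / (TP + TN + FP + FN), which is why it can look high even when the positive class is almost never recognized.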

4. Covering algorithm

• Take each class in turn and seek a way of covering all instances in it, at the same time excluding instances not in the class. This is called a covering approach.
  – At each stage we identify a rule that "covers" some of the instances.
• By its very nature, this covering approach leads to a set of rules rather than to a decision tree.
• If x > 1.2 then class = a
• However, the rule covers many b's as well as a's,
  – so a new test is added to it by further splitting the space horizontally, as shown in the third diagram:
• If x > 1.2 and y > 2.6 then class = a
• This gives a rule covering all but one of the a's.
  – We could stop here, but if it were felt necessary to cover the final a, another rule would be needed, perhaps:
• If x > 1.4 and y < 2.4 then class = a

A simple covering algorithm

• Divide-and-conquer algorithms choose an attribute to maximize information gain.
• A covering algorithm chooses an attribute–value pair to maximize the probability of the desired classification,
  – to include as many instances of the desired class as possible and exclude as many instances of other classes as possible.
• Suppose the new rule will cover a total of t instances,
  – of which p are positive examples,
  – and thus t − p belong to other classes.
• Then choose the new term to maximize the ratio p/t (a minimal sketch of this step follows).
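A minimal Python sketch of this selection step, assuming instances are dicts of attribute values:

def best_term(instances, attributes, class_attr, target_class):
    """Return the (attribute, value) test that maximizes p/t on the covered instances."""
    best, best_ratio = None, -1.0
    for attr in attributes:
        for value in {row[attr] for row in instances}:
            covered = [row for row in instances if row[attr] == value]
            t = len(covered)                                             # instances covered
            p = sum(row[class_attr] == target_class for row in covered)  # positives covered
            if t > 0 and p / t > best_ratio:     # ties are often broken by larger coverage
                best, best_ratio = (attr, value), p / t
    return best

# A PRISM-style learner would keep adding best_term(...) tests to the rule until it
# covers only the target class, then remove the covered instances and start a new rule.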

A simple covering algorithm: example

• We will form rules that cover each of the three classes (hard, soft, and none) in turn. To begin, we seek a rule:
• If ? then recommendation = hard
  – For the unknown term ?, we have nine choices:

» age = young 2/8

» age = pre-presbyopic 1/8

» age = presbyopic 1/8

» spectacle prescription = myope 3/12

» spectacle prescription = hypermetrope 1/12

» astigmatism = no 0/12

» astigmatism = yes 4/12

» tear production rate = reduced 0/12

» tear production rate = normal 4/12

A simple covering algorithm: example

• If astigmatism = yes then recommendation = hard
• This rule is quite inaccurate, getting only 4 instances correct out of the 12 that it covers. So we refine it further:
• If astigmatism = yes AND ? then recommendation = hard

» age = young 2/4

» age = pre-presbyopic 1/4

» age = presbyopic 1/4

» spectacle prescription = myope 3/6

» spectacle prescription = hypermetrope 1/6

» tear production rate = reduced 0/6

» tear production rate = normal 4/6

THEN recommendation = hard

A simple covering algorithm: example

– The rule produced covers only 3 out of the 4 hard recommendations.
  • So, we delete these 3 from the set of instances and start again, looking for another rule.
– Then, do the same process for the classes soft and none.
• What we have just described is the PRISM method for constructing rules.
