
First Rule/Major Rule

Using the first rule, all we do is choose a single attribute in the data set and map each of its
possible values to the outcome that most frequently accompanies it. More formally, for each value
we choose the outcome that has the lowest error rate.

Definition: Suppose an attribute, A, has n possible values, {$a_1$, $a_2$, …, $a_n$}, and there are
m possible outcomes, {$b_1$, $b_2$, …, $b_m$}. The error rate of a mapping ($a_i$, $b_j$) is as follows:

$$\mathrm{error}(a_i, b_j) = \frac{|(a_i, *)| - |(a_i, b_j)|}{|(a_i, *)|}$$

Note: Here ($a_i$, $b_j$) means all records where A = $a_i$ and the outcome is $b_j$, while ($a_i$, *) means all
records where A = $a_i$; the vertical bars denote the number of records. In plain English, this
definition means that we divide the number of times $a_i$ occurs but $b_j$ does not occur by the
total number of times $a_i$ occurs.
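As a minimal sketch of this definition, the Python below counts records directly. The names `error_rate` and `PAIRS` are illustrative assumptions, not part of the original text; the pairs are the Outlook values and outcomes from the example data set in this document.

```python
from fractions import Fraction

# (Outlook, outcome) pairs taken from the example data set in this document.
PAIRS = [
    ("Sunny", "No"), ("Sunny", "No"), ("Cloudy", "Yes"), ("Rainy", "Yes"),
    ("Rainy", "Yes"), ("Rainy", "No"), ("Cloudy", "Yes"), ("Sunny", "No"),
    ("Sunny", "Yes"), ("Rainy", "Yes"), ("Sunny", "Yes"), ("Cloudy", "Yes"),
    ("Cloudy", "Yes"), ("Rainy", "No"),
]


def error_rate(pairs, value, outcome):
    """Share of records with this attribute value whose outcome differs."""
    # |(a_i, *)|: records where the attribute takes this value.
    total = sum(1 for v, _ in pairs if v == value)
    # |(a_i, b_j)|: records with this value and this outcome.
    matches = sum(1 for v, o in pairs if v == value and o == outcome)
    # Raises ZeroDivisionError if the value never occurs in the data.
    return Fraction(total - matches, total)


print(error_rate(PAIRS, "Sunny", "Yes"))  # 3/5
print(error_rate(PAIRS, "Sunny", "No"))   # 2/5
```

Exact fractions are used here only so that the results can be compared with hand calculations; floating-point division would work equally well.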

We call the mapping from an attribute value to the outcome that has the lowest error rate the
optimal mapping.

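Definition: The optimal mapping of attribute A is the mapping

$$a_i \mapsto \arg\min_{b_j} \, \mathrm{error}(a_i, b_j), \qquad i = 1, \dots, n$$

where $\mathrm{error}(a_i, b_j)$ is the error rate of the mapping ($a_i$, $b_j$).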
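One way to compute an optimal mapping is sketched below. It relies on the fact that, for a fixed attribute value, the outcome with the lowest error rate is simply the outcome that occurs most often with that value. The function name `optimal_mapping` and the `PAIRS` list are assumptions made for illustration; the pairs are the Outlook values and outcomes from the example data set in this document.

```python
from collections import Counter, defaultdict

# (Outlook, outcome) pairs taken from the example data set in this document.
PAIRS = [
    ("Sunny", "No"), ("Sunny", "No"), ("Cloudy", "Yes"), ("Rainy", "Yes"),
    ("Rainy", "Yes"), ("Rainy", "No"), ("Cloudy", "Yes"), ("Sunny", "No"),
    ("Sunny", "Yes"), ("Rainy", "Yes"), ("Sunny", "Yes"), ("Cloudy", "Yes"),
    ("Cloudy", "Yes"), ("Rainy", "No"),
]


def optimal_mapping(pairs):
    """Map each attribute value to the outcome with the lowest error rate."""
    counts = defaultdict(Counter)
    for value, outcome in pairs:
        counts[value][outcome] += 1
    # The most frequent outcome for a value has the lowest error rate.
    # On a tie, most_common returns the outcome seen first, which is an
    # arbitrary but deterministic choice.
    return {value: c.most_common(1)[0][0] for value, c in counts.items()}


print(optimal_mapping(PAIRS))
# {'Sunny': 'No', 'Cloudy': 'Yes', 'Rainy': 'Yes'}
```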

Finally, if we have the optimal mapping for an attribute we can define the error rate for the
attribute itself:

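Definition: Given an attribute, A, and its optimal mapping, the error rate of the attribute is
the following:

$$\mathrm{error}(A) = \frac{\sum_{i=1}^{n} \left( |(a_i, *)| - |(a_i, b_{j_i})| \right)}{\sum_{i=1}^{n} |(a_i, *)|}$$

where $b_{j_i}$ is the outcome that the optimal mapping assigns to $a_i$. In other words, it is the
fraction of all records that the optimal mapping gets wrong.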
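The sketch below computes an attribute's error rate under the assumption that it is the share of all records that the optimal mapping misclassifies, which is the reading that matches the worked example later in this document. The name `attribute_error_rate` and the `PAIRS` list are illustrative; the pairs are the Outlook values and outcomes from the example data set.

```python
from collections import Counter, defaultdict
from fractions import Fraction

# (Outlook, outcome) pairs taken from the example data set in this document.
PAIRS = [
    ("Sunny", "No"), ("Sunny", "No"), ("Cloudy", "Yes"), ("Rainy", "Yes"),
    ("Rainy", "Yes"), ("Rainy", "No"), ("Cloudy", "Yes"), ("Sunny", "No"),
    ("Sunny", "Yes"), ("Rainy", "Yes"), ("Sunny", "Yes"), ("Cloudy", "Yes"),
    ("Cloudy", "Yes"), ("Rainy", "No"),
]


def attribute_error_rate(pairs):
    """Share of all records misclassified by the attribute's optimal mapping."""
    counts = defaultdict(Counter)
    for value, outcome in pairs:
        counts[value][outcome] += 1
    # For each value, every record outside the majority outcome is an error.
    errors = sum(sum(c.values()) - max(c.values()) for c in counts.values())
    return Fraction(errors, len(pairs))


print(attribute_error_rate(PAIRS))  # 2/7, i.e. 4 of the 14 records
```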

Example

Consider the following data set:

Outlook   Temperature   Humidity   Windy   Should we go out and play?

Sunny     Hot           High       False   No
Sunny     Hot           High       True    No
Cloudy    Hot           High       False   Yes
Rainy     Mild          High       False   Yes
Rainy     Cool          Normal     False   Yes
Rainy     Cool          Normal     True    No
Cloudy    Cool          Normal     True    Yes
Sunny     Mild          High       False   No
Sunny     Cool          Normal     False   Yes
Rainy     Mild          Normal     False   Yes
Sunny     Mild          Normal     True    Yes
Cloudy    Mild          High       True    Yes
Cloudy    Hot           Normal     False   Yes
Rainy     Mild          High       True    No

To start, let's look at a single value of a single attribute: Outlook = Sunny


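There are 5 records where Outlook is Sunny; 2 of them have an outcome of Yes and 3 of them
have an outcome of No. Therefore we have

$$\mathrm{error}(\text{Sunny}, \text{Yes}) = \frac{5 - 2}{5} = \frac{3}{5}, \qquad \mathrm{error}(\text{Sunny}, \text{No}) = \frac{5 - 3}{5} = \frac{2}{5}$$

Therefore the optimal mapping for this attribute value is

$$\text{Sunny} \mapsto \text{No}$$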


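Calculating error rates for the rest of the possible values, we get the following:

$$\mathrm{error}(\text{Rainy}, \text{Yes}) = \frac{2}{5}, \qquad \mathrm{error}(\text{Rainy}, \text{No}) = \frac{3}{5}$$

$$\mathrm{error}(\text{Cloudy}, \text{Yes}) = \frac{0}{4} = 0, \qquad \mathrm{error}(\text{Cloudy}, \text{No}) = \frac{4}{4} = 1$$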

From this we get the following optimal mapping table:

Outlook   Outcome

Sunny     No
Rainy     Yes
Cloudy    Yes

Following the same procedure, we find the following optimal mappings for the other
attributes:

Temperature   Outcome

Hot           Yes/No*
Mild          Yes
Cool          Yes

Humidity   Outcome

High       No
Normal     Yes

Windy   Outcome

True    Yes/No*
False   Yes

*Note: Where an outcome is marked with an asterisk, the error rates were the same for both
outcomes. When this happens the choice is arbitrary, and either outcome fits the definition of
an optimal mapping.

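If we want to figure out which attribute would be the best to use, we can use these optimal
mappings to calculate the error rates for each attribute:

Attribute     Error rate

Outlook       4/14
Temperature   5/14
Humidity      4/14
Windy         5/14

Outlook and Humidity tie for the lowest error rate, so either would be an equally good choice.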
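The whole procedure can be sketched end to end in Python, as below. This is an illustration rather than a reference implementation: the names `ROWS`, `ATTRIBUTES`, and `misclassified` are assumptions. The records are transcribed from the example data set, with the first record's Windy value taken as False as in the widely used version of this weather data set, which is an assumption.

```python
from collections import Counter, defaultdict

ATTRIBUTES = ["Outlook", "Temperature", "Humidity", "Windy"]

# Rows: (Outlook, Temperature, Humidity, Windy, outcome).
# Assumption: the first row's Windy value is False, as in the widely used
# version of this weather data set.
ROWS = [
    ("Sunny", "Hot", "High", False, "No"),
    ("Sunny", "Hot", "High", True, "No"),
    ("Cloudy", "Hot", "High", False, "Yes"),
    ("Rainy", "Mild", "High", False, "Yes"),
    ("Rainy", "Cool", "Normal", False, "Yes"),
    ("Rainy", "Cool", "Normal", True, "No"),
    ("Cloudy", "Cool", "Normal", True, "Yes"),
    ("Sunny", "Mild", "High", False, "No"),
    ("Sunny", "Cool", "Normal", False, "Yes"),
    ("Rainy", "Mild", "Normal", False, "Yes"),
    ("Sunny", "Mild", "Normal", True, "Yes"),
    ("Cloudy", "Mild", "High", True, "Yes"),
    ("Cloudy", "Hot", "Normal", False, "Yes"),
    ("Rainy", "Mild", "High", True, "No"),
]


def misclassified(rows, col):
    """Number of records the optimal mapping of column `col` gets wrong."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[col]][row[-1]] += 1
    # Per value: records with that value minus those with the majority outcome.
    return sum(sum(c.values()) - max(c.values()) for c in counts.values())


errors = {name: misclassified(ROWS, i) for i, name in enumerate(ATTRIBUTES)}
lowest = min(errors.values())
# Keep every attribute that reaches the lowest error count, so ties are visible.
best = [name for name in ATTRIBUTES if errors[name] == lowest]

for name in ATTRIBUTES:
    print(f"{name}: {errors[name]}/{len(ROWS)}")
print("Best:", best)
```

Returning all tied attributes, rather than a single winner, leaves the arbitrary choice between them to the caller.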
