Using the first rule, all we do is choose a single attribute in the data set and map each of its
possible values to the outcome that most frequently accompanies it. More formally, for each value
we choose the outcome that has the lowest error rate.
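Definition: Suppose an attribute, A, has n possible values, {a1, a2, …, an}, and there are
m possible outcomes, {b1, b2, …, bm}. The error rate of a mapping (ai, bj) is as follows:
\[ \mathrm{error}(a_i, b_j) = \frac{|(a_i, *)| - |(a_i, b_j)|}{|(a_i, *)|} \]
where |S| denotes the number of records in the set S.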
Note: Here (ai, bj) means all records where A=ai and the outcome is bj, while (ai, *) means all
records where A=ai. In plain English, this definition means that we divide the number of times ai
occurs but bj does not occur by the total number of times ai occurs.
We call the mapping from an attribute value to the outcome with the lowest error rate the
optimal mapping.
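The two definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `mapping_error_rate` and `optimal_mapping`, the representation of records as (attribute value, outcome) pairs, and the toy records are all hypothetical and are not taken from this document's data set.

```python
from collections import Counter

def mapping_error_rate(records, value, outcome):
    """Error rate of the mapping (value, outcome).

    records is a list of (attribute_value, outcome) pairs.
    """
    # Number of records where A = value, i.e. |(a_i, *)|
    total = sum(1 for a, _ in records if a == value)
    # Number of records where A = value and the outcome matches, i.e. |(a_i, b_j)|
    hits = sum(1 for a, b in records if a == value and b == outcome)
    return (total - hits) / total

def optimal_mapping(records):
    """Map each attribute value to its most frequent outcome.

    The most frequent outcome is also the one with the lowest error rate,
    because every outcome for a given value shares the same denominator.
    """
    counts = {}
    for a, b in records:
        counts.setdefault(a, Counter())[b] += 1
    return {a: c.most_common(1)[0][0] for a, c in counts.items()}

# Hypothetical toy records, not the data set used in this document.
toy = [("x", "Yes"), ("x", "Yes"), ("x", "No"), ("y", "No"), ("y", "No")]
print(mapping_error_rate(toy, "x", "Yes"))  # 1 of the 3 "x" records is not "Yes"
print(optimal_mapping(toy))
```

Note that `mapping_error_rate` divides by the number of records with the given value, so it is only meaningful for values that actually occur in the data.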
Finally, if we have the optimal mapping for an attribute, we can define the error rate for the
attribute itself:
Definition: Given an attribute, A, and its optimal mapping, the error rate of the attribute is
the following:
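One common way to compute an attribute-level error rate is to pool the errors of the optimal mapping over all values of the attribute and divide by the total number of records. The sketch below assumes that convention; the pooling rule, the function name `attribute_error_rate`, and the toy records are assumptions made for illustration and may differ from the exact definition intended here.

```python
from collections import Counter

def attribute_error_rate(records):
    """Fraction of records misclassified by the attribute's optimal mapping.

    ASSUMPTION: errors are pooled over all attribute values and divided by
    the total number of records. records is a list of
    (attribute_value, outcome) pairs.
    """
    counts = {}
    for value, outcome in records:
        counts.setdefault(value, Counter())[outcome] += 1
    # For each value, every record outside its most frequent outcome is an error.
    errors = sum(sum(c.values()) - max(c.values()) for c in counts.values())
    return errors / len(records)

# Hypothetical toy records, not the data set used in this document.
toy = [("x", "Yes"), ("x", "Yes"), ("x", "No"), ("y", "No"), ("y", "No")]
print(attribute_error_rate(toy))  # 1 misclassified record out of 5
```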
Example
Outlook Outcome
Sunny No
Rainy Yes
Cloudy Yes
Following the same procedure, we find the optimal mappings for the other attributes:
Temperature Outcome
Hot Yes/No*
Mild Yes
Cool Yes
Humidity Outcome
High No
Normal Yes
Windy Outcome
True Yes
False Yes/No*
*Note: In these cases, the error rates were the same for both outcomes. When this happens, the
choice is arbitrary, and either outcome fits the definition of an optimal mapping.
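A tie like the ones marked with * can be detected by collecting every outcome that attains the highest count for a value, which is the same as attaining the lowest error rate. The following is a minimal sketch; the function name `tied_outcomes` and the toy records are hypothetical and are not drawn from this document's data set.

```python
from collections import Counter

def tied_outcomes(records, value):
    """Return every outcome that attains the lowest error rate for `value`.

    records is a list of (attribute_value, outcome) pairs. Raises ValueError
    if `value` never occurs in records.
    """
    counts = Counter(b for a, b in records if a == value)
    best = max(counts.values())
    # All outcomes with the highest count share the lowest error rate.
    return sorted(b for b, n in counts.items() if n == best)

# Hypothetical toy records: "p" is split evenly between the two outcomes.
toy = [("p", "Yes"), ("p", "No"), ("p", "Yes"), ("p", "No"), ("q", "Yes")]
print(tied_outcomes(toy, "p"))  # more than one outcome means a tie
print(tied_outcomes(toy, "q"))
```

When the returned list has more than one entry, any of its members can serve as the optimal mapping for that value.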
If we want to figure out which attribute would be the best to use, we can use these optimal
mappings to calculate the error rates for each attribute: