
NORMALIZATION EXAMPLE

Assume that we have a customer order form. At the top of the form is the order number
(unique with each order), customer identification information (name, id number, address, etc.),
and the date of the order. Below this heading are lines of information for the items ordered,
one line per item; each line is called a line item. For each line item we have information
describing the item ordered (part number, description, cost, etc.). At the bottom of the order are
the freight charge, the sales tax, and the order total.

A data dictionary definition of this order form might look like the following:

customer-order =
order-number &
customer-number &
customer-name &
customer-address &
order-date &
1{line-item}n &
freight-charge &
sales-tax &
order-total

line-item =
part-number &
description &
unit-price &
quantity &
item-total-price

Figure 1
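The following Python sketch makes the structure concrete. It renders Figure 1 as a nested record: one customer order holding a list of line items. That list is the repeating sub-structure that normalization will remove. The class names, field types and sample values are illustrative assumptions and are not part of the original example.

```python
from dataclasses import dataclass, field
from decimal import Decimal


@dataclass
class LineItem:
    # One occurrence of the repeating group in Figure 1.
    part_number: str
    description: str
    unit_price: Decimal
    quantity: int
    item_total_price: Decimal


@dataclass
class CustomerOrder:
    order_number: int
    customer_number: int
    customer_name: str
    customer_address: str
    order_date: str
    # The repeating sub-structure, 1{line-item}n: one or more line items.
    line_items: list[LineItem] = field(default_factory=list)
    freight_charge: Decimal = Decimal("0")
    sales_tax: Decimal = Decimal("0")
    order_total: Decimal = Decimal("0")


# Invented sample order with two line items.
order = CustomerOrder(
    order_number=1001,
    customer_number=42,
    customer_name="A. Smith",
    customer_address="12 Elm St",
    order_date="2024-03-01",
    line_items=[
        LineItem("P-7", "Widget", Decimal("2.50"), 4, Decimal("10.00")),
        LineItem("P-9", "Gadget", Decimal("5.00"), 1, Decimal("5.00")),
    ],
    freight_charge=Decimal("3.00"),
    sales_tax=Decimal("1.20"),
    order_total=Decimal("19.20"),
)

print(len(order.line_items))  # 2
```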

The first step in the normalization process is to put the structure into first normal form (1NF). A
set of structures is in first normal form if no structure contains a repeating sub-structure. This
means that we must remove line-item from customer-order. However, we also must be able to
reassemble a customer order to print it, so it must be possible to associate a line-item record
with a particular customer-order record. Therefore, we include the order-number in the new
structure. The result is shown in figure 2. Of course, every structure must have a key which
uniquely identifies each occurrence of the structure. To emphasize this, we indicate the key
field(s) in each structure by underlining them. Note that in line-item the key consists of two
fields, order-number and part-number.

The 1NF structures are more flexible than the original single structure. For one thing, we don’t
have to worry about how much space to allocate for a line-item table when we implement the
system. More important at this stage, however, is the fact that the 1NF structure shows the
logical organization of the data better than the un-normalized structure. We see that a
line-item is a data component in its own right, rather than simply being a component of
customer-order.

customer-order =
order-number &
customer-number &
customer-name &
customer-address &
order-date &
freight-charge &
sales-tax &
order-total

line-item =
order-number &
part-number &
description &
unit-price &
quantity &
item-total-price

Figure 2
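The move to first normal form can be sketched in Python as follows. This is a minimal illustration under assumed names. Orders are plain dictionaries. The hypothetical function to_first_normal_form copies order-number into every line-item row so that (order-number, part-number) can serve as the key.

```python
from decimal import Decimal

# Invented sample data in the un-normalized shape of Figure 1.
nested_orders = [
    {
        "order_number": 1001, "customer_number": 42,
        "customer_name": "A. Smith", "customer_address": "12 Elm St",
        "order_date": "2024-03-01",
        "line_items": [
            {"part_number": "P-7", "description": "Widget",
             "unit_price": Decimal("2.50"), "quantity": 4,
             "item_total_price": Decimal("10.00")},
            {"part_number": "P-9", "description": "Gadget",
             "unit_price": Decimal("5.00"), "quantity": 1,
             "item_total_price": Decimal("5.00")},
        ],
        "freight_charge": Decimal("3.00"), "sales_tax": Decimal("1.20"),
        "order_total": Decimal("19.20"),
    },
]


def to_first_normal_form(orders):
    """Split nested orders into customer-order rows and line-item rows."""
    customer_orders, line_items = [], []
    for order in orders:
        # customer-order keeps every field except the repeating group.
        customer_orders.append(
            {k: v for k, v in order.items() if k != "line_items"})
        for item in order["line_items"]:
            # Carry order-number into each row so it can be matched to its order.
            line_items.append({"order_number": order["order_number"], **item})
    return customer_orders, line_items


customer_orders, line_items = to_first_normal_form(nested_orders)

# (order-number, part-number) should identify each line-item row uniquely.
keys = [(li["order_number"], li["part_number"]) for li in line_items]
print(len(keys) == len(set(keys)))  # True

# Reassembling order 1001 for printing: select its line items by order-number.
print([li["part_number"] for li in line_items if li["order_number"] == 1001])
# ['P-7', 'P-9']
```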

In spite of these advantages, first normal form still has some weaknesses. First, consider
line-item. Quantity and item-total-price require knowing both order-number and part-number to
determine their values; that is, they are characteristics of a particular line-item. On the other
hand, description and unit-price depend only on the part-number. Second, in customer-order
you can determine customer-name and customer-address by knowing only the customer-number,
which is not even part of the structure’s key!

In general, we say that one field is functionally dependent on a second field if you can
determine the value of the first field knowing only the value of the second field. For example,
customer-number is functionally dependent on order-number. A field is fully functionally
dependent on a set of fields if you must know the value of every field in the set to determine
the value of the dependent field. For example, in line-item, quantity is fully functionally
dependent on order-number and part-number. Finally, we say that a set of structures is in
second normal form (2NF) if

i. they are in first normal form;


ii. in each structure, all non-key fields are fully functionally dependent on the
complete key.
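The two dependency definitions can be tested against sample data. The sketch below checks whether a set of fields determines another field in a list of rows. The helper names determines and fully_dependent are hypothetical. Sample rows can only disprove a dependency. Whether a dependency really holds depends on what the fields mean, which is why the analysis above reasons about the business rather than about particular data.

```python
from itertools import combinations


def determines(rows, lhs, rhs):
    """True if no two rows agree on all fields in lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[f] for f in lhs)
        if key in seen and seen[key] != row[rhs]:
            return False
        seen[key] = row[rhs]
    return True


def fully_dependent(rows, lhs, rhs):
    """True if rhs is determined by all of lhs but by no proper subset of it."""
    if not determines(rows, lhs, rhs):
        return False
    return not any(
        determines(rows, subset, rhs)
        for size in range(len(lhs))          # proper subsets only
        for subset in combinations(lhs, size)
    )


# Invented 1NF line-item rows (item-total-price omitted; prices kept as
# strings because only equality matters here).
rows = [
    {"order_number": 1001, "part_number": "P-7", "description": "Widget",
     "unit_price": "2.50", "quantity": 4},
    {"order_number": 1001, "part_number": "P-9", "description": "Gadget",
     "unit_price": "5.00", "quantity": 1},
    {"order_number": 1002, "part_number": "P-7", "description": "Widget",
     "unit_price": "2.50", "quantity": 2},
    {"order_number": 1002, "part_number": "P-9", "description": "Gadget",
     "unit_price": "5.00", "quantity": 4},
]
key = ("order_number", "part_number")

for name in ("description", "unit_price", "quantity"):
    print(name, fully_dependent(rows, key, name))
# description False
# unit_price False
# quantity True
```

In this sample, description and unit-price are each determined by part-number alone. That is the partial dependency that violates 2NF.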

To place the customer order in second normal form, we must now define it using three
structures (see figure 3). Customer-order is unchanged, but in line-item all fields that are
specific to a particular part have been moved to a new structure, part-record. Note the advantage
of this organization. If, for example, we need to change the description of a part, we simply
change the appropriate occurrence of part-record and all customer orders will reflect the new
description.

customer-order =
order-number &
customer-number &
customer-name &
customer-address &
order-date &
freight-charge &
sales-tax &
order-total

line-item =
order-number &
part-number &
quantity &
item-total-price

part-record =
part-number &
part-description &
unit-price

Figure 3
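The split from figure 2 to figure 3 can be sketched as a projection of line-item into two tables. The names split_parts and PART_FIELDS are assumptions made for illustration. The sketch keeps only one part-record per part-number, so it assumes that part-number really does determine description and unit-price. For brevity it keeps the 1NF field name description rather than part-description.

```python
# Invented 1NF line-item rows; part P-7 appears on two different orders.
line_items_1nf = [
    {"order_number": 1001, "part_number": "P-7", "description": "Widget",
     "unit_price": "2.50", "quantity": 4},
    {"order_number": 1002, "part_number": "P-7", "description": "Widget",
     "unit_price": "2.50", "quantity": 2},
]

# Fields that depend on part-number alone (hypothetical constant).
PART_FIELDS = ("description", "unit_price")


def split_parts(rows):
    """Return (line_items, part_records) in the shape of Figure 3.

    Assumes part-number determines PART_FIELDS, so keeping the last row
    seen for each part-number loses nothing.
    """
    line_items, part_records = [], {}
    for row in rows:
        part_records[row["part_number"]] = {f: row[f] for f in PART_FIELDS}
        line_items.append(
            {k: v for k, v in row.items() if k not in PART_FIELDS})
    return line_items, part_records


line_items, part_records = split_parts(line_items_1nf)

# Changing a part's description now updates exactly one record...
part_records["P-7"]["description"] = "Deluxe widget"

# ...and every order that uses P-7 sees the change through part-number.
for li in line_items:
    print(li["order_number"], part_records[li["part_number"]]["description"])
# 1001 Deluxe widget
# 1002 Deluxe widget
```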

There is still one more revision we should make to the definition of a customer order. Recall
that we pointed out earlier that in customer-order you can determine the value of customer-name
and customer-address without knowing the value of the key field. We can eliminate this
dependency by defining a new structure which we will call customer-record. The resulting set
of structures is shown in figure 4.

The set of structures in figure 4 is in third normal form. A set of structures is in third normal
form if

i. the structures are in second normal form;


ii. no non-key field is functionally dependent on any other non-key field.
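One rough check for the third-normal-form condition is to look for non-key fields that determine other non-key fields. The sketch below uses invented sample rows for the figure 3 customer-order, limited to the fields needed to show the problem, and a hypothetical helper, transitive_dependencies. It then moves the customer fields into customer-record, as in figure 4. The sample deliberately includes two customers with the same name and two at the same address. This stops coincidences in a small sample from showing up as spurious dependencies.

```python
# Invented rows for the Figure 3 customer-order (only the fields needed here).
orders = [
    {"order_number": 1001, "customer_number": 42,
     "customer_name": "A. Smith", "customer_address": "12 Elm St"},
    {"order_number": 1002, "customer_number": 42,
     "customer_name": "A. Smith", "customer_address": "12 Elm St"},
    {"order_number": 1003, "customer_number": 57,
     "customer_name": "A. Smith", "customer_address": "9 Oak Ave"},
    {"order_number": 1004, "customer_number": 63,
     "customer_name": "C. Lee", "customer_address": "12 Elm St"},
]


def determines(rows, a, b):
    """True if no two rows agree on field a but differ on field b."""
    seen = {}
    for row in rows:
        if row[a] in seen and seen[row[a]] != row[b]:
            return False
        seen[row[a]] = row[b]
    return True


def transitive_dependencies(rows, key):
    """Pairs (a, b) of non-key fields where a appears to determine b."""
    non_key = [f for f in rows[0] if f not in key]
    return [(a, b) for a in non_key for b in non_key
            if a != b and determines(rows, a, b)]


print(transitive_dependencies(orders, key=("order_number",)))
# [('customer_number', 'customer_name'), ('customer_number', 'customer_address')]

# Remove the violation as in Figure 4: customer fields move to customer-record,
# and customer-order keeps only customer-number to link to it.
customer_records = {
    o["customer_number"]: {"customer_name": o["customer_name"],
                           "customer_address": o["customer_address"]}
    for o in orders
}
orders_3nf = [{"order_number": o["order_number"],
               "customer_number": o["customer_number"]} for o in orders]
print(transitive_dependencies(orders_3nf, key=("order_number",)))  # []
```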

We should point out that, at least for purposes of analysis, the 3NF reduction should be
applied even if the “key” on which the dependent fields rely (here, customer-number) is not
actually present in the structure. For example, consider the structure customer-order as shown
in figure 3. Even if customer-number were not present in the structure, customer-name and
customer-address would still describe the customer rather than the order; any order for this
customer would have the same name and address. Therefore, the set of structures shown in
figure 4 should still be produced even if it means adding customer-number (the key to
customer-record) to customer-order.

customer-order =
order-number &
customer-number &
order-date &
freight-charge &
sales-tax &
order-total

line-item =
order-number &
part-number &
quantity &
item-total-price

part-record =
part-number &
part-description &
unit-price

customer-record =
customer-number &
customer-name &
customer-address

Figure 4

We actually can make still further simplifications. For example, in line-item the item-total-price
is simply the product of quantity and unit-price. Similarly, in customer-order the freight-charge
and sales-tax fields are probably functions of the merchandise subtotal (the order-total minus the
freight-charge and sales-tax amounts), which in turn is the sum of the item-total-price for all
line-items in the order. In other words, we could simply eliminate these fields from the structures
and describe how they are calculated in the process specifications. Nonetheless, we will stop the
normalization process at this point, for two reasons. First, as part of the systems analysis
process our primary interest is in studying the logical structure of the data, not building a
database. Second, one can argue that we want the cost factors to be frozen when the order is
placed, regardless of any future changes in item prices. While the data files can be designed
with multiple prices keyed to effective dates, this adds a complication which we choose not to
create in this example. (It should be pointed out, however, that many real-world systems would
require this multiple-price capability.)
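A short sketch can illustrate the derived fields mentioned above, and why one might keep item-total-price anyway. The function names are hypothetical. Freight-charge and sales-tax are simply given values here, because the example does not say how they are computed.

```python
from decimal import Decimal


def item_total_price(quantity, unit_price):
    # item-total-price is just quantity times unit-price.
    return quantity * unit_price


def merchandise_subtotal(items):
    # Sum of item-total-price over the order's line items,
    # i.e. order-total minus freight-charge and sales-tax.
    return sum((item_total_price(q, p) for q, p in items), Decimal("0"))


# Invented (quantity, unit-price) pairs for one order.
items = [(4, Decimal("2.50")), (1, Decimal("5.00"))]
freight_charge = Decimal("3.00")  # given; the example does not say how it is computed
sales_tax = Decimal("1.20")       # likewise just given

subtotal = merchandise_subtotal(items)
print(subtotal, subtotal + freight_charge + sales_tax)  # 15.00 19.20

# Freezing cost factors: the stored value keeps the price charged on the
# order date, while a recomputed value follows the part's current price.
current_unit_price = {"P-7": Decimal("2.50")}
stored = item_total_price(4, current_unit_price["P-7"])
current_unit_price["P-7"] = Decimal("2.75")  # later price change
print(stored, item_total_price(4, current_unit_price["P-7"]))  # 10.00 11.00
```

The last two lines show the trade-off behind the second reason for stopping. Recomputing a derived field is simpler, but the result changes whenever the underlying price changes.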

Notice the advantage of this organization. The basic components of a customer order are
clearly visible rather than being buried in a single structure. In addition, we can see that part-
record and customer-record are quite likely to use data stores that exist for other components
of the system. For example, we may be keeping extensive information about customers in a
customer data store. The normalized organization shows clearly where we can get the
customer-related information needed to print a customer order.
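As a final illustration, printing a customer order from the normalized structures amounts to following the keys. Customer-number leads to customer-record, and part-number leads to part-record. The in-memory dictionaries and the print_order function are assumptions made for this sketch. A real system would read these records from its data stores.

```python
from decimal import Decimal

# Invented contents of the four Figure 4 structures, keyed by their keys.
customer_records = {
    42: {"customer_name": "A. Smith", "customer_address": "12 Elm St"},
}
part_records = {
    "P-7": {"part_description": "Widget", "unit_price": Decimal("2.50")},
    "P-9": {"part_description": "Gadget", "unit_price": Decimal("5.00")},
}
customer_orders = {
    1001: {"customer_number": 42, "order_date": "2024-03-01",
           "freight_charge": Decimal("3.00"), "sales_tax": Decimal("1.20"),
           "order_total": Decimal("19.20")},
}
line_items = [
    {"order_number": 1001, "part_number": "P-7", "quantity": 4,
     "item_total_price": Decimal("10.00")},
    {"order_number": 1001, "part_number": "P-9", "quantity": 1,
     "item_total_price": Decimal("5.00")},
]


def print_order(order_number):
    """Rebuild the printed form of one order by following the keys."""
    order = customer_orders[order_number]
    customer = customer_records[order["customer_number"]]
    lines = [f"Order {order_number}  {order['order_date']}",
             f"{customer['customer_name']}, {customer['customer_address']}"]
    for li in line_items:
        if li["order_number"] != order_number:
            continue
        part = part_records[li["part_number"]]
        # unit_price comes from part-record, so it is the part's current
        # price; the stored item-total-price is what was actually charged.
        lines.append(f"  {li['part_number']} {part['part_description']:<8}"
                     f" {li['quantity']:>3} @ {part['unit_price']}"
                     f" = {li['item_total_price']}")
    lines.append(f"Freight {order['freight_charge']}  "
                 f"Tax {order['sales_tax']}  Total {order['order_total']}")
    return "\n".join(lines)


print(print_order(1001))
```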
