Database Schema Design

The goal is to design a schema that models the problem domain as precisely as possible, so that both the currently known and the future, as yet unknown, use cases can use the database. One possible approach is to go through the requirements and the texts of all the use cases and collect the nouns in a dictionary. Nouns that do something, i.e., are related to a verb, are candidates for database tables. Nouns that have no verb are possible attributes (table columns). For example, consider the use case text: “The item is given an identification number.” We have two nouns: item and identification number. “Item” has the verb “is given” and is a candidate for a database table. “Identification number” has no related verb, so it will probably be a column in the “Item” table.

Database constraints can be defined over a schema. The constraints are usually on the contents of database tables. When the user tries to change the contents of a table, the relevant constraints are checked first. If a constraint is violated, the database engine raises an exception and does not allow the update. Some common constraints are listed below.

PRIMARY KEY is a column whose values are all unique. An example is the table Person(id, firstName, lastName, dateOfBirth, address), where “id” is the primary key. A table cannot have two entries with equal primary keys. A primary key may also be a combination of columns, for example (model, color) in Chair(model, color, weight, price), since a certain model may exist in different colors. A better solution, however, is Chair(id, model, color, weight, price).

FOREIGN KEY is a column (or several columns) whose values are limited to the values of certain column(s) of another table, called the referenced table. For example, consider a table Pair(pairId, personId1, personId2). The values in the “personId1” and “personId2” columns denote the ids of two persons, and we want these values to originate from the “Person” table.
We do not want a person id in the “Pair” table that does not appear in the “Person” table. Foreign key column(s) must reference column(s) that have a PRIMARY KEY or UNIQUE constraint (see below).

UNIQUE is a constraint on column(s) whose values (or combinations of values) are unique. The difference from the primary key is that a table can have several UNIQUE constraints but at most one PRIMARY KEY constraint. Consider the table Supplier(id, name, address). The “id” column is the primary key, but we can also define a UNIQUE constraint over (name, address). This states explicitly that no two suppliers have the same name and address.

CHECK is a user-defined boolean expression that must be true at all times. For example, in a table Book(bookid, . . . , price) we may define CHECK (price > 0).

Normalization

The columns of the tables may be either key or non-key columns. In a normalized schema a non-key column appears only once, in one table in the database. This is a special case of the software engineering rule that a piece of data or code shall appear only once in the entire program. The motivation is that when this rule is followed, an update has to be made in only one place. If the rule is not followed, the same update may have to be made in several places. The concern is not the extra work of updating several places, but the possibility that one of the required places is forgotten, which leaves the data inconsistent.
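The risk of a forgotten update can be illustrated with a small sketch using Python's built-in sqlite3 module. The Supplier table follows the example in the text; the DeliveryFlat and Delivery tables, as well as the sample names and addresses, are hypothetical and exist only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Not normalized: the supplier address (a non-key column) is repeated in every row.
-- DeliveryFlat is a hypothetical table used only for this illustration.
CREATE TABLE DeliveryFlat (
    id              INTEGER PRIMARY KEY,
    supplierName    TEXT,
    supplierAddress TEXT
);
INSERT INTO DeliveryFlat VALUES (1, 'Acme', '1 Main St'), (2, 'Acme', '1 Main St');

-- Normalized: the address is stored once, in Supplier; deliveries refer to it by key.
CREATE TABLE Supplier (id INTEGER PRIMARY KEY, name TEXT, address TEXT);
CREATE TABLE Delivery (
    id         INTEGER PRIMARY KEY,
    supplierId INTEGER REFERENCES Supplier(id)
);
INSERT INTO Supplier VALUES (1, 'Acme', '1 Main St');
INSERT INTO Delivery VALUES (1, 1), (2, 1);
""")

# The supplier moves. In the flat table, only one of the two rows gets updated;
# the other one is forgotten.
conn.execute("UPDATE DeliveryFlat SET supplierAddress = '9 New Rd' WHERE id = 1")

# In the normalized schema there is exactly one place to update.
conn.execute("UPDATE Supplier SET address = '9 New Rd' WHERE id = 1")

# Collect the distinct addresses now visible through each design.
flat_addresses = {row[0] for row in conn.execute(
    "SELECT supplierAddress FROM DeliveryFlat")}
normalized_addresses = {row[0] for row in conn.execute(
    "SELECT s.address FROM Delivery d JOIN Supplier s ON s.id = d.supplierId")}

print("flat:", flat_addresses)
print("normalized:", normalized_addresses)
```

After the partial update, the flat table reports two different addresses for the same supplier, while the normalized tables report one.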
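Going back to the constraints listed earlier in this section, the following is a minimal sketch of how they might be declared and tried out, assuming SQLite through Python's built-in sqlite3 module. The tables come from the examples in the text; the column types, the sample rows, and the helper function `violates` are assumptions made for this illustration, and the columns elided in the Book example are simply left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE Person (
    id          INTEGER PRIMARY KEY,
    firstName   TEXT,
    lastName    TEXT,
    dateOfBirth TEXT,
    address     TEXT
);
CREATE TABLE Pair (
    pairId    INTEGER PRIMARY KEY,
    personId1 INTEGER REFERENCES Person(id),  -- FOREIGN KEY
    personId2 INTEGER REFERENCES Person(id)   -- FOREIGN KEY
);
CREATE TABLE Supplier (
    id      INTEGER PRIMARY KEY,
    name    TEXT,
    address TEXT,
    UNIQUE (name, address)
);
CREATE TABLE Book (
    bookid INTEGER PRIMARY KEY,
    price  REAL CHECK (price > 0)  -- other Book columns omitted
);
""")


def violates(sql):
    """Hypothetical helper: True if the engine rejects the statement."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


# Sample rows (invented for the illustration).
conn.execute("INSERT INTO Person (id, firstName, lastName) VALUES (1, 'Ada', 'Lovelace')")
conn.execute("INSERT INTO Person (id, firstName, lastName) VALUES (2, 'Alan', 'Turing')")
conn.execute("INSERT INTO Supplier VALUES (1, 'Acme', '1 Main St')")
conn.commit()

results = {
    "duplicate primary key": violates("INSERT INTO Person (id) VALUES (1)"),
    "unknown person in Pair": violates("INSERT INTO Pair VALUES (1, 1, 99)"),
    "duplicate (name, address)": violates(
        "INSERT INTO Supplier VALUES (2, 'Acme', '1 Main St')"),
    "non-positive price": violates("INSERT INTO Book VALUES (1, 0)"),
    "valid pair": violates("INSERT INTO Pair VALUES (1, 1, 2)"),
}

for case, rejected in results.items():
    print(case, "->", "rejected" if rejected else "accepted")
```

The first four statements each break one of the constraints and are refused by the engine; the last one references two existing persons and is accepted. Other database engines use the same constraint keywords but may differ in details such as column types and how foreign key checking is enabled.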