Data Modeling and Implementation Surrogate Keys Pg 1

Surrogate Keys
Ray Lockwood

Points:
• A Surrogate Key is the primary key and links tables together.
• A Surrogate Key is usually a system-generated integer.
• A Surrogate Key doesn't contain a fact about the object being modeled.
• A Primary Key shouldn't change when the facts about an object change.
• A Surrogate Key increases a database’s referential integrity.

Keys Link Tables Together

Keys are the mechanism that links tables together. Because a table is a logical structure, its keys are logical, too. A key is a logical link between tables. Since a key is a logical device, we have some freedom in choosing its values.

Margin note: An example of a physical link is the memory pointer of the network database.

Natural Keys

In a database of country songs, we could choose AlbumName as a primary key:

Album (primary key: AlbumName):
AlbumName        Artist          YearReleased
Cowboy Country   Billie Nelson   1989
Cryin’ Bob       Bob Smittie     1984

AlbumName is a Natural Key because this attribute occurs “naturally” in the table.

Song (foreign key: AlbumName):
AlbumName        SongTitle                     Length
Cowboy Country   Country Shuffle               3:14
Cowboy Country   Country Blues                 3:05
Cryin’ Bob       Cryin’ Over My Pickup Truck   3:12
Cryin’ Bob       Cryin’ Over My Dog            3:08
Cryin’ Bob       Cryin’ Over My Beer           3:10

[Diagram: the Album entity linked to the Song entity]

Margin note: A Natural Key is not only a linking device, it also contains a fact about an object.
Margin note: A Natural Key is a consequence of the table’s data.
Margin note: AlbumName is a natural key because it contains data about the object.
Margin note: Song is an ID-dependent entity.

In choosing AlbumName as the primary key of Album, we're using a piece of data about
the entity as the linking construct. This is a Natural Key.

COP4710

Problem with Natural Keys

The problem with using an object's data as a key is two-fold:

1. What happens if we need to change the value of a natural key, perhaps to correct a misspelling? We'd have to change all the foreign keys, too. (Foreign keys produce a form of redundancy.)

2. What happens if the natural key turns out to be non-unique?

Margin note: Since we have to accept natural keys “as they are”, we don’t have control over them.

Under a strict reading of Relational rules, a key value should never change, so a misspelling couldn't be corrected. However, most DBMSs support key value changes through a feature called cascading updates: all foreign key values in child rows are automatically changed if the primary key value in the parent is changed, so links are undisturbed.

Margin note: Cascading Updates: when the primary key value in a parent row is changed, foreign keys in the children are changed to match.
Margin note: Cascading Deletes: when a parent row is deleted, all its children in other tables are deleted, too.

I ran into the uniqueness problem in a table of manufacturers’ part numbers. We used MfrPartNum as the primary key in this table. After it had been used for a couple of years, the database wouldn't let a clerk enter a particular new part. This part happened to have the same part number as a different manufacturer’s unrelated part already in the database.

The key we chose wasn't under our control. We relied on outside actors to ensure uniqueness of our keys, and that caused us problems.

Margin note: Relying on keys we have no control over to be unique and unchanging is risky.
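The cascading update described above can be tried out in a few lines. This is only a sketch, assuming SQLite through Python's standard sqlite3 module (the handout does not name a DBMS); the ON UPDATE CASCADE clause and the in-memory database are choices made for this illustration.

```python
import sqlite3

# In-memory database, used only for this illustration.
conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys (and cascades) only when this is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Album (
        AlbumName    TEXT PRIMARY KEY,   -- natural key
        Artist       TEXT,
        YearReleased INTEGER
    );
    CREATE TABLE Song (
        AlbumName TEXT REFERENCES Album(AlbumName) ON UPDATE CASCADE,
        SongTitle TEXT,
        Length    TEXT
    );
    INSERT INTO Album VALUES ('Cowboy Country', 'Billie Nelson', 1989);
    INSERT INTO Song VALUES ('Cowboy Country', 'Country Shuffle', '3:14');
    INSERT INTO Song VALUES ('Cowboy Country', 'Country Blues', '3:05');
""")

# Change the natural key in the parent row only.
conn.execute(
    "UPDATE Album SET AlbumName = 'Kowboy Kountry' "
    "WHERE AlbumName = 'Cowboy Country'"
)

# The DBMS has rewritten the foreign key in every child row to match.
song_keys = [row[0] for row in conn.execute("SELECT DISTINCT AlbumName FROM Song")]
print(song_keys)  # ['Kowboy Kountry']
```

The links survive, but only because the DBMS rewrote every child row. With a natural key, a one-word correction still touches two tables.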
Referential Integrity and Foreign Key Constraint

Referential Integrity means assuring that the connections between tables remain valid. One important type of rule for referential integrity is called the Foreign Key Constraint.

Margin note: Referential means linking tables together using keys.
Margin note: Referential Integrity means assuring that linkages between tables are always correct.

Foreign Key Constraint means that values in the foreign key column of a child table will always contain only values from the primary or candidate key attributes in the parent table. There must be no dangling pointers; this means there must be no row with a foreign key value that points to a non-existent primary key value in another table. In simpler terms, there will be no orphan rows in child tables.

Margin note: A Foreign Key Constraint prevents orphan rows in child tables.

In our Country Music Database, if the key of a row of the Album table is changed, it must also be changed in the corresponding rows of the Song table. Every such change is a chance for a link to break, which weakens referential integrity.
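To see a Foreign Key Constraint turn away an orphan row, here is a minimal sketch. It again assumes SQLite via Python's sqlite3 module; the helper try_insert_song and the album name "No Such Album" are hypothetical, invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
    CREATE TABLE Album (
        AlbumName    TEXT PRIMARY KEY,
        Artist       TEXT,
        YearReleased INTEGER
    );
    CREATE TABLE Song (
        AlbumName TEXT NOT NULL REFERENCES Album(AlbumName),
        SongTitle TEXT,
        Length    TEXT
    );
    INSERT INTO Album VALUES ('Cowboy Country', 'Billie Nelson', 1989);
""")

def try_insert_song(album_name, title, length):
    """Return True if the row was accepted, False if the constraint rejected it."""
    try:
        conn.execute("INSERT INTO Song VALUES (?, ?, ?)", (album_name, title, length))
        return True
    except sqlite3.IntegrityError:
        # Raised when the foreign key value has no matching parent row.
        return False

child_accepted = try_insert_song("Cowboy Country", "Country Shuffle", "3:14")
orphan_accepted = try_insert_song("No Such Album", "Lost Song", "2:59")
print(child_accepted, orphan_accepted)  # True False
```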


Surrogate Keys

To avoid referential integrity problems, it’s better for a key to be strictly a linking mechanism; it’s better if a key doesn't contain data about the object modeled by its table. In other words, keys shouldn't contain a fact about a real-world object.

Margin note: Keys can be pure linking devices, and not contain a fact about an object.

A key that doesn't contain real-world data is called a Surrogate Key (also called an Artificial Key as opposed to a Natural Key). A surrogate key is an alias for an object.

Margin note: A surrogate is a substitute for something else.

If we can’t put real-world data into a key, what is the alternative? We use a system-generated integer (usually an auto-increment integer) to set a value for our surrogate key.

Margin note: Surrogate keys are system-generated.
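As a small illustration of a system-generated key, the sketch below lets the DBMS assign the integers. It assumes SQLite via Python's sqlite3 module, where a column declared INTEGER PRIMARY KEY AUTOINCREMENT is filled in automatically; other DBMSs spell this feature differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Album (
        AlbumID      INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        Name         TEXT,
        Artist       TEXT,
        YearReleased INTEGER
    )
""")

albums = [
    ("Cowboy Country", "Billie Nelson", 1989),
    ("Cryin' Bob", "Bob Smittie", 1984),
]

generated_ids = []
for name, artist, year in albums:
    # We never supply AlbumID; the DBMS picks the next integer.
    cur = conn.execute(
        "INSERT INTO Album (Name, Artist, YearReleased) VALUES (?, ?, ?)",
        (name, artist, year),
    )
    generated_ids.append(cur.lastrowid)

print(generated_ids)  # [1, 2]
```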

A Surrogate Key Becomes a “Fact” Outside Its Native Database

Surrogate keys abound: your Social Security number, driver’s license number, phone number, automobile serial number, and Winn-Dixie reward number. These are system-generated identifiers that happen to be visible to users of the system.

They're surrogate keys only within the database in which they were generated. Outside that database, they're a fact about an entity. Within the Social Security Administration database, your SSN is a surrogate key under the control of the database owner. In the rest of the world your SSN is a fact about you.

Margin note: A surrogate key is a surrogate only within the database which created it.
Margin note: If a surrogate key is visible outside the database, it becomes a fact about an object.

Most surrogate keys are generated for the DBMS to use internally and aren't visible outside the database. In our Country Music database, instead of using AlbumName, we can add an auto-increment integer column as a surrogate primary key. Tables linked to Album will use this surrogate key for linking, but this value has no meaning outside the database.

Margin note: Surrogate keys are almost always unseen by the outside world.
Surrogate Keys Improve Referential Integrity

A key is a linking construct; a surrogate key is a pure linking construct. The advantage of a surrogate key is that it's under the control of the database designer and not the outside world. A Social Security number is a surrogate key assigned by the Social Security Administration for their own use.

Margin note: A surrogate key is under the control of the database owner.

A person might change his name, address, occupation, and correct the date of his birth; but his SSN will never change unless the people who own the SSN database decide to change it. As a result, this surrogate key is a more reliable way for the Social Security folks to link their tables together than if they chose a natural key.

Surrogate Keys Improve Efficiency

The data type for surrogate keys is chosen for efficiency, and integers are typically among the most efficient types to compare and search. This fact, plus the fact they can be auto-incremented, means that surrogate keys are generally integers.

Margin note: A surrogate key is almost always an auto-incremented integer.


Example

Here are our Country Music database tables again:

Album:
AlbumName        Artist          YearReleased
Cowboy Country   Billie Nelson   1989
Cryin’ Bob       Bob Smittie     1984

Song:
AlbumName        SongTitle                     Length
Cowboy Country   Country Shuffle               3:14
Cowboy Country   Country Blues                 3:05
Cryin’ Bob       Cryin’ Over My Pickup Truck   3:12
Cryin’ Bob       Cryin’ Over My Dog            3:08
Cryin’ Bob       Cryin’ Over My Beer           3:10

To change the name of the album “Cowboy Country” to “Kowboy Kountry”, we'd have to change not only the AlbumName value in the Album table but also all the corresponding foreign key values in the Song table.

Here are the same two tables using the surrogate key “AlbumID” as the primary key of Album. The value of AlbumID is a system-generated auto-increment integer:

Album (AlbumID is the surrogate primary key; Name is just another attribute):
AlbumID   Name             Artist          YearReleased
001       Cowboy Country   Billie Nelson   1989
002       Cryin’ Bob       Bob Smittie     1984

Song:
AlbumID   SongTitle                     Length
001       Country Shuffle               3:14
001       Country Blues                 3:05
002       Cryin’ Over My Pickup Truck   3:12
002       Cryin’ Over My Dog            3:08
002       Cryin’ Over My Beer           3:10

Margin note: Now, the “Name” attribute is just another fact about an object, and not a linking device.

Now we can change the album name in only one place, the Album table, without disturbing the linkage between tables. The surrogate key is a pure linking construct. It carries no fact about an entity; it has the single purpose of linking rows of one table to those of another. This is more robust and efficient than using a natural primary key.

Margin note: A surrogate key is a pure linking construct; that’s all it does.
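The one-place rename can be checked with a short sketch. It assumes SQLite via Python's sqlite3 module and loads just one album from the tables above; the variable names are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE Album (
        AlbumID      INTEGER PRIMARY KEY,   -- surrogate key
        Name         TEXT,                  -- just another attribute
        Artist       TEXT,
        YearReleased INTEGER
    );
    CREATE TABLE Song (
        AlbumID   INTEGER NOT NULL REFERENCES Album(AlbumID),
        SongTitle TEXT,
        Length    TEXT
    );
    INSERT INTO Album VALUES (1, 'Cowboy Country', 'Billie Nelson', 1989);
    INSERT INTO Song VALUES (1, 'Country Shuffle', '3:14');
    INSERT INTO Song VALUES (1, 'Country Blues', '3:05');
""")

query = "SELECT AlbumID, SongTitle FROM Song ORDER BY SongTitle"
song_rows_before = conn.execute(query).fetchall()

# Rename the album: exactly one row, in one table, is touched.
cur = conn.execute("UPDATE Album SET Name = 'Kowboy Kountry' WHERE AlbumID = 1")
rows_changed = cur.rowcount

song_rows_after = conn.execute(query).fetchall()
print(rows_changed)                         # 1
print(song_rows_before == song_rows_after)  # True: the Song table is untouched
```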


Disadvantage

In our original Country Music Database, we could look at the Song table alone and read album names that went with song titles. With the surrogate primary key, we can't do this. We have to perform a join to see the album names that go with song titles.

Weigh the advantage against this disadvantage before deciding to use a surrogate key.

Margin note: A disadvantage of surrogate keys is that more joins may be necessary to get information from child tables.
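The join mentioned above might look like the following. This is a sketch only, assuming SQLite via Python's sqlite3 module and a subset of the handout's sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Album (AlbumID INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Song (
        AlbumID   INTEGER REFERENCES Album(AlbumID),
        SongTitle TEXT
    );
""")
conn.executemany(
    "INSERT INTO Album VALUES (?, ?)",
    [(1, "Cowboy Country"), (2, "Cryin' Bob")],
)
conn.executemany(
    "INSERT INTO Song VALUES (?, ?)",
    [(1, "Country Shuffle"), (1, "Country Blues"), (2, "Cryin' Over My Dog")],
)

# The Song table alone shows only AlbumID values (1, 2), not album names.
# To read names next to titles we have to join back to Album.
rows = conn.execute("""
    SELECT Album.Name, Song.SongTitle
    FROM Song
    JOIN Album ON Album.AlbumID = Song.AlbumID
    ORDER BY Song.SongTitle
""").fetchall()

for album_name, song_title in rows:
    print(album_name, "|", song_title)
```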

A Note about Normalization


Data that's duplicated in foreign keys is a type of data redundancy: repeated data that's hard to maintain if the data changes. Because keys are never meant to change, this is not a pathological redundancy addressed by normalization. It's a reflection that the rows in the child table belong to the rows in the parent table.

Margin note: Data repeated in foreign keys reflects the relationship between tables.

Disadvantages of Surrogate Keys

Disadvantage 1
Surrogate key values have no meaning to the user. They don’t have much meaning to the database designer either. They make tables hard to read during casual inspection. Surrogate keys make your data more cryptic, and a step away from the Relational goal of making the database closer to information than to data.

Margin note: Surrogate keys make data look more cryptic on casual inspection.

As a result, you have to join tables to look up the facts the surrogate keys represent. This is a performance hit if an application has to join tables where it wouldn’t have if we used natural keys.

Margin note: Tables linked by surrogate keys may require more joins.

Disadvantage 2
When data is shared among different databases, the uniqueness of surrogate keys is not assured. If you have two independent databases and wish them to read each other’s data, it’s likely that surrogate key values of one database will be duplicated in the other. This can cause meaningless connections between the same kinds of data in the different databases.

Margin note: It is potentially more difficult for different databases to share data.

This was the problem in the parts inventory database earlier. Two unrelated parts from different manufacturers happened to have the same part number. These part numbers, though natural keys in our database, are really surrogate keys in the manufacturers’ databases. When combined, a collision occurred.

If you design databases that might one day be combined, you can plan ahead and assign different numeric ranges for the surrogate keys.

Margin note: Databases intended to share data should draw surrogate keys from different pools.

Some database designers use GUID (Globally Unique ID) or UUID (Universally Unique ID) values generated by the operating system as surrogate keys.
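The collision problem, and the UUID way around it, can be sketched as below. The assumptions here are mine: SQLite via Python's sqlite3 module, random UUIDs from Python's standard uuid module (uuid4), and a hypothetical Part table with made-up part names.

```python
import sqlite3
import uuid

def make_parts_db(part_names, use_uuid):
    """Build an independent in-memory database holding a hypothetical Part table."""
    conn = sqlite3.connect(":memory:")
    if use_uuid:
        conn.execute("CREATE TABLE Part (PartID TEXT PRIMARY KEY, Name TEXT)")
        for name in part_names:
            # Random UUID stored as text; generated by the application, not the DBMS.
            conn.execute("INSERT INTO Part VALUES (?, ?)", (str(uuid.uuid4()), name))
    else:
        conn.execute("CREATE TABLE Part (PartID INTEGER PRIMARY KEY, Name TEXT)")
        for name in part_names:
            # The DBMS assigns 1, 2, 3, ... independently in every database.
            conn.execute("INSERT INTO Part (Name) VALUES (?)", (name,))
    return conn

def shared_keys(db_a, db_b):
    """Return the key values that appear in both databases."""
    keys_a = {row[0] for row in db_a.execute("SELECT PartID FROM Part")}
    keys_b = {row[0] for row in db_b.execute("SELECT PartID FROM Part")}
    return keys_a & keys_b

# Two independent databases with auto-generated integer keys: the values overlap.
int_collisions = shared_keys(
    make_parts_db(["bolt", "nut"], use_uuid=False),
    make_parts_db(["gasket", "valve"], use_uuid=False),
)

# The same two databases with UUID keys: an overlap is astronomically unlikely.
uuid_collisions = shared_keys(
    make_parts_db(["bolt", "nut"], use_uuid=True),
    make_parts_db(["gasket", "valve"], use_uuid=True),
)

print(sorted(int_collisions))  # [1, 2]
print(len(uuid_collisions))    # 0
```

The trade-off is that UUIDs are longer and even more cryptic than integers. Reserving separate numeric ranges keeps the keys as integers but needs planning in advance.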


Surrogate Keys Are Widely Used


The advantages of using surrogate keys usually outweigh their disadvantages. Most database designers use surrogate primary keys for many, and sometimes all, of their tables.

Margin note: Using surrogate keys is a good practice.

If a good natural key exists, use it. If not, use a surrogate key.

Summary

Surrogate keys:
• Don't introduce normalization problems in a table.
• Improve referential integrity because they're pure linking constructs.
• Improve join performance.
But:
• Make tables more cryptic when casually viewing them.
• May require more joins for certain queries.
• Can cause problems when connecting different databases together.
