You are on page 1of 2

Surrogate keys

Surrogate keys join the dimension tables to the fact table.

Surrogate keys serve as an important means of identifying
each instance or entity inside of a dimension table.
Reasons for using surrogate keys
Data tables in various OLTP source systems may use
different keys for the same entity
It may also be possible that a single key is being used by
different instances of the same entity. This means that
different customers might be represented using the same key
across different OLTP systems.
This can be a major problem when trying to consolidate
information from various source systems. Or for companies
trying to create/modify data warehouses after mergers and
acquisitions. Existing systems that provide historical data
might have used a different numbering system than a current
OLTP system. Moreover, systems developed independently may
not use the same keys, or they may use keys that conflict with
data in the systems of other divisions. This situation may not
cause problems when each department independently reports
summary data, but can when trying to achieve an enterprisewide view of the data.
This means that we cannot rely on using the natural primary
keys of the source system as dimension primary keys because
there is no guarantee that the natural primary keys will be
unique for each instance. A surrogate key uniquely identifies
each entity in the dimension table, regardless of its natural
source key. This is primarily because a surrogate key
generates a simple integer value for every new entity.
Surrogate keys provide the means to maintain data
warehouse information when dimensions change
Surrogate keys are necessary to handle changes in dimension
table attributes.
Natural OLTP system keys may change or be reused in
the source data systems
Some systems have reuse keys belonging to obsolete data or
for data that has been purged. However, the key may still be in
use in historical data in the data warehouse, and the same key
cannot be used to identify different entities.
The design, implementation, and administration of surrogate
keys is the responsibility of the data warehouse team.
Surrogate keys are maintained in the data preparation area
during the data transformation process.

Improve performance of queries

The narrow integer surrogate keys mean a thinner fact table.
The thinner the fact table, the better the performance.
Handle exception cases
If requirements need to be determined or do not apply, use a
surrogate key.
Changes or realignment of measures should be carried
in a separate column in the table
If data must be reviewed or summarized, regardless of the
number of times it appears in a table, you should use that
column as part of the surrogate key.
Avoid globally unique identifiers as surrogate keys
Globally unique identifiers are known to work well in the
source OLTP systems, but they are difficult to use when it
comes to data warehouses. This is primarily because of two
Globally unique identifiers use a significant amount of space
compared to their integer counterparts. Globally unique
identifiers take about 16 bytes each, where an integer takes
about 4 bytes.
Indexes on globally unique identifiers columns are relatively
slower than indexes on integer keys because globally unique
identifiers are four times larger.

What is surrogate key? Explain it with an example.

AnswerData warehouses commonly use a surrogate key to uniquely identify
an entity. A surrogate is not generated by the user but by the system. A
primary difference between a primary key and surrogate key in few databases
is that PK uniquely identifies a record while a SK uniquely identifies an entity.
E.g. an employee may be recruited before the year 2000 while another
employee with the same name may be recruited after the year 2000. Here,
the primary key will uniquely identify the record while the surrogate key will be
generated by the system (say a serial number) since the SK is NOT derived
from the data.