
9/10/12 Articles

The Cost of GUIDs as Primary Keys


By Jimmy Nilsson

Date: Mar 8, 2002

Article is provided courtesy of Sams.


In this article, Jimmy Nilsson presents the pros and cons of using globally unique identifiers (GUIDs) as
the datatype for primary keys in SQL Server 2000. In doing so, he shows you test results that hint at the
performance characteristics and introduces you to a special type of GUID that he invented, called
COMBs, which solves what might otherwise be a big throughput problem.

In this article, I will be discussing the pros and cons of using globally unique identifiers (GUIDs) as the
datatype for primary keys in SQL Server 2000. You will see concrete test results that give a hint of the
performance characteristics. Along the way, I will also discuss a special type of GUID that I "invented,"
called COMBs. Before we start discussing GUIDs, though, I'd like to say a few words about natural and
surrogate keys.

Natural or Surrogate Keys


When you do the physical design of a relational database, it's very important to decide upon which style
to use for the primary keys. Some people prefer to use natural keys—that is, keys that are found in the
domain that the database models. Others prefer to use surrogate keys, which are constructed keys with
no other purpose than to be just keys (and which are not found in the domain). An example of a natural
key is a Social Security number. A value incrementing by 1 for each row is a typical example of a
surrogate key.

Using natural keys is the traditional approach, in line with Codd's original relational model. When you
use them, you have only natural data that means something to users. This is good if users will run ad
hoc queries directly against the database in raw SQL. You can also often reduce the number of joins when
using natural keys because you don't have to go to a lookup table to convert an ID to a description.
One more advantage is that you get the minimum number of constraints because you don't have to
protect the uniqueness of the natural keys separately. You already did this when you used them as
primary keys.

Surrogate keys can be seen as a newer approach. This approach does not conflict with the relational
model, but, in a way, it is a step closer to a more object‑based approach in which each object has an ID
and the structure of all IDs is of the same type. When you use surrogate keys, you often get smaller
foreign keys, which reduces the size of the database. There is no risk of users changing the values of
the primary keys, and the programming can be more consistent because all keys are of the same
format.

NOTE

With cascading updates/deletes in SQL Server 2000, the problem of users changing the values of
primary keys is not so great anymore because you don't have to program the UPDATE of dependent rows
manually.
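As a quick sketch (the constraint name is assumed, and the tables are the order tables from the example later in this article), a cascading foreign key could be declared like this:

```sql
-- Sketch (constraint name assumed): with ON UPDATE CASCADE, a change
-- to ordermaster.id is propagated to orderdetail.ordermaster_id
-- automatically, so no manual UPDATE of dependent rows is needed.
ALTER TABLE orderdetail
    ADD CONSTRAINT fk_orderdetail_ordermaster
    FOREIGN KEY (ordermaster_id)
    REFERENCES ordermaster (id)
    ON UPDATE CASCADE
```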

That was a brief description of the different kinds of keys. Now let's assume that we choose to use
surrogate keys when we design a new database.

A New Database
We are about to build a new database for order processing. A greatly simplified version of the database
will look as shown in Figure 1.

Figure 1 Database schema for a simplified database for order processing.


www.informit.com/articles/printerfriendly.aspx?p=25862 1/11

As you can see in Figure 1, ordermaster.id and orderdetail.id are primary keys. There are also several
foreign keys: namely, ordermaster.customer_id, orderdetail.ordermaster_id, and orderdetail.product_id.

NOTE

In this simplified model, I skipped adding the product and the customer tables, so
ordermaster.customer_id and orderdetail.product_id aren't implemented as physical foreign keys.

We should take into consideration all of these primary and foreign keys when deciding on the datatype
to use.

Which Datatype Should Be Used for Surrogate Keys?


I'm going to focus on two different datatypes for surrogate keys, INTEGER and GUID. Let's start with
INTEGER.

NOTE

In the specific example shown in Figure 1, the users need a column in the ordermaster table to identify
each order uniquely. The most correct solution is to add an orderno column, which is given a value
separately from the ID column. In the case of INTEGER+IDENTITY for the primary key, it would be fairly
common to skip the orderno column, but I prefer not to let the user know the real key value.

To avoid affecting the test too much, I decided to skip the user-known column here. Otherwise, I would
have had to maintain the orderno value manually when the IDENTITY had already been used for the ID.
When GUID is used for the ID, I could have let IDENTITY handle the orderno column. I did not do that,
though, to avoid comparing too many apples and oranges (or apples and pears, as we say in Sweden).

INTEGER as the Datatype

Even though we have decided to use surrogate keys, we are not yet finished—we still have to choose a
specific datatype. The most typical one to use is INTEGER. An INTEGER is 4 bytes. You can declare INTEGER
columns as IDENTITY and thereby let the database provide new row values automatically. Listing 1
illustrates an example of a new row being INSERTed into the ordermaster table and the new ID being
fetched for later use when INSERTing into the orderdetail table.

Listing 1: Stored Procedure for Inserting a Row into ordermaster, INT Version

CREATE PROCEDURE a_OrderINT_InsertMaster
    (@customerId INT, @orderId INT OUTPUT)
AS
DECLARE @anError INT

INSERT INTO orderINTmaster
    (customer_id, orderdate, status)
VALUES
    (@customerId, GETDATE(), 1)

SET @anError = @@ERROR
SET @orderId = @@IDENTITY

RETURN @anError

Before we continue, I'd like to say a couple of things about the code in Listing 1 and the other code in
this article:

The error handling is not of production quality, but I decided to do it this way to avoid
distracting you from the topic of the article and to get as clear test results as possible. You will
find more information about appropriate error handling in my book, .NET Enterprise Design with
Visual Basic .NET and SQL Server 2000.

I am very fond of using user‑defined data types, but I skipped them in the code snippet here, to
make the code as obvious and clear as possible.


In real code, you obviously must use a lot of comments! For example, I always start all stored
procedures with a comment block, in which I state the purpose, note who wrote it and when, and
document changes.

You can download all the code for this article at http://www.jnsk.se/informit/pkguid/.

NOTE

You can let the application server decide what the next value should be, but when INTEGER is used
as the datatype for the primary keys, you would have to be careful not to get too much chatter
between the application server and the database server. Other risks are creating a hot spot,
which reduces concurrency, and creating a solution that makes it impossible to clone the
application server.

There is a rather long and interesting discussion to be had about different solutions for
"manually" creating surrogate INTEGERs, but that is beyond the scope of this article.

UNIQUEIDENTIFIER as the Datatype

A GUID is called UNIQUEIDENTIFIER in SQL Server. It has been supported as a native datatype in SQL Server
since version 7. A GUID is 16 bytes long (four times the size of an INTEGER).

In Listing 2, you can see an example of a new row being INSERTed into the ordermaster table when
GUIDs are used for the keys. In this case, the ID is generated before the INSERT statement, with the help
of NEWID().

NOTE

For reasons of simplicity, there are two sets of tables and stored procedures: one set with "INT" in the
name and one set with "GUID" in the name.

Listing 2: Stored Procedure for Inserting a Row into ordermaster, GUID Version

CREATE PROCEDURE a_OrderGUID_InsertMaster
    (@customerId UNIQUEIDENTIFIER
    , @orderId UNIQUEIDENTIFIER OUTPUT)
AS
DECLARE @anError INT

SET @orderId = NEWID()

INSERT INTO orderGUIDmaster
    (id, customer_id
    , orderdate, status)
VALUES
    (@orderId, @customerId
    , GETDATE(), 1)

SET @anError = @@ERROR

RETURN @anError

Pros and Cons of GUIDs as Primary Keys


Let's say that we plan to use GUIDs for the primary keys. What are the advantages and disadvantages
when compared to INTEGERs? We should always start with the positive, so here goes:

We get a more object ID–like datatype because each primary key typically will be unique for the
complete database. This gives good effects, for example, when designing a custom solution for
pessimistic locking (but that's another long story).

@@IDENTITY has been a source of problems when INSERT triggers have INSERTed rows into other
tables with IDENTITYs. (You can use SCOPE_IDENTITY() in SQL Server 2000 to avoid this problem.)
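As a sketch of the difference (a hypothetical trigger scenario; the table follows the INT example from Listing 1, and the customer_id value is a placeholder):

```sql
-- Sketch: if an INSERT trigger on orderINTmaster INSERTs into an audit
-- table that also has an IDENTITY column, @@IDENTITY returns the audit
-- table's value, while SCOPE_IDENTITY() still returns the order ID.
DECLARE @orderId INT

INSERT INTO orderINTmaster
    (customer_id, orderdate, status)
VALUES
    (1, GETDATE(), 1)

SET @orderId = SCOPE_IDENTITY()  -- safe even with such a trigger
```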

You can move the creation processing of the values from the data tier to the business or
consumer tiers. For example, the consumer can create complete structures of orders and details
that can be sent all the way through the business tier to the data tier in one go, which can reduce
the number of round trips. A more basic way of saying this is that you can get the ID value
before you use it, not just afterward, as is the case with IDENTITYs.

NOTE

In the case of .NET and ADO.NET, it's a big advantage to know the ID value at the time rows are
added to the DataSet. Otherwise, for example, autogenerated INSERTs can't be used for master‑
detail relationships within the DataSet.

The span of values for GUIDs is enough for most scenarios. INTEGERs are not big enough for some
scenarios. (By the way, with INTEGERs, you don't have to start the seed at 1. You can just as easily
start it at the smallest negative INTEGER and thereby double the span of values.)
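That trick can be sketched like this (a hypothetical table, not part of the test schema):

```sql
-- Sketch (hypothetical table): starting the seed at the smallest
-- negative INTEGER doubles the usable span of a 4-byte key.
CREATE TABLE ordermaster2
    (id INT IDENTITY(-2147483648, 1) PRIMARY KEY
    , customer_id INT NOT NULL
    , orderdate DATETIME NOT NULL
    , status INT NOT NULL)
```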

You can prepare for merge replication. When merge replication is to be used, each table to be
replicated must have a GUID column (called ROWGUID in SQL Server). You also don't have to give
each server in a merge replication scenario a unique set of values to use for the keys. The risk of
duplicate GUIDs is so small that this shouldn't be a problem.

IDENTITY can't be used for tables in distributed partitioned views.

A GUID will not mean anything to the users. An INTEGER can be read, understood, and
remembered. It's common to see INTEGER keys in the UI, and that might cause problems later.

You know what a nightmare it can be when doing a manual merge between two tables with
INTEGERs as primary keys. Not only do you have to create a new sequence for the union of both
tables, but you also must change all the foreign keys for all the dependent tables. If GUIDs are
used, this sort of problem does not arise.

NOTE

The algorithm for generating GUIDs doesn't use the MAC address of the network cards in recent
Windows versions anymore. Instead, it just creates a random value. In theory, this presents a risk
of getting duplicate GUIDs, but, in practice, it shouldn't be a problem.

The reason for excluding the use of the MAC address is that it not only couples users to GUIDs,
but some network cards don't use unique MAC addresses.

I will discuss this further when we reach the section about the COMBs. Stay tuned.

I'm not a salesman, so here are a few drawbacks:

There is a huge overhead in size. As I said, a GUID is four times larger than an INT. This is very
important when it comes to indexes.

Portability can be a problem because not every database has GUID as a native datatype.

You can't use GUIDs with aggregates such as MIN. This is unfortunate when it comes to using
WHILE plus an aggregate function, such as MIN or MAX, for looping instead of a CURSOR. (Well, a
workaround is to use a CONVERT or a CAST to CHAR(36) or BINARY(16) of the GUID before it's used in the
aggregate, but watch out for performance problems with large tables.)
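The CAST workaround can be sketched like this (using the GUID tables from the test schema; note that the repeated CAST in the WHERE clause forces a scan on large tables, which is the performance caveat just mentioned):

```sql
-- Sketch of the CAST workaround: looping over GUID keys with
-- WHILE + MIN by comparing the CHAR(36) representation instead
-- of the GUID itself.
DECLARE @currentId CHAR(36)

SET @currentId =
    (SELECT MIN(CAST(id AS CHAR(36)))
    FROM orderGUIDmaster)

WHILE @currentId IS NOT NULL
BEGIN
    -- ...process the row whose id = CAST(@currentId AS UNIQUEIDENTIFIER)...
    SET @currentId =
        (SELECT MIN(CAST(id AS CHAR(36)))
        FROM orderGUIDmaster
        WHERE CAST(id AS CHAR(36)) > @currentId)
END
```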

It's detrimental to developer‑friendliness because it's really hard to edit GUIDs manually, to
remember them when you browse the database, and so on. You have to get used to navigating
the database by using SQL scripts, but this is not usually that much of a problem.

So, the overhead related to size is the main problem. Most developers seem to have an idea of how big
the overhead is, but few have tested it. It's time that this was changed, so let's take a look at a few
performance tests and get an idea of how large the overhead is.

The Tests
Now I'll discuss the test cases and their results, as well as the test environments and other factors.


The Test Cases


The test cases are as follows:

A. Inserting an order with 10 detail rows (repeated 500 times, measured when having 1,000,000
orders)

B. Fetching 100 specific orders out of 1,000,000, one by one

C. The same as B, but this time with a JOIN with the orderdetail table

The Metrics
The metric is the time that the test case takes to execute.

NOTE

You can easily compare the size overhead of the database, if you want to. Just run some INSERTs for the
INT case in one new database and run the same number of INSERTs for the GUID case in another new
database. Then compare the file sizes afterward.
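Alternatively, a quick per-table comparison can be had with the built-in sp_spaceused procedure (using the two table names from the test schema):

```sql
-- Sketch: instead of comparing file sizes, sp_spaceused reports data
-- and index sizes per table after loading the same number of rows.
EXEC sp_spaceused 'orderINTmaster'
EXEC sp_spaceused 'orderGUIDmaster'
```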

Because the test database is so simplified, the size metric is a bit inaccurate, because the ratio of
foreign keys to ordinary columns is pretty big. Keep this in mind if you evaluate the size
metric. This also applies to the time difference, but there are definitely situations in which the database
consists of almost nothing but foreign keys.

All tests will be executed by a single user. The tests will be executed directly at the database server—
I'm testing only at the data tier in this article. For example, the database will create the GUIDs with
NEWID(). You could quite easily expand the test to run from several users or from an application server
instead. A very relevant test would be to let the application server create the GUIDs instead, but that is
beyond the scope of this article.

The Results
Now, I'm sure you are curious about the results. I promise to show them, but first I'd like to say a few
things about test results in general and the environment that I used.

Disclaimer

It's extremely important to be skeptical when reading test results. To come to the right conclusion, you
have to translate the tests to your environment and to your situation, and then re‑execute the tests.

Environment
The equipment that I used for the tests is not real production equipment; I used only a development
machine (1.7GHz, 512MB RAM). The test machine was a Windows XP machine with SP2 for SQL Server
2000.

In a new production environment, you will typically have RAID 1 (mirroring) for the log and RAID 10
(mirroring and striping) with at least four disks for the data. You will also have more CPU cache than
256KB, probably more than one CPU, and also at least 1GB of memory.

With that out of the way, let's take a look at the test results.

Test A: INSERT of Orders

In Table 1, you can find the results of the first test, in which I INSERT orders and details for those orders.
The time values are recalculated so that the first result has a value of 1 (a baseline value) and all other
time values are expressed as multiples of that baseline.

As you can see in Table 1, the overhead for the GUID solution is actually huge. What is especially
worrisome is the sign of bad scalability that I saw when I compared results from executions with
different numbers of orders.

Table 1 Result of Test A: INSERT of 500 Orders

Test A Time (with 1,000,000 Orders)

INTEGER 1

GUID 10

This big difference occurred with "only" 1,000,000 orders (and 10,000,000 detail rows). The difference
was very small with a small number of orders, so this is probably a really bad problem with
100,000,000 orders!

Something is wrong—very wrong!

Test A: INSERT of Orders, Revised


I had done quick tests comparing GUIDs and IDENTITY+INT before, without seeing the huge overhead that
appeared when I scaled up the test a bit. (I also did the tests on NT4, and the problem is probably smaller
there, as you soon will see.) This certainly worried me a lot! I tried different tricks, different indexing
strategies, different fill factors, a clustered IDENTITY+INT, and so on—everything without luck. My guess
was that the reason for the problem was the lack of order in the GUIDs generated on Windows 2000
(and beyond). In Listing 3, you can see a series of GUIDs generated with a couple of calls to NEWID() on
SQL Server 7 on NT4. Compare that series with the one in Listing 4, which is generated with a couple of
calls to NEWID() on SQL Server 2000 on Windows 2000.

Listing 3: Some GUIDs Generated with SQL Server 7 on NT4

B3BFC6B1-05A2-11D6-9FBA-00C04FF317DF
B3BFC6B2-05A2-11D6-9FBA-00C04FF317DF
B3BFC6B3-05A2-11D6-9FBA-00C04FF317DF
B3BFC6B4-05A2-11D6-9FBA-00C04FF317DF
B3BFC6B5-05A2-11D6-9FBA-00C04FF317DF

Listing 4: Some GUIDs Generated with SQL Server 2000 on Windows 2000

C87FC84A-EE47-47EE-842C-29E969AC5131
2A734AE4-E0EF-4D77-9F84-51A8365AC5A0
70E2E8DE-500E-4630-B3CB-166131D35C21
15ED815C-921C-4011-8667-7158982951EA
56B3BF2D-BDB0-4AFE-A26B-C8F59C4B5103

As you saw in Listing 3, only the eighth half byte changes between calls. On the other hand, in
Listing 4, only the 13th half byte is constant between calls. In Windows 2000, the MAC address is no
longer used when GUIDs are generated. Instead, the GUID is only a 16-byte random number.
Well, that isn't totally true. The 13th half byte is constant, so only 15.5 bytes are random. Adam Nathan
at Microsoft explained to me that the 13th half byte is the value that points out the source of the
GUID, and 4 means Microsoft. Thanks, Adam!

My guess about the INSERT overhead for GUIDs was that the lack of order in
the Windows 2000–generated GUIDs was giving SQL Server a hard time administering indexes under
massive INSERT periods. Therefore, I tried to create an order for the GUIDs instead. I tried to CAST the
current DATETIME to a BINARY(8) and put that first in the GUID. Unfortunately, that had no effect. When I
investigated it further, I found that when I had a BINARY(16) value and CASTed it to a UNIQUEIDENTIFIER,
some bytes were scrambled. What to do? You guessed it. I tried to compensate for the scrambling to
see if that had any positive effect on throughput, but none occurred. Then I found out that it wasn't
the first (high) bytes that were important for the new ordering, but the last (low) bytes. I also learned that
I didn't have to use BINARY(8) for the current DATETIME. BINARY(6) is enough for the next 77 years, so I
decided to occupy only the last (low) 6 bytes with the current DATETIME. In Listing 5, you can see the
result of PRINT CAST(GETDATE() AS BINARY(8)). The first (high) bytes are 0.

Listing 5: GETDATE() as BINARY(8)

0x0000919001862D20

The algorithm that I used to generate my new GUIDs is shown in Listing 6; GUIDs of that type
(generated in SQL Server 2000 on Windows 2000) are shown in Listing 7. I call this type of GUID a
COMB (short for COMBined).

Listing 6: An Algorithm to Generate Ordered GUIDs (COMBs)

DECLARE @aGuid UNIQUEIDENTIFIER

SET @aGuid = CAST(CAST(NEWID() AS BINARY(10))
    + CAST(GETDATE() AS BINARY(6)) AS UNIQUEIDENTIFIER)

Listing 7: A Set of GUIDs (COMBs) Generated with the New Algorithm

E25AFE33-DB2D-4502-9BF0-919001862D20
83E689D3-8549-4094-B223-919001862D20
CC22A56D-0CD5-43C5-990E-919001862D20
D5149998-1718-468C-B1AD-919001862D20
CBD0182D-4A0E-40AC-9A4C-919001862D20

As you saw in Listing 7, those GUIDs (COMBs) have the same last 6 bytes. Those represent the DATETIME
of when those five GUIDs were generated. A DATETIME has a "precision" of 1/300th of a second, and it's
no problem for SQL Server to generate five GUIDs in 3ms with this algorithm, even though it is slower
than using only NEWID(). As you can see, the ordering of the GUIDs isn't perfect because several GUIDs
will be generated with the same DATETIME—but that doesn't really matter here.
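A handy side effect is that the creation time can be recovered from a COMB. This sketch assumes, as the listings above suggest, that the byte scrambling only affects the high bytes, so the low 6 bytes survive a round trip through BINARY(16):

```sql
-- Sketch: recover the creation DATETIME from a COMB by taking the
-- last (low) 6 bytes and padding the 2 high bytes of the day count
-- with zeros (enough for the next 77 years, as noted above).
DECLARE @aComb UNIQUEIDENTIFIER, @created DATETIME

SET @aComb = CAST(CAST(NEWID() AS BINARY(10))
    + CAST(GETDATE() AS BINARY(6)) AS UNIQUEIDENTIFIER)

SET @created = CAST(0x0000
    + SUBSTRING(CAST(@aComb AS BINARY(16)), 11, 6) AS DATETIME)

PRINT @created  -- the DATETIME embedded when the COMB was built
```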

Finally, I found something that helped with my INSERToverhead. When I used this new solution, I got the
throughput table shown in Table 2.

Table 2 Result of Test A: INSERT of 500 Orders, Revised

Test A Time (with 1,000,000 Orders)

INTEGER 1

GUID (COMB) 1.05


This was great news, and the overhead of the GUIDs was actually less than expected. Perhaps you
wonder whether my algorithm for generating GUIDs is itself slow. Often when you do
conversions, you see CPU cycles fly away. On the other hand, compared to writing to a database, those
conversions are pretty cheap.

You also might think that this algorithm must be risky because there is no guarantee of unique GUIDs.
That's correct, but the built-in algorithm makes no guarantee, either. It's just a random number. (Not
even the NT4 algorithm was guaranteed, as I said before.) Okay, a 15.5-byte random number has a very
low probability of being duplicated. (Most of us think that the chance of winning millions in the lottery is
almost zero. Still, those odds fit in a number less than 4 bytes long!)

With my algorithm, the random number's size decreases from 15.5 bytes to 9.5 bytes because the last
6 bytes are occupied by the current DATETIME. Of course, the risk of generating duplicate random
numbers with my algorithm is larger than with NEWID(), but it's not a problem if the duplicates don't occur
within the same 1/300 of a second. In my opinion, the risk of duplicates is small enough to be
acceptable. And when you have UNIQUE CONSTRAINTs, it's okay even if you happen to see two duplicate
GUIDs being generated once a decade or so.

NOTE

To judge the likelihood of generating two duplicate COMBs, I ran a quick test. I used 6 static bytes for
the low bytes and then created a million COMBs without duplicates. It won't be possible for you to
INSERT close to that many rows in 1/300 of a second.

I also discussed the risk with a friend who is a mathematician. He pointed out that we don't know the
algorithm being used for the random number. Is it "perfect"? He also pointed out that it's very
dangerous to make assumptions and conclusions about likelihood and random numbers. Even so, he
said that there seemed to be a very low risk of creating duplicate COMBs.

To add some more fuel to the fire, I'd like to point out that the differences in real-life apps are
probably bigger than the one I showed in Table 1 when ordinary GUIDs are used. This is because I
skipped test results while CHECKPOINTing was going on—and CHECKPOINTing affected the GUID test much
more. When I used my custom (COMB) algorithm, CHECKPOINTing was much less frequent and ran for
short intervals. That is obvious when I compare the time for INSERTing 500,000 orders when
there are 500,000 orders present at the start. You can see the throughput shown in Table 3.

Table 3 Result of Test A: INSERT of 500,000 Orders

Test A Time (with 500,000 Orders)

INTEGER 1

GUID 30

GUID (COMB) 1.1

When did you last get a wage increase of 30 times your salary? Think about it, and you'll get a feeling
for what a huge difference this is!

Finally, the memory consumption of the tests was very different. The IDENTITY+INT and my new GUID
(COMB) algorithm grabbed perhaps 1MB of RAM for each 500 orders INSERTed. The ordinary GUID
algorithm grabbed all the memory given to SQL Server (350MB in my tests) when 500 or 1,000 orders were
INSERTed!

NOTE

One property of ordinary GUIDs isn't maintained with COMBs: With ordinary GUIDs, the order of creation
can't be deduced from the values; with COMBs, it can. For some applications, this can be a problem that
keeps COMBs from being used.

Is it common to see only this type of raw INSERTs? No, not that common. Most applications also need
some reading of data. Now on to the second test.

Test B: Fetch of Single Order

In the second test, I fetch the customer ID from some of the orders, one by one. To get a correct set of
IDs to work with, I first ran a script that loops through the table to grab 100 random IDs and PRINTs, for
each of them, a complete call to the stored procedure a_Order_Fetch, which is shown in its GUID version
in Listing 8.

Listing 8: GUID Version of a_Order_Fetch

CREATE PROCEDURE a_OrderGUID_Fetch
    (@orderId UNIQUEIDENTIFIER
    , @customerId UNIQUEIDENTIFIER OUTPUT)
AS
DECLARE @anError INT

SET @customerId =
    (SELECT om.customer_id
    FROM orderGUIDmaster om
    WHERE om.id = @orderId)

SET @anError = @@ERROR

RETURN @anError

Then I copied the 1,000 PRINTed rows and pasted them into another script that EXECUTEs the calls. To skew
the test results as little as possible here, I first executed a DBCC DROPCLEANBUFFERS at the beginning of the
script that calls the stored procedure.

Let's take a look at the results for the second test. As usual, the first value is recalculated to 1, to be
used as a baseline.

Table 4 Result of Test B: SELECT of Orders

Test B Time for Fetching 100 Orders

INTEGER 1 (baseline)

GUID (COMB) 1.1

This is fine, isn't it? But what about the dreaded 16‑byte JOIN?

Test C: A Small JOIN



The most common argument for not using GUIDs for primary keys is that it's so expensive for JOINs.
Perhaps you have heard this: "Use a 16‑byte column to join? Are you crazy?"

I reused Test B but changed the stored procedures slightly so that the orderdetail table was also used
in a JOIN, as shown in Listing 9.

Listing 9: COMB Version of a_Order_FetchWithDetail

CREATE PROCEDURE a_OrderCOMB_Fetch
    (@orderId UNIQUEIDENTIFIER, @noOfRows INT OUTPUT)
AS
DECLARE @anError INT

SET @noOfRows =
    (SELECT COUNT(*)
    FROM orderCOMBmaster om
    INNER JOIN orderCOMBdetail od
    ON om.id = od.ordermaster_id
    WHERE om.id = @orderId)

SET @anError = @@ERROR

RETURN @anError

The results of the third test case are shown in Table 5.

Table 5 Result of Test C: SELECT of orders and JOIN with Details

Test C Time for Fetching 100 Orders

INTEGER 1 (baseline)

GUID (COMB) 1.1

Worth mentioning is that the foreign keys are clustered both in the INTEGER case and in the COMB case.
As a matter of fact, it was faster not to use clustered foreign keys for this specific test, but I thought
that it was most natural (and probably more effective in other tests) to use clustered foreign keys.

A Few Tweaks
The test set is pretty untuned, but I'd like to mention a few small tweaks that proved to work well:

Even though you saw that the INSERT throughput for the ordinary GUID case wasn't good, it would
have been even worse if I had used clustered primary keys in that case.

Each INSERT of an order is wrapped in an explicit transaction. That is, I wrote 1 row to
ordermaster and 10 rows to orderdetail in each transaction. Writing many orders in each
transaction would have given a shorter processing time, but in an Online Transaction Processing
(OLTP) application, it's often natural to have one transaction per order.

I ran a DBCC DBREINDEX ('tablename', '', 90) before each performance test. This is very important
for the GUID case but not that important for the INTEGER and COMB cases. I also did an UPDATE
STATISTICS before I executed the second test, and I recompiled the stored procedures.


The orderdetail.ordermaster_id column is indexed. Otherwise, Test C would probably run forever.
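That index could be sketched like this (the index name is assumed; as mentioned above, the foreign keys were clustered in my tests):

```sql
-- Sketch (index name assumed): the supporting index on the foreign
-- key column used by the JOIN in Test C.
CREATE CLUSTERED INDEX ix_orderCOMBdetail_ordermaster_id
    ON orderCOMBdetail (ordermaster_id)
```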

I did a few other things between tests to get as fair tests as possible. I ran a backup of the database
after the massive INSERTs to all the tables before I collected my metrics. That way, I emptied the
transaction log. I also stopped SQL Server between each test case. And, of course, I ran each test case
several times and skipped extreme values before finding typical values.

Other Approaches
Of course, there are other solutions than INTs and GUIDs. Another typical solution is to go for the BIGINT
datatype (64 bits), which is new in SQL Server 2000. You can use IDENTITY for BIGINTs, too.
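A BIGINT version of the master table could look like this (a hypothetical sketch, not one of the tables from the tests):

```sql
-- Sketch (hypothetical table): BIGINT + IDENTITY as a middle ground,
-- 8 bytes per key instead of 4 (INT) or 16 (GUID).
CREATE TABLE ordermasterBIG
    (id BIGINT IDENTITY(1, 1) PRIMARY KEY
    , customer_id BIGINT NOT NULL
    , orderdate DATETIME NOT NULL
    , status INT NOT NULL)
```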

Conclusion
Here is my conclusion regarding the overhead of using ordinary GUIDs for the primary keys:

Indexes need more frequent rebuilding with GUIDs. That also applies when updating statistics.

You must be prepared to see inconsistent INSERT throughput at different times.

The scalability for INSERTs isn't good because the INSERT overhead increases rapidly.

With my COMB algorithm, the three problems mentioned disappear. The problem with the large
overhead in size is there both with my COMBs and with ordinary GUIDs. (After all, COMBs have the
format of GUIDs!)

But GUIDs have a lot of advantages, too. (These are sometimes so big that you really don't have a
choice!) In my opinion, using GUIDs as primary keys is often a very good idea (especially if you go for
COMBs instead). Which weighs heavier for your applications: the overhead or the advantages?

Special thanks to Vladimir Suponitsky and Tibor Karaszi!

