Why be normal?
"...and then Mummy called me her good little helper."
"Okay, that’s just not normal."
Mark: "I’m an ichthyologist. I only want to search my table for species name or common name to get the weight and location of record-setting fish."
160 Chapter 4
Jack’s table has the common name and weight of the fish, but it also contains the first and last names of the people who caught them, and it breaks down the location into a column containing the name of the body of water where the fish was caught, and a separate state column.
This table is also about record-breaking fish, but it has almost twice as many columns.
fish_records
Jack: "I’m a writer for Reel and Creel magazine. I need to know the names of the fishermen, dates, and locations of the big catches."
Write a query for each table to find all records from New Jersey.
sharpen solution
Write a query for each table to find all records from New Jersey.
Mark: "I almost never need to search by state. I inserted the data with states in the same column as the town."
Jack: "I often have to search by state, so I put in a separate state column when I created my table."
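One way the two answers might look, offered as a sketch rather than the printed solution. It assumes Mark's table is called fish_info and stores the state as a two-letter abbreviation at the end of location, and that Jack's fish_records keeps that abbreviation in state. The column types and the sample rows are invented for illustration.

```sql
-- Assumed layout of Mark's table: town and state share the location column.
CREATE TABLE IF NOT EXISTS fish_info (
  common   VARCHAR(30),
  species  VARCHAR(50),
  location VARCHAR(50),
  weight   VARCHAR(20)
);

-- Assumed layout of Jack's table: the state has its own column.
CREATE TABLE IF NOT EXISTS fish_records (
  first_name VARCHAR(20),
  last_name  VARCHAR(30),
  common     VARCHAR(30),
  location   VARCHAR(50),
  state      CHAR(2),
  weight     VARCHAR(20),
  date       DATE
);

-- Invented sample rows, not real records.
INSERT INTO fish_info (common, species, location, weight)
VALUES ('example fish', 'Exemplum piscis', 'Sample Lake, NJ', '10 lb 4 oz');

INSERT INTO fish_records
  (first_name, last_name, common, location, state, weight, date)
VALUES
  ('Pat', 'Example', 'example fish', 'Sample Lake', 'NJ', '10 lb 4 oz', '2000-01-01');

-- Mark's table: the state is only reachable by pattern-matching the end of location.
SELECT * FROM fish_info WHERE location LIKE '%NJ';

-- Jack's table: the state column can be compared directly.
SELECT * FROM fish_records WHERE state = 'NJ';
```

The second query is exact, while the first only works if every location really does end with the state abbreviation.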
Q: So Jack’s table is better than Mark’s?
A: No. They’re different tables with different purposes. Mark will rarely need to search directly for a state because he only really cares about the species and common names of the record-breaking fish and how much they weighed.
Jack, on the other hand, will need to search for states when he’s querying his data. That’s why his table has a separate column: to allow him to easily target states in his searches.
Q: Why are shorter queries better than longer ones?
A: The simpler the query, the better. As your database grows, and as you add in new tables, your queries will get more complicated. If you start with the simplest possible query now, you'll appreciate it later.
Q: Suppose we had a street address. Why couldn't we have one column with the entire address, then other columns that break it apart?
A: While duplicating your data might seem like a good idea to you now, consider how much room on your hard drive it will take up when your database grows to an enormous size. And each time you duplicate your data, that’s one more clause in an UPDATE statement you’ll have to remember to add when your data changes.
Q: So are you saying I should always have tiny bits of data in my columns?
table-creation guidelines
1. Pick your thing, the one thing you want your table to describe.
What’s the main thing you want your table to be about?
Can you spot the columns in this sentence Mark the ichthyologist used to describe how
he wants to select from his table? Fill in the column names.
"I want the weight and location when I search by common name or species."
Your turn. Write a sentence for Jack, the writer for Reel and Creel magazine,
who uses his table to select details for his articles. Then draw arrows from
each column to where it's mentioned in the sentence.
Columns: last_name, first_name, common, state, weight, date, location
exercise solution
Can you spot the columns in this sentence Mark the ichthyologist used to describe how
he wants to select from his table? Fill in the column names.
"I want the weight and location when I search by common name or species."
Columns mentioned: weight, location, common, species
Your turn. Write a sentence for Jack, the writer for Reel and Creel magazine,
who uses his table to select details for his articles. Then draw arrows from
each column to where it's mentioned in the sentence.
Columns: last_name, first_name, common, state, weight, date, location
"I want the first name and last name of the fisherman, as well as the date, location, state, and weight of a fish, when I search by its common name."
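Each sentence maps fairly directly onto a SELECT: the columns named after "I want" become the select list, and the columns named after "when I search by" go into the WHERE clause. The sketch below assumes Mark's table is named fish_info; the column types and the search values are placeholders, not data from the chapter.

```sql
-- Assumed table layouts; skipped if the tables already exist.
CREATE TABLE IF NOT EXISTS fish_info (
  common   VARCHAR(30),
  species  VARCHAR(50),
  location VARCHAR(50),
  weight   VARCHAR(20)
);

CREATE TABLE IF NOT EXISTS fish_records (
  first_name VARCHAR(20),
  last_name  VARCHAR(30),
  common     VARCHAR(30),
  location   VARCHAR(50),
  state      CHAR(2),
  weight     VARCHAR(20),
  date       DATE
);

-- Mark: "I want the weight and location when I search by common name or species."
SELECT weight, location
FROM fish_info
WHERE common = 'example fish'        -- placeholder search value
   OR species = 'Exemplum piscis';   -- placeholder search value

-- Jack: first name, last name, date, location, state, and weight,
-- searched by common name.
SELECT first_name, last_name, date, location, state, weight
FROM fish_records
WHERE common = 'example fish';       -- placeholder search value
```

Writing the sentence first is a quick check that every column the table has is one somebody actually asks for.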
"But why stop there with Jack’s table? Couldn’t you break up the date into month, day, and year? You could even break the location down into street number and street name."
Atomic data
What’s an atom? A little piece of information that can’t or shouldn’t be divided.
It’s the same for your data. When it’s ATOMIC, that means that it’s been broken
down into the smallest pieces of data that can’t or shouldn’t be divided.
+--------------+--------------------------+
| order_number | address                  |
+--------------+--------------------------+
|          246 | 59 N. Ajax Rapids        |
|          247 | 849 SQL Street           |
|          248 | 2348 E. PMP Plaza        |
|          249 | 1978 HTML Heights        |
|          250 | 24 S. Servlets Springs   |
|          251 | 807 Infinite Circle      |
|          252 | 32 Design Patterns Plaza |
|          253 | 9208 S. Java Ranch       |
|          254 | 4653 W. EJB Estate       |
|          255 | 8678 OOA&D Orchard       |
+--------------+--------------------------+
> SELECT address FROM pizza_deliveries WHERE order_number = 252;
+--------------------------+
| address                  |
+--------------------------+
| 32 Design Patterns Plaza |
+--------------------------+
1 row in set (0.04 sec)
+---------------+------------------------+---------------+---------+
| street_number | street_name            | property_type | price   |
+---------------+------------------------+---------------+---------+
|            59 | N. Ajax Rapids         | condo         |  189000 |
|           849 | SQL Street             | apartment     |  109000 |
|          2348 | E. PMP Plaza           | house         |  355000 |
|          1978 | HTML Heights           | apartment     |  134000 |
|            24 | S. Servlets Springs    | house         |  355000 |
|           807 | Infinite Circle        | condo         |  143900 |
|            32 | Design Patterns Plaza  | house         |  465000 |
|          9208 | S. Java Ranch          | house         |  699000 |
|          4653 | SQL Street             | apartment     |  115000 |
|          8678 | OOA&D Orchard          | house         |  355000 |
+---------------+------------------------+---------------+---------+
> SELECT price, property_type FROM real_estate WHERE street_name = 'SQL Street';
+-----------+---------------+
| price     | property_type |
+-----------+---------------+
| 109000.00 | apartment     |
| 115000.00 | apartment     |
+-----------+---------------+
2 rows in set (0.01 sec)
3. Do your columns contain atomic data to make your queries short and to the point?
Q: Aren’t atoms tiny, though? Shouldn’t I be breaking my data down into really tiny pieces?
A: No. Making your data atomic means breaking it down into the smallest pieces that you need to create an efficient table, not just the smallest possible pieces you can.
Don’t break down your data any more than you have to. If you don’t need extra columns, don’t add them just for the sake of it.
Q: How does atomic data help me?
A: It helps you ensure that the data in your table is accurate. For example, if you have a column for street numbers, you can make sure that only numbers end up in that column.
Atomic data also lets you perform queries more efficiently because the queries are easier to write and take a shorter amount of time to run, which adds up when you have a massive amount of data stored.
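The street-number point can be sketched with the real_estate table shown earlier. The column types here are assumptions (the chapter does not print the CREATE TABLE statement); the idea is only that a numeric column type lets the database police what goes into an atomic column.

```sql
-- Assumed column types for the real_estate table.
CREATE TABLE IF NOT EXISTS real_estate (
  street_number INT,
  street_name   VARCHAR(50),
  property_type VARCHAR(20),
  price         DECIMAL(10,2)
);

-- A numeric street number is accepted.
INSERT INTO real_estate (street_number, street_name, property_type, price)
VALUES (849, 'SQL Street', 'apartment', 109000.00);

-- A non-numeric street number does not fit the INT column. In strict SQL mode
-- (the default in recent MySQL versions) the statement below is rejected;
-- in non-strict mode it is stored as 0 and a warning is raised.
-- It is left commented out so the script runs to the end.
-- INSERT INTO real_estate (street_number, street_name, property_type, price)
-- VALUES ('eight forty-nine', 'SQL Street', 'apartment', 109000.00);

-- Because the number is its own column, it can be compared as a number.
SELECT street_number, street_name
FROM real_estate
WHERE street_number < 1000;
```

None of this is possible when the whole address sits in a single text column.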
Here are the official rules of atomic data. For each rule, sketch out
two hypothetical tables that violate each rule.
sharpen solution
Here are the official rules of atomic data. For each rule, sketch out
two hypothetical tables that violate each rule.
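Remember Greg’s table? That has a column for hobbies that often contains multiple interests, making searching a nightmare! It’s the same here: imagine trying to find tomato amongst all those other ingredients.
+-----------+------------------------------+
| food name | ingredients                  |
+-----------+------------------------------+
| bread     | flour, milk, egg, yeast, oil |
| salad     | lettuce, tomato, cucumber    |
+-----------+------------------------------+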
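To make the search problem concrete, here is a small sketch of the table above. The table name food and the column name food_name are hypothetical, as are the column sizes.

```sql
-- Hypothetical table that breaks rule 1: several ingredients in one column.
CREATE TABLE IF NOT EXISTS food (
  food_name   VARCHAR(30),
  ingredients VARCHAR(100)
);

INSERT INTO food (food_name, ingredients) VALUES
  ('bread', 'flour, milk, egg, yeast, oil'),
  ('salad', 'lettuce, tomato, cucumber');

-- An exact comparison finds nothing, because no row holds just 'tomato'.
SELECT food_name FROM food WHERE ingredients = 'tomato';

-- The only way in is a wildcard match, which would also match any other
-- ingredient that merely contains the word, such as 'tomato paste'.
SELECT food_name FROM food WHERE ingredients LIKE '%tomato%';
```

The wildcard query returns the salad row here, but it has to scan and pattern-match every ingredients value to do so.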
Now that you know the official rules and the three steps to making data atomic, take a look at
each table from earlier in this book and explain why it is or isn't atomic.
normalizing tables
Reasons to be normal
When your data consultancy takes off and you need to hire more SQL database designers, wouldn’t it be great if you didn’t need to waste hours explaining how your tables work? Well, making your tables NORMAL means they follow some standard rules your new designers will understand. And the good news is, our tables with atomic data are halfway there.
Making your data atomic is the first step in creating a NORMAL table.
Now that you know the official rules and the three steps to making data atomic, take a look at
each table from earlier in this book and explain why it is or isn't atomic.
Greg's table, page 47: Not atomic. The “interests” and “seeking” columns violate rule 1.
Donut rating table, page 78: Atomic. Unlike the easy_drinks table, each column holds a different type of information. And, unlike the clown table “activities” column, each column has only one piece of information in it.
Clown table, page 121: Not atomic. The “activities” column has more than one activity in some records, and thus violates rule 1.
Drink table, page 59: Not atomic. There is more than one “ingredient” column, which violates rule 2.
Fish info, page 160: Atomic. Each column holds a different type of information. And each column has only one piece of information in it.
"My tables aren’t that big. Why should I care about normalizing them?"
Let's make the clown table more atomic. Assuming you need to search
on data in the appearance and activities columns, as well as
last_seen, write down some better choices for columns.
Halfway to 1NF
Remember, our table is only about halfway normal when it’s got
atomic data in it. When we’re completely normal we’ll be in the
FIRST NORMAL FORM or 1NF.
To be 1NF, a table must follow these two rules:
1. Each row of data must contain atomic values.
2. Each row of data must have a unique identifier.
Take care using SSNs as the Primary Keys for your records.
With identity theft only increasing, people don’t want to give out SSNs—
and with good reason. They’re too important to risk. Can you absolutely
guarantee that your database is secure? If it’s not, all those SSNs can be
stolen, along with your customers’ identities.
Given all these rules, can you think of a good primary key to use in a table?
Look back through the tables in the book. Do any of them have a column
that contains truly unique values?
"Wait, so if I can’t use SSN as the primary key, but it still needs to be compact, not NULL, and unchangeable, what should I use?"
Geek Bits
Q: You said “first” normal form. Does that mean there’s a second normal form? Or a third?
A: Yes, there are indeed second and third normal forms, each one adhering to increasingly rigid sets of rules. We’ll cover second and third normal form in Chapter 7.
A: No. So far, not a single table we’ve created has a primary key, a unique value.
Q: The comments column in the doughnut table really doesn’t seem atomic to me. I mean, there’s no reasonable way to query that column easily.
Getting to NORMAL
It’s time to step back and normalize our tables. We need to make our
data atomic and add primary keys. Creating a primary key is normally
something we do when we write our CREATE TABLE code.
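As a rough illustration of declaring the key at creation time, the sketch below uses a cut-down, hypothetical table named contacts_with_key. The column name contact_id follows the chapter; the other names, the sizes, and the sample rows are invented.

```sql
-- Hypothetical cut-down contacts table with a primary key declared up front.
CREATE TABLE contacts_with_key (
  contact_id INT NOT NULL,          -- a primary key value can never be NULL
  last_name  VARCHAR(30),
  first_name VARCHAR(20),
  PRIMARY KEY (contact_id)          -- designates contact_id as the unique identifier
);

-- Without any automatic numbering, each INSERT has to supply a fresh value.
INSERT INTO contacts_with_key (contact_id, last_name, first_name)
VALUES (1, 'Example', 'Pat');

INSERT INTO contacts_with_key (contact_id, last_name, first_name)
VALUES (2, 'Sample', 'Sam');

-- Reusing a value is refused with a duplicate-key error, so the statement
-- is left commented out.
-- INSERT INTO contacts_with_key (contact_id, last_name, first_name)
-- VALUES (2, 'Another', 'Alex');
```

The PRIMARY KEY line is what makes the database enforce uniqueness instead of leaving it to whoever writes the INSERT statements.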
Fixing Greg’s table
Step 1: SELECT all of your data and save it somehow.
Step 3: INSERT all that old data into the new table, changing each row to match the new table structure.
"Wait a second, I already have a table full of data. You can’t seriously expect me to use the DROP TABLE command like I did in an earlier chapter and type in all that data again, just to create a primary key for each record…"
But what if you don’t have your old CREATE TABLE printed
anywhere? Can you think of some way to get at the code?
Show me the table
What if you use the DESCRIBE my_contacts command to
look at the code you used when you set up the table? You’ll
see something that looks a lot like this:
+------------+--------------+------+-----+---------+----------------+
| Column     | Type         | Null | Key | Default | Extra          |
+------------+--------------+------+-----+---------+----------------+
| last_name  | varchar(30)  | YES  |     | NULL    |                |
| first_name | varchar(20)  | YES  |     | NULL    |                |
| email      | varchar(50)  | YES  |     | NULL    |                |
| gender     | char(1)      | YES  |     | NULL    |                |
| birthday   | date         | YES  |     | NULL    |                |
| profession | varchar(50)  | YES  |     | NULL    |                |
| location   | varchar(50)  | YES  |     | NULL    |                |
| status     | varchar(20)  | YES  |     | NULL    |                |
| interests  | varchar(100) | YES  |     | NULL    |                |
| seeking    | varchar(100) | YES  |     | NULL    |                |
+------------+--------------+------+-----+---------+----------------+
But we really want to look at the CREATE code here, not the fields in the
table, so we can figure out what we should have done at the very beginning
without having to write the CREATE statement over again.
The statement SHOW CREATE TABLE will return a CREATE TABLE
statement that can exactly recreate our table, minus any data in it. This way,
you can always see how the table you are looking at could be created. Try it:
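A minimal way to try it, assuming a my_contacts table built from the columns in the DESCRIBE output above (the CREATE TABLE is only there so the example runs on an empty database, and is skipped if the table already exists):

```sql
-- Recreates my_contacts from the DESCRIBE listing, if it is not there yet.
CREATE TABLE IF NOT EXISTS my_contacts (
  last_name  VARCHAR(30),
  first_name VARCHAR(20),
  email      VARCHAR(50),
  gender     CHAR(1),
  birthday   DATE,
  profession VARCHAR(50),
  location   VARCHAR(50),
  status     VARCHAR(20),
  interests  VARCHAR(100),
  seeking    VARCHAR(100)
);

-- Lists the columns and their types.
DESCRIBE my_contacts;

-- Returns a CREATE TABLE statement that would rebuild the table, minus its data.
SHOW CREATE TABLE my_contacts;
```

The exact text MySQL returns varies by version and configuration; it typically wraps names in backticks and appends table options after the closing parenthesis.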
Time-saving command
Take a look at the code we used to create the table on page 183, and the code
below that the SHOW CREATE TABLE my_contacts gives you. They aren’t
identical, but if you paste the code below into a CREATE TABLE command,
the end result will be the same. You don’t need to remove the backticks or data
settings, but it’s neater if you do.
Q: So you say that the PRIMARY KEY can’t be NULL. What else keeps it from being duplicated?
A: Basically, you do. When you INSERT values into your table, you’ll insert a value in the contact_id column that’s new each time. For example, the first INSERT statement will set contact_id to 1, the next contact_id will be 2, etc.
Q: That’s quite a pain to have to assign a new value to that PRIMARY KEY column each time I insert a new record. Isn’t there an easier way?
A: There are two ways. One is using a column in your data that you know is unique as a primary key. We’ve mentioned that this is tricky (for example, the problem with using Social Security Numbers).
The easy way is to create an entirely new column just to hold a unique value, such as contact_id on the facing page. You can tell your SQL software to automatically fill in a number for you using keywords. Turn the page for details.
Q: Can I use SHOW for anything else besides the CREATE TABLE code?
A: Yes, for example:
SHOW COLUMNS FROM tablename;
This command will display all the columns in your table and their data type along with any other column-specific details.
And there’s one more command that’s VERY useful:
SHOW WARNINGS;
If you get a message on your console that your SQL command has caused warnings, type this to see the actual warnings.
There are quite a few more, but those are the ones that are related to things we’ve done so far.
Q: So what’s up with that backtick character that shows up when I use a SHOW CREATE TABLE? Are you sure I don’t need it?
A: It exists because sometimes your RDBMS might not be able to tell a column name is a column name. If you use the backticks around your column names, you can actually (although it’s a very bad idea) use a reserved SQL keyword as a column name.
For example, suppose you wanted to name a column select for some bizarre reason. This column declaration wouldn’t work:
select varchar(50)
But this declaration would:
`select` varchar(50)
Q: What’s wrong with using keywords as column names, then?
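The backtick rule and the two SHOW commands can be tried together. The table keyword_demo below is hypothetical and exists only to show the quoting; it is not a recommendation to name columns this way.

```sql
-- Hypothetical table: the column name is a reserved word, so it must be quoted.
CREATE TABLE keyword_demo (
  `select` VARCHAR(50)
);

-- The backticks are needed again every time the column is mentioned.
INSERT INTO keyword_demo (`select`) VALUES ('quoted every time');

SELECT `select` FROM keyword_demo;

-- Lists the columns of the table with their types and other details.
SHOW COLUMNS FROM keyword_demo;

-- Shows any warnings raised by the most recent statement that produced them.
SHOW WARNINGS;
```

Having to remember the backticks in every later query is one practical cost of a keyword column name.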
AUTO_INCREMENT keyword
"Okay, seems simple enough. But how do I do an INSERT statement with that column already filled out for me? Can I accidentally overwrite the value in it?"
1 Write a CREATE TABLE statement below to store first and last names of people. Your table
should have a primary key column with AUTO_INCREMENT and two other atomic columns.
2 Open your SQL terminal or GUI interface and run your CREATE TABLE statement.
3 Try out each of the INSERT statements below. Circle the ones that work.
4 Did all the Bradys make it? Sketch your table and its contents after
trying the INSERT statements
your_table
id first_name last_name
exercise solution
1 Write a CREATE TABLE statement below. Your table should have a primary key column with
AUTO_INCREMENT and two other atomic columns.
3 Try out each of the INSERT statements below. Circle the ones that work.
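A possible solution sketch follows. The exercise's own INSERT statements are not reproduced in the text, so the ones below are stand-ins with illustrative names; the column sizes are also assumptions.

```sql
-- Primary key column with AUTO_INCREMENT plus two atomic name columns.
CREATE TABLE your_table (
  id         INT NOT NULL AUTO_INCREMENT,
  first_name VARCHAR(20),
  last_name  VARCHAR(30),
  PRIMARY KEY (id)
);

-- NULL supplied for id: MySQL generates the next number instead of storing NULL.
INSERT INTO your_table (id, first_name, last_name)
VALUES (NULL, 'Marcia', 'Brady');

-- id left out entirely: a number is generated as well.
INSERT INTO your_table (first_name, last_name)
VALUES ('Cindy', 'Brady');

-- An explicit id that is not yet in use is accepted as given.
INSERT INTO your_table (id, first_name, last_name)
VALUES (99, 'Peter', 'Brady');

-- An explicit id that already exists is refused with a duplicate-key error,
-- so this statement is left commented out.
-- INSERT INTO your_table (id, first_name, last_name)
-- VALUES (1, 'Jan', 'Brady');

SELECT * FROM your_table;
```

On a fresh table with default settings, the three rows that go in end up with the ids 1, 2, and 99, and the existing values are never overwritten by a later INSERT.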
Q: Why did the first query, the one with NULL for the
id column, insert the row when id is NOT NULL?
"Look, you’re not reassuring me. Sure, I can paste in the code from SHOW CREATE TABLE, but I’ve still got the feeling that I’m going to have to drop my table and start over, entering all those records again, just to add the primary key column the second time around."
Here’s the code to add the new column to the table. Looks familiar, huh?!
ALTER TABLE my_contacts
The rest of the statement adds the new column, names it contact_id, and puts the primary key first. You should recognize the line that designates the primary key.
Do you think that this will add values to the new contact_id column for records already in the table or only for newly inserted records? How can you check?
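Only the first line of the statement is shown above, so the version below is a reconstruction in common MySQL syntax, not necessarily the exact code from the page. It uses a cut-down my_contacts (three of the columns from the DESCRIBE listing) and invented rows so that it runs on its own.

```sql
-- Cut-down my_contacts for the demonstration; skipped if the table already exists.
CREATE TABLE IF NOT EXISTS my_contacts (
  last_name  VARCHAR(30),
  first_name VARCHAR(20),
  email      VARCHAR(50)
);

-- Invented rows standing in for the data already in the table.
INSERT INTO my_contacts (last_name, first_name, email) VALUES
  ('Example', 'Pat', 'pat@example.com'),
  ('Sample', 'Sam', 'sam@example.com');

-- Add the new column as the first column and make it the primary key,
-- without dropping the table or retyping any data.
ALTER TABLE my_contacts
  ADD COLUMN contact_id INT NOT NULL AUTO_INCREMENT FIRST,
  ADD PRIMARY KEY (contact_id);

-- One way to check what happened to the rows that were already there.
SELECT contact_id, last_name, first_name FROM my_contacts;
```

In MySQL, the rows that existed before the ALTER TABLE are numbered automatically as the column is added, and later inserts continue from the highest value.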
Will Greg get his phone number column? Turn to Chapter 5 to find out.
sql in review
ATOMIC DATA
Data in your columns is atomic if it’s been broken down into the smallest pieces that you need.
ATOMIC DATA RULE 1
Atomic data can’t have several bits of the same type of data in the same column.
ATOMIC DATA RULE 2
Atomic data can’t have multiple columns with the same type of data.
SHOW CREATE TABLE
Use this command to see the correct syntax for creating an existing table.
FIRST NORMAL FORM (1NF)
Each row of data must contain atomic values, and each row of data must have a unique identifier.
PRIMARY KEY
A column or set of columns that uniquely identifies a row of data in a table.
AUTO_INCREMENT
When used in your column declaration, that column will automatically be given a unique integer value each time an INSERT command is performed.
Let's make the clown table more atomic. Assuming you need to search
on data in the appearance and activities columns, as well as
last_seen, write down some better choices for columns.
The best you can do is to pull out things like gender, shirt color, pant
color, hat type, musical instrument, transportation, balloons (yes or no
for values), singing (yes or no for values), dancing (yes or no for values).
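Those column choices could translate into a table along these lines. This is only a sketch: the table name clown_details, the name column, the data types, and the 'Y'/'N' convention for the yes-or-no columns are all assumptions.

```sql
-- Hypothetical, more atomic version of the clown table.
CREATE TABLE clown_details (
  name               VARCHAR(50),   -- assumed identifying column
  last_seen          VARCHAR(100),
  gender             CHAR(1),
  shirt_color        VARCHAR(20),
  pant_color         VARCHAR(20),
  hat_type           VARCHAR(20),
  musical_instrument VARCHAR(20),
  transportation     VARCHAR(20),
  balloons           CHAR(1),       -- 'Y' or 'N'
  singing            CHAR(1),       -- 'Y' or 'N'
  dancing            CHAR(1)        -- 'Y' or 'N'
);

-- Searches on appearance and activities become exact comparisons
-- instead of wildcard matches on long description columns.
SELECT name, last_seen
FROM clown_details
WHERE dancing = 'Y'
  AND shirt_color = 'orange';
```

Each column now holds one piece of information, so a query can target it directly.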