Definition of Null –
Null was first introduced by E. F. Codd, the creator of the relational database
model. NULL is a special marker used in SQL that represents a missing or unknown
value of an attribute.
See the truth table for 3VL (image below). Note that True OR Unknown evaluates to
True. Likewise, False AND Unknown evaluates to False.
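The 3VL operators can be sketched in a few lines of Python. This is only an illustration: None stands in for Unknown, and the function names and3, or3 and not3 are made up for this sketch.

```python
# Three-valued logic sketch: True, False, and None (standing in for Unknown).

def and3(p, q):
    # False dominates AND; otherwise any unknown operand makes the result unknown.
    if p is False or q is False:
        return False
    return None if (p is None or q is None) else True

def or3(p, q):
    # True dominates OR; otherwise any unknown operand makes the result unknown.
    if p is True or q is True:
        return True
    return None if (p is None or q is None) else False

def not3(p):
    # NOT of unknown stays unknown.
    return None if p is None else not p

def label(v):
    return {True: "T", False: "F", None: "UNK"}[v]

# Print the AND and OR truth tables side by side.
for p in (True, None, False):
    for q in (True, None, False):
        print(f"{label(p):>3} AND {label(q):<3} = {label(and3(p, q)):<3} | "
              f"{label(p):>3} OR {label(q):<3} = {label(or3(p, q))}")
```

The two cases called out above correspond to or3(True, None) and and3(False, None).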
Problems with Nulls & the Three Valued Logic Approach (Ch 17-18)
Overview of Problem –
The transition from two-valued logic (true/false) to three-valued logic
(true/false/unknown) in relational databases runs into many problems because of the
complexity and ambiguity of handling three logic values. Many relational algebra
operators and aggregate functions that are commonly used in SQL queries do not extend
seamlessly to nulls. C.J. Date argues that there is no ‘sensible interpretation in
terms of how the real world works’ and thus three-valued logic should not be the basis for
building a database system. Below you will find examples where the use of
nulls/unknowns causes confusion and doesn’t represent the real world appropriately.
Tables
Depending on the system, the expression above could be evaluated in two different ways:
Without optimization, the operand of NOT evaluates to UNK and UNK, which is UNK. Therefore,
the where clause in the query is NOT(UNK), which evaluates to UNK, and nothing is returned.
With some system optimization (predicate transitive closure), the expression would be
extended using the transitive property: if A=B and B=C, then A=C.
This new expression would evaluate to UNK and UNK and ‘D2’ = ‘D1’, that is,
UNK and UNK and false = false
Therefore, if the same query were run, the where clause would be NOT(false) = true and the emp#
E1 would be returned.
This example shows that certain identities in two-valued logic are invalid in 3VL and that
certain optimizations run in current systems could violate traditional 3VL.
**In the real world, if we were evaluating this expression by hand, we would see that the
optimized expression returns the correct result and 3VL alone would not**
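The two evaluations can be traced with a small Python sketch. The table contents (a DEPT row 'D2' and an EMP row E1 whose DEPT# is UNK) and the three comparison terms are reconstructed from the values quoted in the text, so they should be read as assumptions. None stands for UNK.

```python
def and3(p, q):
    # 3VL AND: False dominates, otherwise unknown is contagious.
    if p is False or q is False:
        return False
    return None if (p is None or q is None) else True

def not3(p):
    return None if p is None else not p

def eq3(a, b):
    # Comparing anything with an unknown value yields unknown.
    return None if (a is None or b is None) else a == b

DEPT_NO = "D2"        # assumed: the only DEPT.DEPT# value
EMP_DEPT_NO = None    # assumed: E1's DEPT# is UNK

def where_unoptimized():
    # NOT (DEPT.DEPT# = EMP.DEPT# AND EMP.DEPT# = 'D1')
    return not3(and3(eq3(DEPT_NO, EMP_DEPT_NO), eq3(EMP_DEPT_NO, "D1")))

def where_optimized():
    # Same predicate with the term DEPT.DEPT# = 'D1' added by transitive closure.
    inner = and3(and3(eq3(DEPT_NO, EMP_DEPT_NO), eq3(EMP_DEPT_NO, "D1")),
                 eq3(DEPT_NO, "D1"))
    return not3(inner)

# A row is returned only when the where clause is exactly True.
print("unoptimized:", where_unoptimized(), "-> E1 returned:", where_unoptimized() is True)
print("optimized:  ", where_optimized(), "-> E1 returned:", where_optimized() is True)
```

Under these assumptions the unoptimized form yields UNK (no row) and the optimized form yields true (E1 is returned), matching the walk-through above.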
Example 2
Using the same tables above, consider the following query:
SELECT EMP# FROM EMP WHERE MAYBE ( DEPT# = 'D1' )
Note: the MAYBE operator returns true if its operand is UNK, and false otherwise.
Running the query, the single row in the EMP table has an UNK department, so E1
would be returned. However, in the real world, we should assume that
EMP.DEPT# is a foreign key that matches DEPT.DEPT#. If this were the case, we would
know the DEPT# can’t be D1 (as D1 doesn’t even exist in the DEPT table), and the query
should return nothing. Reaching that answer requires semantic optimization: optimization that is
performed on the basis of the optimizer’s understanding of the semantics of the situation.
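A rough Python sketch of the difference between the plain MAYBE query and a semantically optimized one. The table contents and the function names are assumptions made for illustration, and None stands for UNK.

```python
def eq3(a, b):
    # Comparing anything with an unknown value yields unknown.
    return None if (a is None or b is None) else a == b

def maybe(p):
    # MAYBE is true exactly when its operand is unknown.
    return p is None

DEPT = {"D2"}            # assumed DEPT.DEPT# values
EMP = [("E1", None)]     # assumed EMP rows as (EMP#, DEPT#)

def query_plain(target):
    # SELECT EMP# FROM EMP WHERE MAYBE ( DEPT# = target )
    return [emp_no for emp_no, dept_no in EMP if maybe(eq3(dept_no, target))]

def query_semantic(target):
    # If EMP.DEPT# is a foreign key into DEPT, an unknown DEPT# can only be
    # one of the values present in DEPT, so an absent target is ruled out.
    if target not in DEPT:
        return []
    return query_plain(target)

print(query_plain("D1"))
print(query_semantic("D1"))
print(query_semantic("D2"))
```

The plain query returns E1 for 'D1', while the foreign-key-aware version returns nothing for 'D1' and still returns E1 for 'D2'.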
Example 3
Consider the following expression
This example shows pretty clearly a possible problem with nulls in the system. If the
value B is null, the outcome of this expression would be unknown. The obviously logical
answer is that the expression is always true (assuming B represents some value).
The problem with three-valued logic in these examples is that it treats each unknown
reference as totally unknown. It does not connect like references or use any additional
information that may constrain the unknown value. Therefore, Date argues that pure
three-valued logic does not behave in accordance with the real world.
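The behavior is easy to observe in a real SQL engine. The sketch below uses Python's built-in sqlite3 module. The table t, the column b, and the two predicates are illustrative choices and not necessarily the expression the notes refer to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (b INTEGER)")
conn.execute("INSERT INTO t (b) VALUES (NULL)")

# Both predicates are true for every actual number, yet each evaluates to
# NULL (unknown) when b is null, because the two references to b are not
# recognized as the same unknown value.
row = conn.execute("SELECT b = b, (b > 0 OR b <= 0) FROM t").fetchone()
print(row)

# A where clause keeps only rows where the predicate is true, so the row is lost.
matches = conn.execute("SELECT COUNT(*) FROM t WHERE b = b").fetchone()[0]
print(matches)
```

Python's sqlite3 reports SQL NULL as None, so both predicate results come back as None and no row matches.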
Date also points out an inconsistency between nulls and Codd’s other principles:
1. The information rule (rule 1): “all information in a relational database is represented
explicitly at the logical level … in exactly one way – by values in tables”
2. In another writing, Codd describes the A-Mark null as something that “is treated neither as a value
nor as a variable by the DBMS”
These two statements conflict: if an A-Mark null is not a value, then information recorded
with it is not represented by values in tables, so the use of the A-Mark null violates rule 1.
The following list shows other two-valued logic identities that do not hold in
three-valued logic. A user who is unaware that 3VL behaves differently is likely to make
assumptions that could prove costly.
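As an illustration of how such identities can be checked mechanically, the Python sketch below brute-forces two classical tautologies under 3VL. The formulas are standard textbook examples chosen here and are not claimed to be the items in the list. None stands for UNK.

```python
def and3(p, q):
    if p is False or q is False:
        return False
    return None if (p is None or q is None) else True

def or3(p, q):
    if p is True or q is True:
        return True
    return None if (p is None or q is None) else False

def not3(p):
    return None if p is None else not p

TWO_VALUED = (True, False)
THREE_VALUED = (True, False, None)

# Formulas that are always true in two-valued logic.
TAUTOLOGIES = {
    "p OR NOT p": lambda p: or3(p, not3(p)),
    "NOT (p AND NOT p)": lambda p: not3(and3(p, not3(p))),
}

def always_true(formula, domain):
    return all(formula(p) is True for p in domain)

for name, formula in TAUTOLOGIES.items():
    print(f"{name}: 2VL={always_true(formula, TWO_VALUED)} "
          f"3VL={always_true(formula, THREE_VALUED)}")

# For contrast, De Morgan's law still holds for every pair of 3VL values.
de_morgan_holds = all(
    not3(and3(p, q)) is or3(not3(p), not3(q))
    for p in THREE_VALUED for q in THREE_VALUED
)
print("De Morgan holds in 3VL:", de_morgan_holds)
```

Both tautologies fail as soon as p is unknown, which is the kind of trap an unwary user can fall into when rewriting a where clause.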
Aggregate Functions – SQL specific problem
In SQL, the aggregate functions SUM, AVG, MAX, and MIN of an empty set all return
null (“value unknown”). Date argues that this is incorrect and the functions should return
other values.
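The empty-set behavior can be reproduced with Python's built-in sqlite3 module. The table name emp and column salary are placeholders for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (salary INTEGER)")  # deliberately left empty

# Aggregates over zero rows: everything except COUNT comes back as NULL.
row = conn.execute(
    "SELECT SUM(salary), AVG(salary), MAX(salary), MIN(salary), COUNT(salary) "
    "FROM emp"
).fetchone()
print(row)
```

SUM, AVG, MAX and MIN all come back as None (SQL NULL), while COUNT returns 0.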
Date also suggests that there are many other ‘psychological’ difficulties in addition to the
ones shown in the examples. These issues make it easy for unwary users and
implementers to make mistakes and not understand the true complexity that comes with
nulls and three valued logic.
The Use of Default Values to Solve the Missing Information Problem (Ch 21)
C.J. Date prefers the systematic use of default values to the use of nulls/3VL. By
systematic he means that the system needs a way to refer to the default values and to
recognize them in certain operations. Date describes the default value approach in an
ideal system where there is built-in functionality to support the use of default values.
Note: Popular DBMS implementations do not represent an ideal system as they support
the use of nulls rather than default values. However, it is possible for a user to build an
additional support layer on top of the DBMS to use default values appropriately.
Date argues the default value approach is the approach that most closely resembles how
the real world deals with missing information. Ex. Putting an N/A or / or – to represent
something that is either unknown or inapplicable.
• When creating tables, each column declaration must contain either an UNK clause
specifying the value that represents unknown or the clause ‘UNKS NOT ALLOWED.’ See
example below.
The UNK value specified must belong to the domain on which the column is defined
(a number for numerical columns, text for string columns, etc).
• If every legal value in a column’s domain could represent a real value, the
user may have to create a separate user-controlled indicator column to specify
whether the value is unknown (such cases should be very rare).
• Whenever a new row is inserted into a table, the user must supply a value for
every column that doesn’t allow unknowns.
• If a new column is added, the new column must have an UNK clause and all
existing rows automatically get set to the default value.
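The rules above can be modeled with a short Python sketch. This is a toy in-memory model, not Date's syntax: the Column and Table classes, the supplier columns, and the '???' marker are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

@dataclass
class Column:
    name: str
    domain: type
    # The value that represents "unknown" for this column.
    # Leaving it as None models the clause UNKS NOT ALLOWED.
    unk: Optional[Any] = None

    def __post_init__(self):
        # The UNK value must belong to the column's domain.
        if self.unk is not None and not isinstance(self.unk, self.domain):
            raise TypeError(f"UNK value for {self.name} is outside its domain")

@dataclass
class Table:
    columns: List[Column]
    rows: List[Dict[str, Any]] = field(default_factory=list)

    def insert(self, **values):
        row = {}
        for col in self.columns:
            if col.name in values:
                row[col.name] = values[col.name]
            elif col.unk is not None:
                row[col.name] = col.unk      # fall back to the UNK value
            else:
                raise ValueError(f"{col.name}: UNKS NOT ALLOWED, value required")
        self.rows.append(row)

    def add_column(self, col):
        # A column added later must carry an UNK clause, because every
        # existing row is filled with that value.
        if col.unk is None:
            raise ValueError("a column added later needs an UNK clause")
        self.columns.append(col)
        for row in self.rows:
            row[col.name] = col.unk

supplier = Table([Column("sno", str), Column("status", int, unk=-1)])
supplier.insert(sno="S1")                        # status falls back to -1
supplier.add_column(Column("city", str, unk="???"))
print(supplier.rows)
```

After these calls the single row holds sno 'S1', status -1, and city '???'. Omitting sno on insert raises an error because that column does not allow unknowns.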
Integrity Aspects of Default Values –
Entity Integrity
• The entity integrity rule of “primary key values in base relations cannot be wholly
or partly null” no longer applies with default values, since no column can be null.
• By eliminating the entity integrity rule, default values also eliminate the property
of the 3VL relational model that there is a distinction between base relations and
other relations.
Referential Integrity
• The referential integrity rule can be simplified and now states that “every value of
a given foreign key must exist as a value of the relevant target primary key.” If a
foreign key has a value of UNK, the target table must have a row with the UNK
value as its primary key.
• Without this row present, there is a possibility that certain update or delete
operations on a row could fail because there isn’t a matching value in the target table.
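A user-level approximation of this rule can be built on an existing DBMS. The sketch below uses Python's built-in sqlite3 module with foreign key enforcement turned on. The table and column names and the literal 'UNK' marker are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces foreign keys only when asked
conn.execute("CREATE TABLE dept (dept_no TEXT PRIMARY KEY)")
conn.execute("""
    CREATE TABLE emp (
        emp_no  TEXT PRIMARY KEY,
        dept_no TEXT NOT NULL DEFAULT 'UNK' REFERENCES dept (dept_no)
    )
""")
conn.execute("INSERT INTO dept (dept_no) VALUES ('D2')")

def try_insert_emp(emp_no):
    # Insert an employee whose department is unknown (the default applies).
    try:
        conn.execute("INSERT INTO emp (emp_no) VALUES (?)", (emp_no,))
        return True
    except sqlite3.IntegrityError:
        return False

before = try_insert_emp("E1")    # dept has no 'UNK' row yet
conn.execute("INSERT INTO dept (dept_no) VALUES ('UNK')")
after = try_insert_emp("E1")     # now the default value has a target row
print(before, after)
```

The first insert is rejected because the default value has no matching primary key. Once the 'UNK' row exists in the target table, the same insert succeeds.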
Database Specific Integrity
• The UNKS NOT ALLOWED specification contains its own integrity constraint
and will reject any operation that violates its properties.
• An UNK clause can have special implications for database specific integrity
The system must interpret this example to mean that the attribute status can
take on either the value -1 or a value between 10 and 100. The DBMS will also hold
information that identifies a value of -1 as meaning an unknown value.
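In a DBMS without UNK clauses, a similar effect can be approximated with a DEFAULT and a CHECK constraint. The sketch below uses Python's built-in sqlite3 module. The supplier table and sno column are hypothetical, and the knowledge that -1 means unknown lives in application code, which is what an ideal system would handle itself.

```python
import sqlite3

UNK_STATUS = -1   # assumed marker for "status unknown"

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE supplier (
        sno    TEXT PRIMARY KEY,
        status INTEGER NOT NULL DEFAULT -1
               CHECK (status = -1 OR status BETWEEN 10 AND 100)
    )
""")

def accepts(sno, status):
    # True if the DBMS accepts the row, False if the constraint rejects it.
    try:
        conn.execute("INSERT INTO supplier (sno, status) VALUES (?, ?)", (sno, status))
        return True
    except sqlite3.IntegrityError:
        return False

conn.execute("INSERT INTO supplier (sno) VALUES ('S1')")   # status defaults to -1
ok_in_range = accepts("S2", 50)
ok_out_of_range = accepts("S3", 5)

# Queries about real status values must exclude the unknown marker explicitly.
known = conn.execute(
    "SELECT sno FROM supplier WHERE status <> ? ORDER BY sno", (UNK_STATUS,)
).fetchall()
print(ok_in_range, ok_out_of_range, known)
```

The value 50 is accepted, 5 is rejected, and only S2 appears among suppliers with a known status.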
Date’s Argument as to why the default value scheme is better than 3VL
• Easier to understand and easier to implement
• Directly relates to how the real world handles missing information
• Fewer traps for an inexperienced user
• Can be extended fairly easily to include other types of missing information such
as inapplicable and undefined values (with the use of additional keywords)
Codd argues against default values “on the grounds that it is unsystematic,
misrepresents the semantics, and is a significant burden on DBAs and users.”
Both DB2 and the SQL standard contain basic default value support, but neither is
as advanced as the systematic support described in the notes above.