
Lecture 1 Notes: Nulls and Three-Valued Logic

Background of the Missing Information Problem, Nulls, and Three-Valued Logic

Missing Information Problem –


In real-world databases, some values may be unknown and must therefore be
handled in a special way by a relational database management system (RDBMS).
The missing information problem is simply this: how should a database account for
information that is either missing or inapplicable?

Solutions to the Missing Information Problem –


Two main solutions are discussed in these notes: nulls combined with three-valued
logic (3VL), and default values. Nulls are the most common representation of
missing information in relational databases.

Definition of Null –
Null was first introduced by E. F. Codd, the creator of the relational database
model. NULL is a special marker used in SQL that represents a missing or unknown
value of an attribute.

Key Properties of Null –


- It is not a value
- It doesn’t belong to any data domains (integer, varchar, etc.)
- Any comparison made directly with null results in neither true nor false, but
always in a third logical value known as unknown
Nulls could be broken down further, although mainstream DBMSs do not do so:
A-Mark Null – “value applicable but missing” ex. Missing birthday
I-Mark Null – “value inapplicable” ex. Marriage date for single person
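
The comparison property above can be checked with a small sketch. It assumes Python's built-in sqlite3 module as a stand-in RDBMS (not something the notes prescribe); Python's None is how that module reports SQL NULL, and the helper name scalar is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database (assumption)

def scalar(sql):
    """Run a one-value query and return that value (None stands for SQL NULL)."""
    return conn.execute(sql).fetchone()[0]

# Comparing null with anything, even another null, yields unknown (NULL).
eq_result = scalar("SELECT NULL = NULL")
neq_result = scalar("SELECT NULL <> 1")

# The dedicated IS NULL test is two-valued and answers true (1).
is_null_result = scalar("SELECT NULL IS NULL")

print(eq_result, neq_result, is_null_result)
```

This is why queries test for missing data with IS NULL rather than with = NULL.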

Three Valued Logic (3VL) –


With SQL’s adoption of nulls and the possibility of an unknown logical result,
Codd also introduced three-valued logic, which accounts for the
truth values True, False, and Unknown.

See truth table for 3VL (image below). Note that True OR Unknown evaluates to
True. Likewise, False AND Unknown evaluates to False.
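
As a rough illustration of these rules, the sketch below encodes the three truth values in Python, using None for Unknown (an encoding chosen here, not one the notes define). The function names and3, or3 and not3 are hypothetical.

```python
# None stands for Unknown; True and False are the ordinary truth values.
VALUES = (True, False, None)
NAMES = {True: "T", False: "F", None: "U"}

def and3(a, b):
    """Three-valued AND: a single False decides the result."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True

def or3(a, b):
    """Three-valued OR: a single True decides the result."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False

def not3(a):
    """Three-valued NOT: Unknown stays Unknown."""
    return None if a is None else (not a)

# Print the AND and OR tables side by side, one row per operand pair.
for a in VALUES:
    for b in VALUES:
        print(NAMES[a], NAMES[b],
              "AND:", NAMES[and3(a, b)],
              "OR:", NAMES[or3(a, b)])
```

The two cases highlighted in the text are or3(True, None) and and3(False, None).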
Problems with Nulls & the Three Valued Logic Approach (Ch 17-18)

Overview of Problem –
The transition from two-valued logic (true/false) to three-valued logic
(true/false/unknown) in relational databases runs into many problems because of the
complexity and ambiguity of handling three truth values. Many relational algebra
operators and aggregate functions that are commonly used in SQL queries do not extend
seamlessly to nulls. C.J. Date argues that there is no ‘sensible interpretation in
terms of how the real world works’ and thus three-valued logic should not be the basis for
building a database system. The examples below show where the use of
nulls/unknowns causes confusion and does not represent the real world appropriately.

Tables

Example 1 – (note the author uses UNK instead of null)

-Identities in two valued logic are not valid under 3VL


-Some systems have optimization techniques for queries that do not follow 3VL correctly
Truth valued expression
SQL query

Depending on the system, the expression above could be evaluated in two different ways:
Without optimization, the expression evaluates to UNK AND UNK, which is UNK. Therefore,
the WHERE clause in the query, NOT(UNK), evaluates to UNK and nothing is returned.
With some system optimization (predicate transitive closure), the expression would be
optimized with the transitive property (if A=B and B=C, then A=C) and would look like this:

This new expression would evaluate to UNK AND UNK AND ‘D2’ = ‘D1’, that is,
UNK AND UNK AND false = false.
Therefore, if the same query were run, the WHERE clause would be NOT(false) = true and
emp# E1 would be returned.
This example shows that certain identities of two-valued logic are invalid in 3VL and that
certain optimizations used in current systems could violate traditional 3VL.
**In the real world, if we were evaluating this expression by hand, we would see that the
optimized expression returns the correct result and 3VL alone would not**
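
The two evaluations can be reproduced with a short sketch. Since the tables and the expression did not survive in these notes, everything here is an assumption pieced together from the surrounding text: a DEPT table holding only D2, an EMP table holding E1 with an unknown department, and the columns renamed EMPNO/DEPTNO because `#` is awkward in identifiers. Python's sqlite3 module stands in for the DBMS, and the extra comparison is added by hand to imitate predicate transitive closure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DEPT (DEPTNO TEXT);
    CREATE TABLE EMP  (EMPNO TEXT, DEPTNO TEXT);
    INSERT INTO DEPT VALUES ('D2');        -- assumed: the only department
    INSERT INTO EMP  VALUES ('E1', NULL);  -- assumed: E1's department is UNK
""")

# Plain 3VL: NOT (UNK AND UNK) is UNK, so the row is filtered out.
plain = conn.execute("""
    SELECT EMP.EMPNO FROM DEPT, EMP
    WHERE NOT (DEPT.DEPTNO = EMP.DEPTNO AND EMP.DEPTNO = 'D1')
""").fetchall()

# Hand-applied transitive closure adds DEPT.DEPTNO = 'D1', which is false,
# so the whole conjunction is false and NOT(false) lets the row through.
rewritten = conn.execute("""
    SELECT EMP.EMPNO FROM DEPT, EMP
    WHERE NOT (DEPT.DEPTNO = EMP.DEPTNO AND EMP.DEPTNO = 'D1'
               AND DEPT.DEPTNO = 'D1')
""").fetchall()

print(plain, rewritten)
```

Under two-valued logic the two WHERE clauses would be interchangeable; under 3VL they return different rows for the same data.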
Example 2
Using the same tables above, consider the following query:
SELECT EMP# FROM EMP WHERE MAYBE ( DEPT# = ‘D1’ )
Note: the MAYBE operator returns true if its operand is UNK, and false otherwise.

Running the query, the single row in the EMP table has an UNK department and E1
would therefore be returned. However, in the real world, we should assume that the
EMP.DEPT# is a foreign key that matches DEPT.DEPT#. If this were the case, we would
know the Dept# can’t be D1 (as it doesn’t even exist in the Dept table), and the query
should return nothing. This is called semantic optimization – optimization that is
performed on the basis of the optimizer’s understanding of the semantics of the situation.
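
A minimal sketch of this example follows, under several assumptions: MAYBE is not available in SQLite, so it is emulated by testing whether the comparison result IS NULL; the table contents (E1 with an unknown department, DEPT holding only D2) are inferred from the text; and the "semantic" version simply adds by hand the knowledge that a foreign key value must exist in DEPT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DEPT (DEPTNO TEXT);
    CREATE TABLE EMP  (EMPNO TEXT, DEPTNO TEXT);
    INSERT INTO DEPT VALUES ('D2');        -- assumed contents
    INSERT INTO EMP  VALUES ('E1', NULL);  -- assumed contents
""")

# MAYBE(p) emulated as "(p) IS NULL": true exactly when p is unknown.
maybe_rows = conn.execute(
    "SELECT EMPNO FROM EMP WHERE (DEPTNO = 'D1') IS NULL"
).fetchall()

# Hand-written semantic constraint: D1 is only possible if DEPT contains it.
constrained_rows = conn.execute("""
    SELECT EMPNO FROM EMP
    WHERE (DEPTNO = 'D1') IS NULL
      AND EXISTS (SELECT 1 FROM DEPT WHERE DEPT.DEPTNO = 'D1')
""").fetchall()

print(maybe_rows, constrained_rows)
```

The first query returns E1; the second returns nothing once the foreign key knowledge is taken into account.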

Example 3
Consider the following expression

This example shows quite clearly a possible problem with nulls in the system. If the
value B is null, the outcome of this expression would be unknown. The obviously logical
answer is that the expression is always true (assuming B represents some value).

The problem with three-valued logic in these examples is that it treats each unknown
reference as totally unknown. It does not connect like references or use any additional
information that may constrain the unknown value. Therefore, Date argues that pure
three-valued logic does not behave in accordance with the real world.

Date also points out inconsistencies between nulls and Codd’s other principles:
1. The information rule (rule 1): “all information in a relational database is represented
explicitly at the logical level … in exactly one way – by values in tables”
2. In another writing, Codd says that the A-Mark null “is treated neither as a value
nor as a variable by the DBMS”
These two statements conflict: if an A-Mark null is not a value, then it represents
information by something other than values in tables, so its use violates rule 1.

Domain Cardinality and Nulls also present a somewhat misleading scenario


(A domain with cardinality n means there are n possible values for any attribute defined
on that domain)
• An attribute can be defined on a domain of cardinality one (meaning only one
possible value is allowed), and yet the DBMS would still allow nulls for this attribute.
• For any other finite cardinality, a value is never ‘totally unknown’, because
you always know the domain it belongs to. This fact could easily mislead users.
Ex. Column C has domain {1,2,3,4,5}. A value for C is null and is used in the
expression R.C > 6. In reality this expression is always false, but because the value is
null the expression evaluates to unknown.
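
The column C example can be tried out as follows. This is only a sketch: SQLite has no user-defined domains, so the domain {1,2,3,4,5} is approximated with a CHECK constraint, and the table name R is taken from the expression in the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- CHECK approximates the domain {1,2,3,4,5}; a null still passes it.
    CREATE TABLE R (C INTEGER CHECK (C BETWEEN 1 AND 5));
    INSERT INTO R VALUES (NULL);
""")

# No legal value of C can exceed 6, yet the comparison is unknown (None).
comparison = conn.execute("SELECT C > 6 FROM R").fetchone()[0]

# Because NOT(unknown) is unknown too, the row is not found by the negation.
not_greater = conn.execute(
    "SELECT COUNT(*) FROM R WHERE NOT (C > 6)"
).fetchone()[0]

print(comparison, not_greater)
```

A user reasoning from the domain alone would expect NOT (C > 6) to match every row.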

The following list shows other two-valued logic identities that do not hold in
three-valued logic. A user who is unaware that 3VL behaves differently is likely to make
assumptions that could prove costly.
Aggregate Functions – SQL specific problem
In SQL, the aggregate functions SUM, AVG, MAX, and MIN of an empty set all return
null (“value unknown”). Date argues that this is incorrect and the functions should return
other values.
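
The empty-set behaviour is easy to observe. The sketch below assumes SQLite (through Python's sqlite3 module) and a hypothetical empty table T; the COALESCE call is a common workaround shown for illustration, not necessarily the fix Date proposes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T (X INTEGER)")  # deliberately left empty

# SUM, AVG, MAX and MIN over no rows are all null; COUNT is 0.
aggregates = conn.execute(
    "SELECT SUM(X), AVG(X), MAX(X), MIN(X), COUNT(X) FROM T"
).fetchone()

# COALESCE substitutes a chosen value when the aggregate is null.
sum_or_zero = conn.execute("SELECT COALESCE(SUM(X), 0) FROM T").fetchone()[0]

print(aggregates, sum_or_zero)
```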

Exists Function – SQL specific problem


Because of the semantic overloading of null (it could represent a missing, not applicable,
or nonexistent value), it is easy for the EXISTS function to return an incorrect result.
Suppose the following query: “Exists SOCSEC# From Employee where …”. If the
selected employee does not have an SSN, the value would be null. However, EXISTS would
return true, as it would interpret the null as a missing value rather than a nonexistent one.
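
A small sketch of the EXISTS behaviour follows. The query in the text is elided, so the WHERE clause, the column names EMPNO and SOCSEC, and the single employee row are all assumptions made for illustration, again using SQLite through Python.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEE (EMPNO TEXT, SOCSEC TEXT);
    INSERT INTO EMPLOYEE VALUES ('E1', NULL);  -- E1 has no SSN at all
""")

# EXISTS only asks whether the subquery produced a row, so the null SSN counts.
exists_row = conn.execute("""
    SELECT EXISTS (SELECT SOCSEC FROM EMPLOYEE WHERE EMPNO = 'E1')
""").fetchone()[0]

# Excluding nulls explicitly gives the answer a user probably expected.
exists_value = conn.execute("""
    SELECT EXISTS (SELECT SOCSEC FROM EMPLOYEE
                   WHERE EMPNO = 'E1' AND SOCSEC IS NOT NULL)
""").fetchone()[0]

print(exists_row, exists_value)
```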

Not True, Not False, Not Unknown


Date argues that under three-valued logic, the meaning of NOT is not intuitively clear. X
IS NOT TRUE means that X can be false or unknown, whereas NOT X is true only when X is
false.
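
The difference between the two readings of "not" can be tabulated with a short sketch. None stands for Unknown here, and the function names not3 and is_not_true are hypothetical labels for NOT X and X IS NOT TRUE.

```python
def not3(x):
    """Three-valued NOT X: Unknown stays Unknown (None)."""
    return None if x is None else (not x)

def is_not_true(x):
    """X IS NOT TRUE: a two-valued test, true for both False and Unknown."""
    return x is not True

# One row per truth value: (X, NOT X, X IS NOT TRUE)
rows = [(x, not3(x), is_not_true(x)) for x in (True, False, None)]
for row in rows:
    print(row)
```

The two operators agree on True and False and part ways only on Unknown.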

Date also suggests that there are many other ‘psychological’ difficulties in addition to the
ones shown in the examples. These issues make it easy for unwary users and
implementers to make mistakes and not understand the true complexity that comes with
nulls and three valued logic.
The Use of Default Values to Solve the Missing Information Problem (CH. 21)

C.J. Date prefers the systematic use of default values to the use of nulls/3VL. By
systematic he means that the system needs a way to refer to the default values and to
recognize them in certain operations. Date describes the default value approach in an
ideal system that has built-in functionality to support default values.
Note: Popular DBMS implementations do not represent such an ideal system, as they support
nulls rather than default values. However, it is possible for a user to build an
additional support layer on top of the DBMS to use default values appropriately.

Definition of Default Values –


Default values are real values used to represent missing information in the table.
Because only real values are used, all logic performed on the database is two-valued
logic, and thus we stay away from the inconsistencies of 3VL. We interpret the default
value to mean that this information is missing.
The goal when choosing a default value for an attribute is to pick a value that cannot
represent a true value, to avoid any confusion when accessing the table.

Date argues the default value approach is the approach that most closely resembles how
the real world deals with missing information. Ex. Putting an N/A or / or – to represent
something that is either unknown or inapplicable.

Structural Aspects of Default Values –


UNK = “unknown”, the condition that the default values represent

• When creating tables, each column declaration must contain either an UNK clause
specifying the default value or the clause ‘UNKS NOT ALLOWED.’ See
example below.

The UNK value specified must belong to the domain on which the column is defined
(a number for numerical columns, text for string columns, etc.).
• If every legal value of a column’s domain could represent a real value, the
user may have to create a separate user-controlled indicator column to specify
whether the value is unknown (such cases should be very rare).
• Whenever a new row is inserted into a table, the user must supply a value for
every column that doesn’t allow unknowns.
• If a new column is added, the new column must have an UNK clause, and all
existing rows are automatically set to the default value.
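
The structural rules above can be approximated in an ordinary SQL system. The sketch below is not Date's UNK syntax: it assumes SQLite, stands in NOT NULL DEFAULT for an UNK clause and a bare NOT NULL for UNKS NOT ALLOWED, and uses a hypothetical EMP table with made-up default values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMP (
        EMPNO  TEXT    NOT NULL,            -- stands in for UNKS NOT ALLOWED
        SALARY INTEGER NOT NULL DEFAULT -1  -- stands in for an UNK clause of -1
    );
    INSERT INTO EMP (EMPNO) VALUES ('E1');  -- SALARY falls back to its default
    -- A new column must carry a default; existing rows pick it up.
    ALTER TABLE EMP ADD COLUMN DEPTNO TEXT NOT NULL DEFAULT 'UNK';
""")

row = conn.execute("SELECT EMPNO, SALARY, DEPTNO FROM EMP").fetchone()

# Omitting a column that does not allow unknowns is rejected.
try:
    conn.execute("INSERT INTO EMP (SALARY) VALUES (100)")
    missing_key_rejected = False
except sqlite3.IntegrityError:
    missing_key_rejected = True

print(row, missing_key_rejected)
```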
Integrity Aspects of Default Values –
Entity Integrity
• The entity integrity rule, “primary key values in base relations cannot be wholly
or partly null,” no longer applies with default values, since no column can be
null.
• By eliminating the entity integrity rule, default values also eliminate the
distinction that the 3VL relational model draws between base relations and
other relations.
Referential Integrity
• The referential integrity rule can be simplified and now states that “every value of
a given foreign key must exist as a value of the relevant target primary key.” If a
foreign key has a value of UNK, the target table must have a row with the UNK
value as its primary key.
• Without this row present, there is a possibility that certain update or delete
operations on any row could fail because there is no matching value in the target table.
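
The requirement for an UNK row in the target table can be sketched as follows. The assumptions are SQLite with foreign key enforcement switched on, hypothetical DEPT and EMP tables, and 'UNK' as the chosen default department; the function name insert_unknown_dept_employee is also hypothetical.

```python
import sqlite3

def insert_unknown_dept_employee(seed_unk_row):
    """Try to insert an employee whose department defaults to 'UNK'.

    Returns True if the insert is accepted, False if the foreign key rejects it.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves this off by default
    conn.execute("CREATE TABLE DEPT (DEPTNO TEXT PRIMARY KEY)")
    conn.execute("""
        CREATE TABLE EMP (
            EMPNO  TEXT PRIMARY KEY,
            DEPTNO TEXT NOT NULL DEFAULT 'UNK' REFERENCES DEPT (DEPTNO)
        )
    """)
    if seed_unk_row:
        conn.execute("INSERT INTO DEPT VALUES ('UNK')")  # the matching target row
    try:
        conn.execute("INSERT INTO EMP (EMPNO) VALUES ('E1')")
        return True
    except sqlite3.IntegrityError:
        return False
    finally:
        conn.close()

with_unk_row = insert_unknown_dept_employee(True)
without_unk_row = insert_unknown_dept_employee(False)
print(with_unk_row, without_unk_row)
```

With the UNK row seeded, the simplified rule holds with no special case for missing departments.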
Database Specific Integrity
• The UNKS NOT ALLOWED specification contains its own integrity constraint
and will reject any operation that violates its properties.
• An UNK clause can have special implications for database specific integrity

The system must interpret this example to mean that the attribute status can
take on a value of -1 or a value between 10 and 100. The DBMS will also hold
information that identifies a value of -1 as meaning an unknown value.
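
Since the original declaration is not reproduced in these notes, the following is only a guess at its effect: a CHECK constraint in SQLite that admits -1 (the assumed UNK value) or anything from 10 to 100. The table name S, the key column SNO and the helper accepts are hypothetical.

```python
import itertools
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE S (
        SNO    TEXT PRIMARY KEY,
        STATUS INTEGER NOT NULL DEFAULT -1
               CHECK (STATUS = -1 OR STATUS BETWEEN 10 AND 100)
    )
""")

_ids = itertools.count(1)  # generates a fresh key for each attempt

def accepts(status):
    """Return True if a row with this STATUS passes the constraints."""
    try:
        conn.execute("INSERT INTO S (SNO, STATUS) VALUES (?, ?)",
                     (f"S{next(_ids)}", status))
        return True
    except sqlite3.IntegrityError:
        return False

results = {s: accepts(s) for s in (-1, 10, 100, 5, 101)}
null_accepted = accepts(None)  # nulls are ruled out by NOT NULL
print(results, null_accepted)
```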

Manipulative Aspects of Default Values –


If default values are implemented systematically, the following functions
should be built in.
R is a range variable over some base table and C is a specific column of that table.
• IS_UNK(R.C) -> returns true if the value matches the UNK value, false
otherwise (IS_NOT_UNK is the opposite)
• IF_UNK(R.C , exp) -> returns the value of exp if R.C is UNK, otherwise
returns the original value R.C
• Outer joins on tables would provide default values in place of nulls where
appropriate.
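
The two functions above might look like this in a support layer built on top of a DBMS. This is a sketch under stated assumptions: the per-column UNK values live in a hypothetical dictionary UNK_BY_COLUMN, and the column names and defaults are invented for illustration.

```python
# Hypothetical catalog of UNK values, one per column (assumption).
UNK_BY_COLUMN = {"STATUS": -1, "CITY": "UNK"}

def is_unk(column, value):
    """IS_UNK: true if the value equals the column's declared UNK value."""
    return value == UNK_BY_COLUMN[column]

def is_not_unk(column, value):
    """IS_NOT_UNK: the opposite of IS_UNK."""
    return not is_unk(column, value)

def if_unk(column, value, exp):
    """IF_UNK: return exp when the value is UNK, otherwise the value itself."""
    return exp if is_unk(column, value) else value

# Example: treat an unknown status as 0 when totalling.
statuses = [30, -1, 20]
total = sum(if_unk("STATUS", s, 0) for s in statuses)
print(total)
```

Every function here returns an ordinary two-valued result, which is the point of the default value scheme.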

Date’s Argument as to why the default value scheme is better than 3VL
• Easier to understand and easier to implement
• Directly relates to how the real world handles missing information
• Fewer traps for an inexperienced user
• Can be extended fairly easily to include other types of missing information such
as inapplicable and undefined values (with the use of additional keywords)

Codd argues against default values “on the grounds that it is unsystematic,
misrepresents the semantics, and is a significant burden on DBAs and users”.

DB2 and the SQL standard both contain basic default value support, but neither is
as advanced as the systematic support described in the notes above.
