
4. Problems and Constraints


The problem statement may consist of questions dealing with database design, database
concepts, and Structured Query Language (SQL).
4.1 Data Integrity
Data integrity refers to maintaining and assuring the accuracy and consistency of data over its entire life-cycle, and is a critical aspect of the design, implementation and usage of any system which stores, processes, or retrieves data.
Data integrity also covers guidelines for data retention, specifying or guaranteeing the length of time data can be retained in a particular database and what can be done with data values when their validity or usefulness expires. To achieve data integrity, these rules must be applied consistently and routinely to all data entering the system; any relaxation of enforcement can introduce errors into the data. Implementing checks on the data as close as possible to the source of input (such as human data entry) lets less erroneous data enter the system. Strict enforcement of data integrity rules lowers error rates, saving the time otherwise spent troubleshooting and tracing erroneous data and the errors it causes in algorithms.
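As a minimal sketch of checking data close to its source of input, the Python function below validates one hand-entered record before it is passed on for storage. The field names (`name`, `age`) and the 0 to 150 age range are hypothetical rules chosen for illustration, not part of any particular system.

```python
from __future__ import annotations


def validate_entry(record: dict) -> list[str]:
    """Check one hand-entered record before it reaches the database.

    Returns a list of problems; an empty list means the record is accepted.
    The field names and the 0-150 age range are hypothetical rules.
    """
    problems: list[str] = []
    name = record.get("name")
    if not isinstance(name, str) or not name.strip():
        problems.append("name is required")
    age = record.get("age")
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if isinstance(age, bool) or not isinstance(age, int) or not 0 <= age <= 150:
        problems.append("age must be a whole number from 0 to 150")
    return problems


print(validate_entry({"name": "Aisha", "age": 34}))  # []
print(validate_entry({"name": "", "age": 200}))
# ['name is required', 'age must be a whole number from 0 to 150']
```

Only records that come back with an empty list would be forwarded for storage; the rest can be returned to the person entering them while the mistake is still easy to correct.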
4.1.1 Types of integrity constraints
Data integrity is normally enforced in a database system by a series of integrity
constraints or rules. Three types of integrity constraints are an inherent part of the
relational data model: entity integrity, referential integrity and domain integrity:
4.1.1.1 Entity integrity concerns the concept of a primary key. Entity integrity is an integrity rule which states that every table must have a primary key and that the column or columns chosen to be the primary key must be unique and not null.
4.1.1.2 Referential integrity concerns the concept of a foreign key. The referential integrity rule states that any foreign-key value can be in only one of two states. The usual state of affairs is that the foreign-key value refers to a primary-key value of some table in the database. Occasionally, depending on the rules of the data owner, a foreign-key value can be null. In this case we are explicitly saying either that there is no relationship between the objects represented in the database or that this relationship is unknown.
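Both rules can be checked mechanically. The sketch below assumes SQLite through Python's standard `sqlite3` module (an illustrative choice; other products expose the same information through their own catalogs). It lists tables that declare no primary key, using `PRAGMA table_info`, and finds foreign-key values that refer to no existing primary key, using `PRAGMA foreign_key_check`. Foreign-key enforcement is switched off on purpose so that an orphan row can be stored and then detected; the table names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = OFF")  # off on purpose, so an orphan row can be stored
conn.executescript("""
    CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        dept_id INTEGER REFERENCES department (dept_id)  -- NULL: no known department
    );
    CREATE TABLE audit_log (message TEXT);               -- breaks entity integrity: no key
    INSERT INTO department VALUES (1, 'Sales');
    INSERT INTO employee VALUES (10, 1);                 -- refers to an existing key
    INSERT INTO employee VALUES (11, NULL);              -- null foreign key: allowed
    INSERT INTO employee VALUES (12, 99);                -- orphan: department 99 does not exist
""")


def tables_without_primary_key(conn):
    """Entity integrity: list tables in which no column is part of a primary key."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk);
    # pk > 0 marks a primary-key column. Table names come from the catalog itself.
    return [t for t in tables
            if not any(col[5] > 0 for col in conn.execute(f'PRAGMA table_info("{t}")'))]


def referential_violations(conn):
    """Referential integrity: rows whose non-null foreign key matches no parent key."""
    # Each result row is (child table, child rowid, parent table, foreign-key id).
    return conn.execute("PRAGMA foreign_key_check").fetchall()


print(tables_without_primary_key(conn))  # ['audit_log']
print(referential_violations(conn))      # [('employee', 12, 'department', 0)]
```

Employee 11, whose foreign key is null, is not reported: a null foreign key is one of the two states the referential integrity rule permits.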
4.1.1.3 Domain integrity specifies that all columns in a relational database must be declared upon a defined domain. The primary unit of data in the relational data model is the data item. Such data items are said to be non-decomposable or atomic. A domain is a set of values of the same type. Domains are therefore pools of values from which actual values appearing in the columns of a table are drawn.
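Some products let you declare a domain directly (PostgreSQL, for example, has `CREATE DOMAIN`). SQLite does not, so this sketch, again assuming SQLite via Python's `sqlite3`, approximates a domain with a column type plus a `CHECK` that lists the pool of permitted values. The `GRADE_DOMAIN` values and the `result` table are hypothetical.

```python
import sqlite3

# Hypothetical domain: a grade is one of a fixed pool of atomic text values.
GRADE_DOMAIN = ("A", "B", "C", "D", "F")

conn = sqlite3.connect(":memory:")
allowed = ", ".join(f"'{g}'" for g in GRADE_DOMAIN)  # "'A', 'B', 'C', 'D', 'F'"
conn.execute(f"""
    CREATE TABLE result (
        student_id INTEGER NOT NULL,
        grade      TEXT NOT NULL CHECK (grade IN ({allowed}))
    )
""")


def try_insert(student_id, grade):
    """Return True if the row is stored, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO result VALUES (?, ?)", (student_id, grade))
        return True
    except sqlite3.IntegrityError:
        return False


print(try_insert(1, "B"))   # True: value drawn from the domain
print(try_insert(2, "B+"))  # False: not in the pool of values
```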
4.1.1.4 User-defined integrity refers to a set of rules specified by a user which do not belong to the entity, domain and referential integrity categories.
If a database supports these features, it is the responsibility of the database to ensure data integrity as well as the consistency model for data storage and retrieval. If a database does not support these features, it is the responsibility of the applications to ensure data integrity, while the database supports the consistency model for data storage and retrieval.
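A user-defined rule that no entity, domain or referential constraint can express is often enforced inside the database with a trigger. In this sketch (SQLite via Python's `sqlite3`; the `orders` table and the three-open-orders limit are a hypothetical business rule), a `BEFORE INSERT` trigger aborts any insert that would break the rule, so the database rather than each application guards it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL,
        status      TEXT NOT NULL
    );
    -- Hypothetical business rule: at most 3 open orders per customer.
    CREATE TRIGGER limit_open_orders
    BEFORE INSERT ON orders
    WHEN NEW.status = 'open'
     AND (SELECT COUNT(*) FROM orders
          WHERE customer_id = NEW.customer_id AND status = 'open') >= 3
    BEGIN
        SELECT RAISE(ABORT, 'customer already has 3 open orders');
    END;
""")


def place_order(customer_id):
    """Insert an open order; return 'ok' or the database's error message."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO orders (customer_id, status) VALUES (?, 'open')",
                (customer_id,))
        return "ok"
    except sqlite3.IntegrityError as exc:  # RAISE(ABORT) reports a constraint error
        return str(exc)


results = [place_order(7) for _ in range(4)]
print(results)  # ['ok', 'ok', 'ok', 'customer already has 3 open orders']
```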

4.2 Normalization
Normalization is the process of efficiently organizing data in a database. There are two goals
of the normalization process: eliminating redundant data (for example, storing the same data
in more than one table) and ensuring data dependencies make sense (only storing related
data in a table). Both of these are worthy goals as they reduce the amount of space a
database consumes and ensure that data is logically stored.

Database normalization is the process of organizing the attributes and tables of a relational
database to minimize data redundancy.

Normalization involves decomposing a table into smaller (and less redundant) tables without losing information, and defining foreign keys in the old table that reference the primary keys of the new ones. The objective is to isolate data so that additions, deletions, and modifications of an attribute can be made in just one table and then propagated through the rest of the database using the defined foreign keys.
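The sketch below shows such a decomposition in SQLite (via Python's `sqlite3`). A hypothetical `orders_flat` table repeats each customer's city on every order; splitting it into `customer` and `orders`, linked by a foreign key, means a change of city is made in one row and is seen by every order through a join. The migration matches customers by name, which assumes names are unique in this toy data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- Before: customer details are repeated on every order row.
    CREATE TABLE orders_flat (order_id INTEGER PRIMARY KEY, customer_name TEXT,
                              customer_city TEXT, item TEXT);
    INSERT INTO orders_flat VALUES (1, 'Lee', 'Ipoh', 'pen'),
                                   (2, 'Lee', 'Ipoh', 'ink'),
                                   (3, 'Tan', 'Klang', 'pad');

    -- After: customer facts live in one table; orders reference them by key.
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY,
                           name TEXT NOT NULL, city TEXT);
    CREATE TABLE orders (order_id    INTEGER PRIMARY KEY,
                         customer_id INTEGER NOT NULL REFERENCES customer (customer_id),
                         item        TEXT);
    INSERT INTO customer (name, city)
        SELECT DISTINCT customer_name, customer_city FROM orders_flat ORDER BY customer_name;
    INSERT INTO orders
        SELECT f.order_id, c.customer_id, f.item
        FROM orders_flat f JOIN customer c ON c.name = f.customer_name;
""")

# A change of city now touches exactly one row ...
conn.execute("UPDATE customer SET city = 'Penang' WHERE name = 'Lee'")
# ... and every order sees it through the foreign key.
rows = conn.execute("""
    SELECT o.order_id, c.city FROM orders o JOIN customer c USING (customer_id)
    ORDER BY o.order_id
""").fetchall()
print(rows)  # [(1, 'Penang'), (2, 'Penang'), (3, 'Klang')]
```

In `orders_flat` the same update would have to touch both of Lee's rows, and missing one would leave the data inconsistent.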

4.3 Design Challenges: Skew and Join Collocation


Two serious problems pose a major challenge for the database designer:
4.3.1 Data Skew
Skew means that data is not evenly distributed across partitions. An uneven distribution degrades the performance of the overall execution, because CPUs sit idle waiting for the partitions with larger volumes to finish their work.
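To make skew measurable, the sketch below assigns rows to partitions with a simple hash-modulo rule and reports the largest partition relative to the average. The rule (Python's built-in `hash` on integer keys) and the sample keys are assumptions for illustration; a real shared-nothing system uses its own distribution function. Since a parallel step finishes only when its largest partition does, a skew factor well above 1.0 signals idle CPUs.

```python
from collections import Counter


def partition_sizes(keys, num_partitions):
    """Rows per partition when each row goes to partition hash(key) % num_partitions."""
    sizes = Counter(hash(k) % num_partitions for k in keys)
    return [sizes.get(p, 0) for p in range(num_partitions)]


def skew_factor(sizes):
    """Largest partition divided by the average; 1.0 means perfectly even."""
    return max(sizes) / (sum(sizes) / len(sizes))


uniform = list(range(1000))               # distinct keys
skewed = [0] * 700 + list(range(1, 301))  # 700 rows share one key, e.g. a default value

print(partition_sizes(uniform, 4), skew_factor(partition_sizes(uniform, 4)))
# [250, 250, 250, 250] 1.0
print(partition_sizes(skewed, 4), skew_factor(partition_sizes(skewed, 4)))
# [775, 75, 75, 75] 3.1
```

In the skewed case one partition holds 775 of the 1,000 rows, so the other three finish early and wait.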

4.3.2 Collocation
A common problem occurs when tables need to be joined and the join columns are not collocated on the same node. When this happens, data for one table must be shipped from remote nodes to join with data at the local node. This is a very expensive process that can cripple the benefits of shared-nothing processing.
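The cost of a non-collocated join can be estimated by counting the rows that would have to cross the network. This hypothetical sketch places every table with the same hash-modulo rule over four nodes, stores customers by `customer_id`, and compares distributing orders by `order_id` with distributing them by the join column `customer_id`. The data and the placement rule are assumptions for illustration.

```python
NUM_NODES = 4


def node_for(key: int) -> int:
    """Hypothetical placement rule shared by all tables: hash the key, mod node count."""
    return hash(key) % NUM_NODES


# Hypothetical data: (order_id, customer_id) pairs; customer rows live on node_for(customer_id).
orders = [(order_id, (order_id * 7) % 100) for order_id in range(1000)]


def rows_to_ship(orders, distribute_on):
    """Count order rows stored on a different node from their matching customer row."""
    shipped = 0
    for order_id, customer_id in orders:
        key = order_id if distribute_on == "order_id" else customer_id
        if node_for(key) != node_for(customer_id):
            shipped += 1
    return shipped


print(rows_to_ship(orders, "order_id"))     # 500: half the rows cross the network
print(rows_to_ship(orders, "customer_id"))  # 0: the join is fully collocated
```

Distributing both tables on the join column puts matching rows on the same node, so no data has to be shipped for this join.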
4.4 Constraints
Database constraints are user-defined structures that let you restrict the behaviours of
columns. There are five types of database constraints. They are:
4.4.1 CHECK Constraint
A CHECK constraint can be declared inline on a column or as a table-level constraint. Table-level constraints can only be created as out-of-line constraints, and a CHECK that compares two or more columns is normally written that way. You typically use a CHECK constraint to restrict a column value to a set of values defined by the constraint. A CHECK constraint does not make the column mandatory: a null value makes the CHECK condition unknown rather than false, so it is not treated as a violation, and the column keeps the ANSI-standard default of being nullable, a cardinality of 0..1 (optional). You override this default by adding a NOT NULL constraint, which makes the column cardinality 1..1, or mandatory.
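A minimal sketch of these rules, assuming SQLite via Python's `sqlite3` (the `shirt` table and its values are hypothetical): the inline CHECK on `size` still admits NULL, adding NOT NULL to `colour` makes it mandatory, and the out-of-line CHECK compares two columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE shirt (
        shirt_id INTEGER PRIMARY KEY,
        size     TEXT CHECK (size IN ('S', 'M', 'L')),             -- inline CHECK, still optional
        colour   TEXT NOT NULL CHECK (colour IN ('red', 'blue')),  -- CHECK plus NOT NULL: mandatory
        min_qty  INTEGER,
        max_qty  INTEGER,
        CHECK (min_qty <= max_qty)                                  -- out-of-line, table-level CHECK
    )
""")


def accepted(size, colour, min_qty=1, max_qty=10):
    """Return True if the row is stored, False if a constraint rejects it."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO shirt (size, colour, min_qty, max_qty) VALUES (?, ?, ?, ?)",
                (size, colour, min_qty, max_qty))
        return True
    except sqlite3.IntegrityError:
        return False


print(accepted(None, "red"))        # True: NULL makes the CHECK unknown, not false
print(accepted("XL", "red"))        # False: outside the CHECK's set of values
print(accepted("M", None))          # False: NOT NULL makes colour mandatory
print(accepted("M", "blue", 5, 2))  # False: the table-level CHECK spans two columns
```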

4.4.2 FOREIGN KEY Constraint


A FOREIGN KEY constraint restricts the values that are acceptable in a column or group of columns to those values found in the column or group of columns that defines the referenced PRIMARY KEY. Depending on the implementation, this may or may not impose a NOT NULL constraint on all members of the foreign key. If the implementation makes the column or set of columns mandatory, the cardinality of the FOREIGN KEY columns to the PRIMARY KEY is mandatory, or 1..1. However, the default is that a FOREIGN KEY is 0..1 to the PRIMARY KEY, which means that a row may be inserted in the table without a matching PRIMARY KEY value, provided that the constrained column or set of columns is null.
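The sketch below, assuming SQLite via Python's `sqlite3` and hypothetical `author`, `book` and `review` tables, shows the default 0..1 behaviour: a NULL foreign key is accepted, while a value with no matching PRIMARY KEY is rejected. Declaring the column NOT NULL as well, as in `review`, makes the relationship 1..1. SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK enforcement off by default
conn.executescript("""
    CREATE TABLE author (author_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE book (
        book_id   INTEGER PRIMARY KEY,
        author_id INTEGER REFERENCES author (author_id)       -- optional: 0..1
    );
    CREATE TABLE review (
        review_id INTEGER PRIMARY KEY,
        book_id   INTEGER NOT NULL REFERENCES book (book_id)  -- mandatory: 1..1
    );
    INSERT INTO author VALUES (1, 'Codd');
""")


def insert(sql, params):
    """Return True if the row is stored, False if a constraint rejects it."""
    try:
        with conn:
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


print(insert("INSERT INTO book (author_id) VALUES (?)", (1,)))     # True: parent exists
print(insert("INSERT INTO book (author_id) VALUES (?)", (None,)))  # True: NULL is allowed
print(insert("INSERT INTO book (author_id) VALUES (?)", (42,)))    # False: no author 42
print(insert("INSERT INTO review (book_id) VALUES (?)", (None,)))  # False: NOT NULL makes it 1..1
```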

4.4.3 NOT NULL Constraint


A NOT NULL constraint restricts a column by making it mandatory. This means you cannot insert a row in the table without providing a non-null value of the correct data type for every NOT NULL constrained column. A mandatory column has a cardinality of 1..1.

4.4.4 PRIMARY KEY Constraint


A PRIMARY KEY constraint checks whether a column value will be unique among all rows in a table and disallows null values. Therefore, a PRIMARY KEY has the behaviours of both NOT NULL and UNIQUE constraints. A PRIMARY KEY may span two or more columns. A multiple-column PRIMARY KEY is known as a composite or compound key; the two names can be confusing, but both simply mean that the key spans more than one column.
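A composite key can be sketched in SQLite via Python's `sqlite3` (the `enrolment` table is hypothetical). One caveat: for historical reasons, ordinary SQLite tables accept NULL in a non-INTEGER PRIMARY KEY column unless NOT NULL is also declared, so the sketch uses a `WITHOUT ROWID` table, where SQLite does enforce NOT NULL on every primary-key column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# WITHOUT ROWID makes SQLite enforce NOT NULL on every primary-key column,
# matching the standard PRIMARY KEY behaviour described above.
conn.execute("""
    CREATE TABLE enrolment (
        student_id INTEGER,
        course_id  TEXT,
        PRIMARY KEY (student_id, course_id)   -- composite (compound) key
    ) WITHOUT ROWID
""")


def enrol(student_id, course_id):
    """Return True if the pair is stored, False if the key constraint rejects it."""
    try:
        with conn:
            conn.execute("INSERT INTO enrolment VALUES (?, ?)", (student_id, course_id))
        return True
    except sqlite3.IntegrityError:
        return False


print(enrol(1, "DB101"))   # True
print(enrol(1, "SQL201"))  # True: same student, different course
print(enrol(1, "DB101"))   # False: the pair (1, 'DB101') already exists
print(enrol(2, None))      # False: no part of a primary key may be NULL
```

Uniqueness applies to the combination of columns: student 1 may appear many times, but each (student, course) pair only once.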
4.4.5 UNIQUE Constraint
A UNIQUE constraint checks whether a column value will be unique among all rows in a table. Unlike a PRIMARY KEY, it does not by itself disallow null values.
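A short sketch of this in SQLite via Python's `sqlite3`, with a hypothetical `member` table: duplicate values are rejected, but NULLs are not. How many NULLs a UNIQUE column tolerates varies by product; SQLite treats every NULL as distinct, whereas SQL Server, for example, allows only one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE member (member_id INTEGER PRIMARY KEY, email TEXT UNIQUE)")


def join(email):
    """Return True if the member is stored, False if UNIQUE rejects the email."""
    try:
        with conn:
            conn.execute("INSERT INTO member (email) VALUES (?)", (email,))
        return True
    except sqlite3.IntegrityError:
        return False


print(join("a@example.com"))  # True
print(join("a@example.com"))  # False: duplicate value
print(join(None))             # True
print(join(None))             # True: SQLite treats each NULL as distinct
```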

