You are on page 1of 25

08 Databases

Candidates should be able to:


Show understanding of the limitations of using a file-based approach for the storage and
retrieval of data

Describe the features of a relational database that address the limitations of a file-based
approach

Show understanding of and use the terminology associated with a relational database
model

Use an entity-relationship (E-R) diagram to document a database design

Show understanding of the normalization process

Explain why a given set of database tables are, or are not, in 3NF
Produce a normalized database design for a description of a database, a given set of
data, or a given set of tables

Show understanding of the features provided by a Database Management System


(DBMS) that address the issues of a file based approach

Show understanding of how software tools found within a DBMS are used in practice

Show understanding that the DBMS carries out all creation/modification of the database
structure using its Data Definition Language (DDL)

Show understanding that the DBMS carries out all queries and maintenance of data using
its DML

Show understanding that the industry standard for both DDL and DML is Structured
Query Language (SQL)

Understand given SQL (DDL) statements and be able to write simple SQL (DDL)
statements using a sub-set of statements

Definitions
Field Collection of items of data

An organized collection of structured information, or data


Database
typically stored electronically in a computer system

A database in which the data items are linked by internal


Relational Database
pointers

Entity Anything that can have data stored about it

A group of similar data, in a database, with rows for each


Table
instance of an entity and columns for each attribute
Record A row in a table in a database

Field A column in a table in a database

Sometimes called a logical record or an occurrence of an


Tuple
entity

Attribute Individual data item stored for an entity

An attribute or smallest set of attributes in a table where each


Candidate Key
tuple is unique

A unique identifier for a table, a special case of a candidate


Primary Key
key

Secondary Key A candidate key that is an alternative to the primary key

A set of attributes in one table that refer to the primary key in


Foreign Key
another table

It is maintained when the database must not contain any


Referential Integrity values of a foreign key that are not matched to the
corresponding primary key

Indexing A pointer to a data in a table

It is formed when one table in a database has a foreign key


Relationship
that refers to a primary key in another table in the database

Entity-Relationship (E- A way to visually represent the relationship between the


R) Diagram entities in a table

A process used to construct a relational database that has


Normalization Process
integrity and in which data redundancy is reduced
Entities in a database do not contain repeated groups of
First Normal Form (1NF)
attributes

Second Normal
There are no partial dependencies (and are in 1NF)
Form(2NF)

Third Normal Form All non-key attributes are independent, and there are no non-
(3NF) key dependencies (and are in 2NF)

A situation in which a non-key attribute of a table depends on


Partial Dependency
only a part of the primary key

A situation in which there is at least one field that is dependent


Non-key dependency
on other fields that are not part of the key

Data Redundancy The same data is stored in multiple places

Updating data in one place does not update data stored in


Data Inconsistency
some other place

Changing structure of data has major effect on the


Data Dependency
applications using the database

Data Modelling A tool used to how the data structure of a database

A data model for a specific database that is independent of


Logical Schema
the DBMS used to build the database

Interface provided by DBMS that allows a developer to write


Developer Interface
queries in Structured Programming Language (SQL)

Query Processor A processor that processes queries written in SQL


Data Definition A language used to create, modify, and remove the data
Language (DDL) structures that form a relational database

Data Manipulation A language used to add, modify, delete, and retrieve the data
Language (DML) stored in a relational database

8.1 Database Concepts


8.1.1 The Limitations of a File-Based Approach
Storage space is wasted when data items are duplicated by the separate applications and
some data is redundant
Inconsistent as data can be altered by one application and not by another

Enquiries available can depend on the structure of data and software used so data is not
independent

8.1.2 The Advantages of a Relational Database over a File-


Based Approach
Database
An organized collection of structured information, or data, typically stored electronically in
a computer system

Advantages

Storage space is not wasted since data item are only stored once, hence minimal
redundant data
Data altered in one application is available in other applications, hence data is consistent
Enquiries available do not depend on the structure of data and software used so data is
independent

8.1.3 Relational Database Model Terminology


Database: An organized collection of structured information, or data typically stored
electronically in a computer system
Relational Database: A database in which the data items are linked by internal pointers

Entity: Anything that can have data stored about it

Table: A group of similar data, in a database, with rows for each instance of an entity and
columns for each attribute
Record: A row in a table in a database

Field: A column in a table in a database

Tuple: Sometimes called a logical record or an occurrence of an entity

Attribute: Individual data item stored for an entity

Types of keys:

Candidate Key: An attribute or smallest set of attributes in a table where each tuple is
unique

Primary Key: A unique identifier for a table, a special case of a candidate key

Secondary Key: A candidate key that is an alternative to the primary key

Foreign Key: A set of attributes in one table that refer to the primary key in another table

Relationships
Formed when one table in a database has a foreign key that refers to a primary key in
another table in the database

one-to-one
one-to-many

many-to-one

many-to-many

Referential Integrity: It is maintained when the database must not contain any values of a
foreign key that are not matched to the corresponding primary key
Indexing: A pointer to a data in a table

8.1.4 Entity-Relationship (E-R) Diagrams


Used to document the design of a database
Provides visual representation of how the entities in a database are related

The cardinality of relationships are:


Example of an E-R Diagram:
8.1.5 The Normalization Process
Used to construct a relational database that has integrity and in which data redundancy is
reduced
Tables that are not normalized are generally larger

For this section, assume the following table (STUDENT) to be used for reference:
This table is currently not normalized and hence can cause problems.

First Normal Form (1NF)


Entities do not contain repeated group of attributes
In this case, subjects and subject teachers has multiple entries for the same entry
To fix a separate table (STUDENTSUBJECT) could be made

1. Here, the primary key for this table is a composite key from the two attributes
StudentID and SubjectName since that tuple is always unique for each entry in this
table
2. StudentID is also a foreign key linking to the original table
The original table gets shortened to
Second Normal Form (2NF)
Entities are in 1NF and any non-key attributes depend upon the primary key. There are no
partial dependencies
In the STUDENTSUBJECT table, Subject Teacher is only dependent on the Subject Name
part of the primary key, a partial dependence that needs to be removed by introducing a
third table, SUBJECT

1. Hence, the STUDENTSUBJECT now has the Subject Teacher field removed

Third Normal Form (3NF)


Entities are in 2NF and all non-key attributes are independent. The table contains no non-
key dependencies
In STUDENT table, Location and Teacher Name depends upon the attribute Class ID,
while a few other attributes depend upon Teacher Name.
Since Teacher Name could be same for different teachers, Licence Number should be
used as the primary key for TEACHER table.

The updated tables are:

STUDENT
TEACHER

CLASS

SUBJECT

STUDENTSUBJECT
Skill Check 1
Bobby and Kim are discussing databases.

1. Bobby tells Kim that a file-based approach is usually better than a relational
database.
Explain why Bobby is incorrect. [3]
2. Bobby has a shop that sells products to customers his database will store data
about his customers, their payment details, orders and the products he sells.
Customers will have login details to access their accounts. The database will
update customers’ payment and login details without keeping any historical
records.

a. Give one example of each of the following relationships from Bobby’s


database.
i. one-to-one

ii. one-to-many
iii. many-to-many
b. From the above three, identify the relationship that cannot be directly
implemented in a normalized relational database.

Solution


1. Flat-file has more data redundancy than a relational database. Secondly, it
also has more data inconsistency and it is not easy to perform complex
searches

e6ec3a10a169432d96f923cfed0e55363b6ee2c027e64e16a902f3ba4a24

a. Customer to payment details


Customer to order
Customer to product

b. many-to-many

8.2 Database Management Systems (DBMS)


8.2.1 How a DBMS Addresses the Limitations of a File-Based
Approach
Data Redundancy Issue

Duplication of data reduced by storing data in separate linked tables, as most data is
stored only once

Items of data used to link tables by the use of foreign keys are stored more than once
DBMS flags any possible errors when any attempt is made to accidentally delete this type
of item

Data Inconsistency Issue


Since data is stored only once, updating once allows the updating items to be seen by all
applications
Integrity of data improved by increasing consistency
Easier to maintain since data needs be updated only once

Data Dependency Issue

Data is independent of the applications using the database, so changes to structure of


data has minimal effect on the applications using the database
Any changes to unused fields/tables does not affect the application since it only has
access to fields/tables it requires

The DBMS Approach


Uses a more structured approach to the management, organization, and maintenance of
data in a database
An already defined data structured can be used to set up and create the database
Uses a data dictionary to store the metadata, including the definition of tables, attributes,
relationship between tables and any indexing

Can also define the validation rules used for entry of data
Contains data about the physical storage of the data
Improve the integrity of data stored, helping to ensure that it is accurate, complete
and consistent
Data Modeling (For example, E-R diagram)

To show the data structure of a database


Logical Schema
A data model for a specific database that is independent of the DBMS used to build
the database
Provides data security to prevent unwanted alteration, corruption, deletion, or sharing
data with others that has no access
Using usernames, passwords to prevent unauthorized access
Encryption of the data stored
Using access rights to allow for different level of access
Creation of an activity log to record any actions taken by users of the database
Scheduling regular back-ups

8.2.2 The Use and Purpose of DBMS Software Tools


Developer Interface
Allows a developer to write queries in SQL (Structured Query Language) rather than using
query-by-example
Queries processed and executed by query processor

This allows construction of more complex queries to interrogate the database

Query Processor
Processes a query written in SQL

Includes a DDL interpreter, a DML compiler and a query evaluation engine


DDL statements are interpreted and recorded in the database’s data dictionary
DML statements are compiled into low level instructions that are executed by the
query evaluation engine
DML compiler will optimize the query

Skill Check 2
Moheem is creating a relational database to store data about his customers.
1. Moheem has been told a relational database addresses some of the limitation of
a file-based approach by reducing data redundancy.
a. State what is meant by the term data redundancy. [1]
b. Explain how a relational database can help to reduce data redundancy. [3]
2. Moheem uses a Database Management System (DBMS) to ensure the security
and integrity of the data.

a. Explain the difference between security and integrity. [2]


b. Name and describe two security features provided by a DBMS.
3. The DBMS provides software tools for the database developer.
Fill in the names of the missing software tools in the following statements.
A _________ allows a developer to extract data from a database.
A _________ enables a developer to create user-friendly forms and reports. [2]

Solution

719bc52d4f2744a583349f8fc7fd4fd204bb497d29224bfa9a719fcaf2fc87a

a. Repeated data
b. Firstly, relational database stores each record once and references it by
a primary. Secondly, it stores the data in individual tables and the tables
are linked by relationships. And lastly, by the proper use of Primary and
Foreign keys

59c354f9b3a940bd94f7b7d023a911abea9c8cd35cc9423abccfa802b778

a. Security ensures that data is safe from loss whereas integrity ensures
that the data is consistent

59fce48e69b34f378723c600777122e30af99788f16b478faa6111fd31

i. Automatic Backup: It creates regular copies of data in case of loss


ii. Encryption: Data is incomprehensible to unauthorized users
1. Query Processor, Developer Interface

8.3 Data Definition Language (DDL) and Data


Manipulation Language (DML)
8.3.1 Industry Standard Methods for Building and Modifying
a Database
DDL is a language to create, modify, and remove the data structures that form a relational
database.
Written in a script that uses syntax similar to a computer program
DML is a language to add, modify, delete, and retrieve the data stored in a relational
database.
Written in a script that is similar to a computer program
Therefore, DDL is used to work relational database structures while DML is used to work
with the data stored in the relational databases
SQL is used for these two languages

8.3.2 SQL (DDL) Commands and Scripts


Data types for attributes Description

CHARACTER Fixed length text

VARCHAR(n) Variable length text


True or False, SQL uses the integers
BOOLEAN
1 and 0

INTEGER Whole Number

REAL Number with decimal places

A date usually formatted as YYYY-


DATE
MM-DD

A time usually formatted as


TIME
HH:MM:SS

To create a database
CREATE DATABASE
To create a table
CREATE TABLE
To edit structure of table

ALTER TABLE
To add primary key to a table
PRIMARY KEY
To add foreign key to a table
FOREIGN … REFERENCES …
Here’s a complete example of a DDL script used when the school database was created
8.3.3 SQL (DML) Commands and Scripts
The following are SQL (DML) Query commands:

To fetch data from database. Queries always begin with SELECT


SELECT FROM
To include only rows in a query that match a given condition
WHERE
To sort the results from a query by a given column either alphabetically or numerically

ORDER BY
To arrange data into groups
GROUP BY
Combine rows from different different tables if the join condition is true
INNER JOIN

Returns sum of all values in a column


SUM
Counts the number of rows where the column is not NUL
COUNT
Returns the average value for a column with a numeric data type

AVG

The following are SQL (DML) maintenance commands

To add new rows to a table


INSERT INTO
To remove rows from a table

DELETE FROM
To edit rows in a table
UPDATE

Some examples of DML commands for querying and updating the above school database
are:
To show the first and second name of students in class 7A in alphabetical order based on
their second name

Revision Guides / AS / COMPUTER SCIENCE / 08 Databases

To show the teacher’s name and the subject taught

To insert a row in Student table

If values for all columns are not known, then the table columns need to be specified
before the values are inserted

To delete specified rows from Student table

Just writing the first line would delete the whole table

To find the sum of exam marks from the table STUDENTSUBJECT


Skill Check 3
Example data from the table RENTAL are given:

RentalI Customer HouseI MonthlyC DepositPai


D ID D ost d

1 22 15B5L 1000.00 Yes

2 13 3F 687.00 No

3 1 12AB 550.00 Yes

4 3 37 444.50 Yes

1. Complete the following Data Definition Language (DDL) statement to define


the table RENTAL.

CREATE ___________ ______________ (


RentalID INTEGER NOT NULL,
CustomerID INTEGER NOT NULL,
HouseID ___________ (5) NOT NULL,
MonthlyCost ____________ NOT NULL,
DepositPaid BOOLEAN NOT NULL,
______________ (RentalID)
);
2. Write a Data Manipulation Language (DML) script to return the first name
and last name of all customers who have not paid their deposit.
Solution

Points to Note
A database offers improved methods for ensuring data integrity compared to a file-based
approach
A relational database comprises tables of a special type; each table has a primary key
and may contain foreign keys

Entity-relationship modelling is a top-down approach to database design


Normalization is a database design method that starts with a collection of attributes and
converts them into first normal form then into second normal form and, finally, into third
normal form
A database architecture provides, for the user, a conceptual level interface to the stored
data

Features provided by a database management system (DBMS) include: a data dictionary,


indexing capability, control of user access rights and backup procedures
Structured Query Language (SQL) includes Data Definition Language (DDL) commands
for establishing a database and Data Manipulation Language (DML) commands for
creating queries

You might also like