
STUDY UNIT 2

The database environment

The database provides the information to meet the operational need as well as future
planning for any organisation, which should all be done within the framework of
organisational procedures and policies. Procedures are business rules and instructions
that govern the design and use of the database systems, enforce the standards,
monitor and audit the data that resides in the databases and regulate the information
that is generated from the stored data. Finally, the data, which is vital to the health of
the organisation, plays a critical role in the design of the database. The existence of the
database system depends on organisational structure and requirements at each level.
The complexity depends on the size of the organisation and its functions and corporate
culture. Elomari, Maizate and Hassouni (2016) posit that as data volumes to be
processed in all domains – scientific, professional, social, amongst others –
increase at high speed, their management and storage raise more and more
challenges. The emergence of highly scalable infrastructure has contributed to the
evolution of storage management technologies. However, numerous problems have
emerged, such as the consistency and availability of data, the scalability of
environments and competitive access to data.

2.1 Introduction

In the previous study unit, we learnt that the database environment provided the
solution to users' need to share and view the same data and information in an
organisation. Databases became the preferred method of storing
data and information because of their numerous advantages. In this study unit, we will
gain an understanding of the database environment, the advantages and
disadvantages of using a database environment and an understanding of the database
environment’s components. We will also explain database models, distributed
databases and the terminology used in a relational database. In conclusion, we will
look at the factors to consider when choosing appropriate database management
software and a database.

The learning outcomes of this study unit are as follows:

• Assess database management systems based on an understanding of the
operating environment by
o naming and describing each of the elements of the database
environment
o listing the functions of a database management system
o defining and describing each of the components of a database
management system
o differentiating between various database models
o defining relational database terminology and identifying each item on a
simple database representation
o listing the advantages and disadvantages of using a database
environment for data storage and processing
o identifying factors to consider when choosing an appropriate database
management system and database

2.2 Understanding a database environment

A database environment can be defined as the collective system of components that
comprise and regulate the collection, management and use of data. It consists of
software, hardware, people and the techniques for handling the database, as well as
the data itself. In a database environment, data and information can be stored and
retrieved effectively.

2.2.1 Database components in the database environment

The database environment and its related components are explained in detail below.

The database environment consists of three components, namely the users of the
database, the database management system (DBMS) and the physical database,
which includes both the hardware it runs on and the data in the physical database
(see figure 2.1).

FIGURE 2.1: Database environment (UNISA 2022)

2.2.2 Advantages and disadvantages of using a database environment


Using a database environment to store, process and retrieve data and information
has certain advantages and disadvantages. We will briefly look at some of these.
2.2.2.1 Advantages
(i) Reduced data redundancy. This is because all data is stored in only one place.
(ii) Reduced costs for data entry and data storage. Data is entered only once,
which creates a data capturing cost saving. The effect of reduced data redundancy
is that less storage space is needed on storage devices, creating a data storage
cost saving.
(iii) Data integrity is maintained and improved. This is because changes and
updates to data occur in one place.
(iv) Improved data and information security. Since access to the database and
therefore access to the data and information in the database is centrally controlled
and managed, this increases security for the organisation’s data and information.
(v) Application software independence. Data is stored separately from the
application software. Changes to the database will therefore not always require
an automatic rewriting or updating of the application software, or vice versa.
(vi) Standardisation of data structures, data access, system software and file
formats. This makes it easier to maintain data files. The standardisation creates a
consistent structure on which all application software is based, which makes it easier
to update existing application software or develop new application software.
(vii) Improved data access. Data can be made available to different users at the
same time as users share the data in the database. Database management
software also makes it easy for users to access and retrieve data.

2.2.2.2 Disadvantages
(i) Start-up and operating costs. It can be expensive to acquire the hardware
and software needed to set up a database environment. Furthermore, an
organisation will need to hire additional employees, such as a database
administrator, to manage the database environment.
(ii) Database systems are complex to design and use.
(iii) Because databases are complex, it is very time-consuming to design a
proper database.
(iv) Database or database management software failure will affect all
application software linked to that specific database. This can make recovery from
such a failure more difficult. A failure can shut down a whole organisation or
department(s) in the organisation, making the organisation unable to run its daily
operations and provide adequate customer service (UNISA 2022).

2.2.3 Database users

There are various database users in the database environment, including the following:

a. Database administrator (DBA)

The database administrator is responsible for managing and controlling
the organisation's databases (UNISA 2022).

The database administrator’s role includes functions such as the following:

– Implement and maintain database management standards and conventions.


– Ensure application software complies with database management standards
and conventions by establishing programming standards.
– Define the database structures.
– Design and create databases in line with database management standards and
conventions.

– Implement, maintain and evaluate database access policies and security controls.
– Monitor data and database security and access.

b. Data administrator

The data administrator, also called a database analyst, is responsible
for managing and controlling the data in the organisation's databases
(UNISA 2022).

One of the data administrator’s responsibilities is to manage the integrity of the data in
the database by setting and enforcing the data standards and data definitions to be used
in all the organisation’s databases. In many organisations, the functions of the data
administrator and database administrator are combined – hence these functions are
performed by the same person.

c. End-users

End-users capture data in the database and extract information from the
database using database management system software (UNISA 2022).

Owing to their computer skill level, most end-users will interact with the database
management system (DBMS) through application software. DBMS is explained in
section 2.3.

d. Application programmers

Application programmers are responsible for creating, maintaining, updating
and managing the application and DBMS software that the end-users use to
interact with the physical database (UNISA 2022).

2.3. Database management systems (DBMS)

2.3.1 DBMS functions


A DBMS enables users to

(i) design, create and maintain the database structure and the
database
(ii) control the organisation, storage and retrieval of data in the
database
(iii) capture, maintain (delete, insert and amend) and manipulate the
data in the database
(iv) share data between multiple users simultaneously
(v) execute queries and generate output
(vi) control the movement of the data between authorised users and
the database
(vii) control and monitor access to the database
(viii) analyse and monitor database performance (UNISA 2022)
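
To make some of these functions concrete, the following minimal sketch uses
Python's built-in sqlite3 module to illustrate functions (i), (iii) and (v):
creating a structure, maintaining data and executing a query. The supplier table
and its fields are hypothetical examples, not part of any prescribed system.

import sqlite3

# (i) Create the database structure (a hypothetical supplier table)
conn = sqlite3.connect("demo.db")
cur = conn.cursor()
cur.execute("""CREATE TABLE IF NOT EXISTS supplier (
    supplier_code TEXT PRIMARY KEY,
    name          TEXT NOT NULL,
    credit_limit  REAL)""")

# (iii) Capture and maintain (insert, amend) data in the database
cur.execute("INSERT OR REPLACE INTO supplier VALUES (?, ?, ?)",
            ("S001", "ABC Traders", 5000.0))
cur.execute("UPDATE supplier SET credit_limit = ? WHERE supplier_code = ?",
            (7500.0, "S001"))

# (v) Execute a query and generate output
for row in cur.execute("SELECT supplier_code, name, credit_limit FROM supplier"):
    print(row)

conn.commit()
conn.close()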

2.3.2 Types of databases

2.3.2.1 Self-Driving Database Management System (SDDMS)

According to Pavlo, Angulo, Arulraj, Lin, Lin, Ma, Menon, Mowry, Perron, Quah and
Santurkar (2017), using existing automated tuning tools is an onerous task, as they
require laborious preparation of workload samples, spare hardware to test proposed
updates, and above all else, intuition into the DBMS’s internals. They argue that if the
DBMS could do these things automatically, then it would remove many of the
complications and costs involved with deploying a database.

In the last two decades, both researchers and vendors have built advisory tools to
assist database administrators (DBAs) in various aspects of system tuning and physical
design. The database landscape, however, has changed significantly in the last
decade, and one cannot assume that a DBMS is deployed by an expert who understands
the intricacies of database optimisation. Even if these tools were automated such
that they could deploy the optimisations on their own, existing DBMS architectures
are not designed to support major changes without stressing the system further, nor
are they able to adapt in anticipation of future bottlenecks. Peloton, the first
self-driving DBMS with autonomic capabilities, has now become possible owing to
algorithmic advancements in deep learning as well as improvements in hardware and
adaptive database architectures. Most previous work, however, is incomplete because
it still requires human intervention to make the final decisions about any changes
to the database, and such interventions are reactionary measures that fix problems
after they occur. In this regard, Pavlo et al (2017) propose that what is needed for
a truly "self-driving" DBMS is a new architecture designed for autonomous operation.
This differs from earlier attempts because all aspects of the system are controlled
by an integrated planning component that not only optimises the system for the
current workload, but also predicts future workload trends so that the system can
prepare itself accordingly.

With this, the DBMS can support all the previous tuning techniques without requiring
human intervention to determine the right way and proper time to deploy them, and it
also enables new optimisations that are important for modern high-performance
DBMSs. Such optimisations are not possible today because the complexity of managing
these systems has surpassed the abilities of human experts. The idea of using a
DBMS to remove the burden of data management was one of the original selling points
of the relational model and declarative query languages in the 1970s. With this
approach, a developer only writes a query that specifies what data they want to
access. The DBMS then finds the most efficient way to store and retrieve data, and
to safely interleave operations.

a. Self-Driving Database Management System Process (SDDMSP)

Much of the previous work on self-tuning systems is focused on standalone tools that
target only a single aspect of the database. For example, some tools are able to choose
the best logical or physical design of a database such as indexes, partitioning schemes,
data organisation, or materialised views. Other tools are able to select the tuning
parameters for an application. Most of them operate in the same way: the DBA provides
it with a sample database and workload trace that guides a search process to find an
optimal or near-optimal configuration. All of the major DBMS vendors’ tools, including
Oracle, Microsoft, and IBM, operate in this manner. There is a recent push for
integrated components that support adaptive architectures, but these again only focus
on solving one problem (Pavlo et al 2017).

2.3.2.2 Traditional DBMS

A DBMS is a suite of software designed to manage databases and run operations on
the data as requested by several clients. A DBMS is expected to be able to do the
following:
(i) Allow users to create new databases. Users define the logical structure
(database schema) of the data that will be stored using a data definition
language.
(ii) Allow users to modify and query the data using a query language or data
manipulation language.
(iii) Manage large amounts of data while maintaining the efficiency of querying and
manipulation.
(iv) Support durable storage of data. This means recovery from failures and misuse
cases.
(v) Control access to data from multiple users with different privileges. It should
prevent unexpected interactions between users (isolation) and uncompleted
actions on data (atomicity) (UNISA 2022).

2.3.2.3 Cloud based systems

Pavlo et al (2017) note that cloud-based systems likewise employ dynamic resource
allocation at the service level, but do not tune individual databases. All of these
are insufficient for a completely autonomous system because they are (1) external to
the DBMS, (2) reactionary, or (3) not able to take a holistic view that considers
more than one problem at a time. That is, they observe the DBMS's behaviour from
outside the system and advise the DBA on how to make corrections to fix only one
aspect of the problem after it occurs. The tuning tools assume that the human
operating them is knowledgeable enough to update the DBMS during a time window when
it will have the least impact on applications.

a. Development of cloud computing paradigm

According to Cui, Yang, Wang, Geng and Li (2020), with the rapid development of the
cloud computing paradigm, data owners have the opportunity to outsource their
databases and management tasks to the cloud. Owing to privacy concerns, they are
required to encrypt the databases prior to outsourcing. However, no existing
techniques handle range queries over the encrypted data in a fully secure way. To
process secure range queries efficiently, the extraordinarily challenging task is to
perform fully secure range queries over encrypted data without the cloud ever
decrypting the data.

2.3.2.4 Cloud computing


Traditional DBMSs, with their centralised architecture, strong consistency and
relational model, do not fit well in a cloud computing environment. Cloud computing
is a new technology that promises to change the IT world by providing resources as
an elastic pool of services in a pay-as-you-go model, just as the electric grid
freed corporations from worrying about generating their own electricity. In this
regard, cloud computing promises to free corporations from worrying about their IT
resources so that they can focus on their business logic. Whether it is storage
space, computational power or software delivery, corporations can get these
resources over the network from one of the cloud providers, such as Amazon and
Google.
To meet the storage needs of cloud applications, new data management systems and
different architectures are needed, with a variety of data partitioning schemes and
replica placement strategies, including making design decisions by analysing the
applications workloads and technical environment. New data models such as a key-
value store with its variations of row-oriented, document-oriented, and wide column are
used commonly in the cloud. However, there is no widely accepted definition of cloud
computing, for several reasons. One is the involvement of developers and
engineers from different fields, for example, grid computing, software engineering and
databases in cloud computing research, where each works on it from a different
perspective. Another reason is that the technologies that enable cloud computing such
as Web 2.0 and service-oriented computing, are still changing and developing.
Notwithstanding, cloud computing can be defined as a model for ubiquitous,
convenient, on-demand network access to a shared pool of computing resources
(infrastructure, applications and platform) that can be provisioned and released with
minimal management effort or service provider interaction (UNISA 2022).

2.3.2.5. Characteristics of cloud computing
The essential characteristics and features of cloud computing include the following:
a. It must be an on-demand self-service, which means that users of the cloud can
automatically provision resources with minimal or no human interference of the
cloud service provider.
b. Users access the cloud services via networks by deploying suitable techniques and
protocols with the use of thick or thin clients.
c. Resource pooling. Services of the cloud are pooled and serve many consumers
using a multi-tenant model.

The following cloud computing deployment models can be distinguished:

(i) Public clouds: This is the most popular deployment model of cloud
computing. In this model, the cloud infrastructure and resources are owned
by an enterprise that provide them to individuals or other enterprises in a pay-
as-you-go model. Cloud resources are shared between many consumers.
Leaders in the market who provide cloud services of this model are Google
and Amazon. They provide many options that allow users to get the
resources with minimal cost and less management effort. Major concerns are
privacy, security, and data control.
(ii) Private clouds: Cloud infrastructure operates to serve one organisation.
Management of the cloud is done by a third party or by the organisation itself.
This model usually attracts governments and organisations that prefer to keep
data in a private environment.
(iii) Hybrid clouds: The cloud infrastructure is a combination of private and public
clouds. Each of them will still be a single entity connected to another cloud. In
this case, enterprises can choose to store their data on the private part of the
cloud.
(iv) Community clouds: Enterprises with the same needs share the cloud
infrastructure. The cloud is managed by a third party or by the enterprises
that share it (UNISA 2022).

2.3.2.6 Vital component of cloud computing

According to Mansouri, Toosi and Buyya (2017), Storage as a Service (StaaS) is a
vital component of cloud computing, offering the vision of a virtually infinite pool
of storage resources. It supports a variety of cloud-based data store classes in terms of
availability, scalability, ACID (Atomicity, Consistency, Isolation, Durability) properties,
data models, and price options. Application providers deploy these storage classes
across different cloud-based data stores not only to tackle the challenges arising from
reliance on a single cloud-based data store, but also to obtain higher availability, lower
response time, and more cost efficiency.
2.3.2.7 Adoption of Cloud
According to Shrivastava and Pateriya (2017), in this era, every person has
experienced major changes because of increased internet connectivity and mobile
phones. Thus, the exponential growth of data is a matter of concern for every
organisation, and storage of a huge data mountain is only possible through adoption of
the cloud. Nowadays, popularity of software defined system is increasing, and
virtualised cloud data centres are also moving towards software-defined data centres.
This change is possible only because of the advancement in software-defined networks
and software-defined storage, amongst others. The day-to-day generation of digital
data during internet usage is increasing exponentially from petabytes to exabytes.
Big data storage, maintenance and analysis are now a hot research area that needs
very innovative ideas. Maintaining data availability, reliability and security on
third-party clouds is a key concern. Storage services are obstructed by various
types of hardware or software failures, and by maintenance or upgrade operations on
low-cost commodity servers. To overcome all these faulty conditions and provide
maximum uptime for the data stored in the cloud, redundancy is maintained in the
cloud. Replication is a well-known redundancy technique that creates multiple copies
of the data and has been applied in most distributed storage systems. However, its
high space consumption makes replication very expensive for big data storage; hence
large, distributed storage systems are increasingly inclined towards erasure coding
to tackle faults while minimising space consumption.
2.3.2.8 Cloud cost of ownership and storage management
Shrivastava and Pateriya (2017) developed a framework for data management
interface for software-defined storage using well-known redundancy techniques,
replication, and erasure coding. Cloud service providers virtualised their data centres to
provide cheaper services, but still availability, reliability and fault tolerance have a
heavy impact on their earnings. This virtualisation work focused on solving the
following two issues:
(i) Reliability and cost of data storage in the cloud by continuous monitoring
(ii) Scanning of the storage system
Erasure coding uses space efficiently by applying sophisticated algorithms and is
able to recover data in case of failure. It provides space optimality but has high
reconstruction costs. The triplication policy for fault tolerance has been applied
for a long time, but its high storage consumption forces the use of erasure codes to
provide availability and reliability in cloud storage systems. This new SDStorage
framework makes a separation between the software and hardware layers and applies
replication and erasure codes together. SDStorage can serve different demands by
applying various policies and can handle the ever-increasing mountain of data. The
storage controller present in SDStorage is programmable and helps in managing and
provisioning storage resources to provide cost-effective solutions that optimise
data centre performance. Adjusting the erasure codes on the fly and combining them
with replication controls the overall completed requests and helps to minimise file
access times. The framework adds new functionality to the SDStorage controller that
automates resource provisioning, which helps to enhance organisational efficiency
and reduce the total cost of ownership. These added features in SDStorage do not
degrade service level agreements; hence it will boost the adoption of a software-
defined cloud by various organisations. This work can be further improved by adding
a security feature.
This new framework decreases the total cost of ownership and provides an efficient
technique for storage management in the cloud, which propels the development of a
software-defined cloud.
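
As a rough illustration of the trade-off discussed above, the sketch below compares
the storage overhead of triplication with that of a hypothetical (k = 4, m = 2)
erasure code. The figures are simple arithmetic under the simplifying assumption
that both schemes must survive the loss of two servers; they are not measurements.

# Storage needed to keep 1 TB of data available despite two server failures
data_tb = 1.0

# Triplication: three full copies of the data
replication_storage = data_tb * 3          # 3.0 TB, overhead factor 3.0

# Erasure coding with k data blocks and m parity blocks (hypothetical 4 + 2 scheme)
k, m = 4, 2
erasure_storage = data_tb * (k + m) / k    # 1.5 TB, overhead factor 1.5

print(f"Replication: {replication_storage} TB, erasure coding: {erasure_storage} TB")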

a. Characteristics of software-defined systems

The main characteristics of software-defined systems are abstraction layers or
interfaces that hide the complexity and provide support for service management.

b. Characteristics of software-defined storage


The success of any organisation depends largely on how it controls, distributes and
stores data to increase its business value. Software-defined storage allows users to
properly communicate their storage needs and allows automated mobility and
management of data, which can reduce storage costs and enhance data reliability.
2.3.2.9 Major technological orientations
According to Elomari, Maizate and Hassouni (2016), the amount of data generated
during a single day may exceed the amount of information contained in all printed
material all over the world. They listed the following as some of the major technological
orientations existing in the market:
- Google File System (GFS)
- IBM General Parallel File System (GPFS)
- Open-source systems such as the Hadoop Distributed File System (HDFS)
- BlobSeer
- Andrew File System (AFS)
In their study they discussed and compared the main characteristics to understand the
needs and constraints that led to these orientations. For each case, they discussed a
set of major problems of big data storage management, and how they were addressed
in order to provide the best storage services. (You can read the study on your own to
get a full understanding of these characteristics).
2.3.3 Data synchronisation in a heterogeneous database environment

Verma, Kumar and Dixit (2016) propose that data synchronisation refers to always
having the updated dataset from every database, at every location, and transforming
it into meaningful information for users situated at various locations. Thus, data
synchronisation is the process of establishing consistency among data from a source
to a target data storage and vice versa, and the continuous harmonisation of the
data over time. In a heterogeneous database environment, technical heterogeneity,
data model heterogeneity and semantic heterogeneity can be encountered. This means
an organisation can have multiple types of databases, with data residing in various
databases, such as Oracle, MySQL, PostgreSQL and SQL Server, located at different
locations, namely zones/states/districts, amongst others, having different
structures yet storing similar or the same information. In a heterogeneous database
environment, data synchronisation is a major issue. For example, any government
initiative invariably brings with it challenges related to fetching data from
various locations, such as the respective zones/states/districts, including its
synchronisation, transformation and standardisation. The need was felt for a
configurable low-cost or open-source utility which can fetch data incrementally or
completely (according to the requirement) at regular (configurable) intervals of
time, with error tracking and correction mechanisms.

2.3.4 Solutions to data synchronisation in a heterogeneous database


environment

As stated by Verma, Kumar and Dixit (2016), the following are possible solutions to
data synchronisation in a heterogeneous database environment:

a) DB Links and Triggers

A database link is a pointer that defines a one-way communication path from an
Oracle database server to another database server. The link pointer is actually
defined as an entry in a data dictionary table. To access the link, you must be
connected to the local database that contains the data dictionary entry. This link
connection allows local users to access data on a remote database. For this
connection to occur, each database in the distributed system must have a unique
global database name in the network domain. The global database name uniquely
identifies a database server in a distributed system.

A trigger, on the other hand, is a mechanism supported by most database systems,
including Oracle, SQL Server and DB2. Once the database content changes, the
database server can automatically take the relevant action, which may be an insert,
delete or update, or the execution of a procedure. Triggers also allow application
logic to be shared through SQL statements. A trigger's principal advantage is that
when the data is revised, the action defined by the triggering procedure is carried
out automatically.

DB links and triggers were possible solutions, but due to limitations related to the
different kinds of databases in use in various SDCs, this method could not be used.
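
As a concrete illustration of the trigger mechanism, the minimal sketch below uses
Python's built-in sqlite3 module to define a hypothetical trigger that automatically
records every balance update in an audit table; all table and column names are
illustrative only.

import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, balance REAL)")
cur.execute("CREATE TABLE audit_log (customer_id INTEGER, old_balance REAL, new_balance REAL)")

# The trigger fires automatically after each UPDATE and records the change
cur.execute("""CREATE TRIGGER log_balance_change
               AFTER UPDATE OF balance ON customer
               BEGIN
                   INSERT INTO audit_log VALUES (OLD.id, OLD.balance, NEW.balance);
               END""")

cur.execute("INSERT INTO customer VALUES (1, 100.0)")
cur.execute("UPDATE customer SET balance = 250.0 WHERE id = 1")
print(cur.execute("SELECT * FROM audit_log").fetchall())  # [(1, 100.0, 250.0)]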

b) Log analysis methods

A database log is an important tool for recovering data and maintaining the
integrity of the database, as it contains a record of all the operations that were
submitted successfully. The log analysis method is implemented by analysing this log
information to capture, in sequence, the changes to the synchronisation objects. As
most database log formats are not open, dedicated log analysis tools or interfaces
must be used to parse the logical log of the database, reconstruct the operations
that happened as SQL statements and record them in log files. Log files should
contain the operation time, the SQL statements, and so on.

All DDL and DML SQL queries from the source database to the target database may be
captured. A process runs continuously in the back end to read each SQL query and
pass it to the target system through HTTP. Another process then applies the SQL
query to the target system.

c) Timestamp-Based Approach
This method requires that every table of the application systems involved has a
timestamp field to record the modification time of each record. The method does not
affect the efficiency of the original application, but it requires larger
adjustments to the original system and cannot capture data changes that are not
caused by the application itself.
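
A minimal sketch of the timestamp-based approach, assuming a hypothetical table that
carries a modified_at column and a utility that remembers the high-water mark of its
last successful run:

import sqlite3

conn = sqlite3.connect("source.db")
cur = conn.cursor()

# Hypothetical source table: every row carries a modification timestamp
cur.execute("""CREATE TABLE IF NOT EXISTS customer_master (
    id INTEGER PRIMARY KEY, name TEXT,
    modified_at TEXT DEFAULT CURRENT_TIMESTAMP)""")

last_sync = "2016-01-01 00:00:00"  # high-water mark from the previous run
changed = cur.execute(
    "SELECT id, name, modified_at FROM customer_master WHERE modified_at > ?",
    (last_sync,)).fetchall()
# Apply `changed` to the target database, then persist the largest
# modified_at value seen as the new last_sync for the next run.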

d) The Method Based on Shadow


In many cases, the source database does not need to know about each operation on the
synchronised object; an understanding of the final net result is sufficient. So, at
initialisation time, a shadow table S is made for the synchronisation object table
T, and the current contents of T and S are compared to obtain the net change in the
information.
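
A minimal sketch of the shadow-table comparison, assuming a hypothetical
synchronisation object table t and its shadow s with identical schemas; the EXCEPT
queries yield the net change:

import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    CREATE TABLE t (id INTEGER PRIMARY KEY, value TEXT);  -- current contents
    CREATE TABLE s (id INTEGER PRIMARY KEY, value TEXT);  -- shadow taken earlier
    INSERT INTO s VALUES (1, 'old'), (2, 'kept'), (4, 'gone');
    INSERT INTO t VALUES (1, 'new'), (2, 'kept'), (3, 'added');
""")

# Net change: rows inserted or updated in t since the shadow was taken...
inserted_or_updated = cur.execute("SELECT * FROM t EXCEPT SELECT * FROM s").fetchall()
# ...and rows deleted from t
deleted = cur.execute("SELECT id FROM s EXCEPT SELECT id FROM t").fetchall()
print(inserted_or_updated)  # rows to insert/update at the target
print(deleted)              # rows to delete at the target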

e) Through XML
XML is a simplified standard edition of SGML. XML is not a programming language but
a data description language, which semi-structures the data. XML documents usually
consist of a declaration, elements, attributes and text.

The XML-related techniques used in such a system mainly comprise XML document
structure description, presentation, and programming interface technology.

f) Through JMS
Java Message Service (JMS) is a group of Java application programming interfaces
(Java APIs). It provides the foundation for transmitting, receiving and reading
messages. JMS has two kinds of programming models: the point-to-point (P2P) model
and the publish-and-subscribe (pub/sub) model. The P2P message passing model
transmits each message sent through a queue to a receiver, and it ensures that there
is only one recipient reading each message. In the pub/sub message model, a message
produced is sent to one or more registered consumers based on the theme of the
message. Consumers can subscribe to a theme.
g) Third-Party Software Like SymmetricDS
SymmetricDS is based on triggers. Every time there was any change in the schema at
any SDC, the synchronised data could not be obtained. Hence, in these cases the
desired results were not achieved through these triggers.

h) Governance Issues

States were reluctant to share DB credentials.

i) Changes in case of DB Migration / Shifting / Change of servers

In these cases, jobs (applications) need to be rewritten.

2.3.4.1 Features

(i) Cross Platform – It works as a web service, hence it can be used through any
operating system.

(ii) Multi-Threaded – Can run all jobs for all states in parallel.

(iii) Automatic Recovery – Jobs in which an error occurs are retried until they
succeed or are cancelled manually.

(iv) Initial data load – Data can be fetched incrementally as well as completely,
as per requirement.

(v) Central Configuration – Can be configured from a single location.

(vi) Communication Methods – Can pull changes from various states at configurable
time intervals (automatically) through jobs, as well as manually, as per
requirement.

(vii) Monitoring – Can monitor for errors or pendency and raise alerts accordingly.

(viii) Embeddable – Can be embedded into any application without much effort.

The decentralised nature of our scientific communities and healthcare systems has
created a sea of valuable but incompatible electronic databases (Verma et al 2016).
The solution was materialised through a utility for fetching data from different
heterogeneous databases at different locations, followed by synchronisation and
transformation of the synchronised data through mapping processes. Through this
utility, data can be successfully synchronised after being fetched to the central
location (the Central Data Centre) for any program that may require the use of the
data.

According to Kim, Kim and Chang (2016:443-446), research on secure range query
processing techniques in outsourced databases has increasingly come under the
spotlight with the development of cloud computing. The existing range query
processing schemes can preserve the data privacy and the query privacy of a user;
however, they fail to hide the data access patterns while processing a range query.
Kim, Kim and Chang therefore propose a secure range query processing algorithm that
hides data access patterns. Their method filters unnecessary data using an encrypted
index, and their performance analysis shows that the proposed algorithm can process
a query efficiently while hiding the data access patterns.

2.4. Three-level database architecture

DBMSs are a ubiquitous and critical component of modern computing, and the result of
decades of research and development in both academia and industry (Fakhimuddin,
Khasanah & Trimiyati 2021). Historically, DBMSs were among the earliest multi-user
server systems to be developed, and thus pioneered many systems design techniques
for scalability and reliability now in use in many other contexts. While many of the
algorithms and abstractions used by a DBMS are textbook material, there has been
relatively sparse coverage in the literature of the systems design issues that make a
DBMS work.

2.4.1 Architecture of a Database System

This is an invaluable reference for database researchers and practitioners and for those
in other areas of computing interested in the systems design techniques for scalability
and reliability that originated in DBMS research and development. It presents an
architectural discussion of DBMS design principles, including process models, parallel
architecture, storage system design, transaction system implementation, query
processor and optimiser architectures, and typical shared components and utilities.
While many of the algorithms and abstractions used by a DBMS are textbook
material, Architecture of a Database System addresses the systems design issues that
make a DBMS work.

ANSI-SPARC (American National Standards Institute [ANSI] – Standards Planning
and Requirements Committee [SPARC]) suggested a three-level database architecture,
namely an external level, a conceptual level and an internal level (UNISA 2022).
This three-level database architecture is now commonly used in modern DBMS
frameworks and is based on the different views of data in a DBMS.

(a) External level

The external level, also called the user view, is the individual end-
user’s view of the data and the database (UNISA 2022).
Because users’ information needs differ, the views they require of the database will
also differ – hence there may be an infinite number of external views. For example, the
creditor’s clerk input screen and reports (user view) will look different from the input
screen and reports (user view) of the cashbook clerk. When working on Pastel Partner
(topic 6), we will see in practice how the user views differ, depending on the type of
transaction processed (i.e., creditors, cashbooks, etc).

(b) Conceptual level

The conceptual level is a complete view of the entire database, that
is, a view of all the data from which the user views can be derived
(UNISA 2022).

The database administrator will generally use this view. In comparison with the user
view, which may have infinite variations, there is only one conceptual view.

(c) Internal level

The internal level, also called the physical view, is the low-level view
of how the data is physically stored on a storage device such as a
magnetic hard disk drive (UNISA 2022).

There is only one physical view. The binary code (1s and 0s, e.g., 01100011) in the
database is one facet of the physical view.

The database administrator updates and maintains all three levels.

FIGURE 2.2: Three-level database architecture (UNISA 2022)

2.5. DBMS key components


The key components of the DBMS are the data dictionary and database languages.

(a) Data dictionary

A data dictionary is a centralised file containing detailed information
about the database and the data contained in the database (UNISA 2022).

The data dictionary is a very important tool for all database users as it ensures
all users have the same understanding of the data fields and database files. A data
dictionary will therefore assist in the accurate processing of data and make
information and/or data easier to analyse.

Amongst others, the information a data dictionary contains includes the following:

• What data is stored in the database
• For each data field in the database, information such as:
o the name and description of the data field
o other names the data field may have
o a range of acceptable values
o the data type (type of data stored, i.e., numeric, alphabetic, alphanumeric,
date, etc)
o the field length (the number of characters that can be entered into the data
field)
o the software and records it is used in
o the source of the data field
o outputs in which it is used
• The names and descriptions of the database files
• For each database file a list of attributes, primary keys and foreign keys included
• The authorised user groups for the database files and/or data fields

An extract of some of the information contained in a data dictionary is provided below.

Table name        Field name          Description                                  Field length  Data type
Supplier master   Credit limit        Supplier credit limit                        7             Numeric
Supplier master   Balance             Balance outstanding. Amount includes VAT     15            Numeric
Inventory master  Inventory category  The category the inventory item belongs to   20            Alphabetic
Inventory master  Inventory code      Inventory item unique code                   6             Alphanumeric
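
To illustrate how a data dictionary entry enforces a shared understanding of a
field, the sketch below models the Credit limit entry from the extract as a simple
Python structure and uses it to validate captured values. The structure is purely
illustrative, not a feature of any particular DBMS.

# A hypothetical data dictionary entry for the "Credit limit" field
entry = {
    "table": "Supplier master",
    "field": "Credit limit",
    "description": "Supplier credit limit",
    "field_length": 7,
    "data_type": "Numeric",
}

def validate(value: str, entry: dict) -> bool:
    """Check a captured value against its data dictionary entry."""
    if len(value) > entry["field_length"]:
        return False
    if entry["data_type"] == "Numeric" and not value.isdigit():
        return False
    return True

print(validate("50000", entry))     # True
print(validate("5000000A", entry))  # False: too long and not numeric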

As you have noticed, this section about the data dictionary refers to terminology
you may not be familiar with. These terms are explained later in this study unit
(section 2.11). Therefore, refer back to this section about the data dictionary
after you have worked through that section.
(b) Database languages

The database users (end-users, application programmers and the
database administrator) use different database languages to interact
with the database (UNISA 2022).

These languages include a data definition language, a data control language, a data
manipulation language and a data query language. These database languages are
usually specific to the database model in use. (We will learn about these different
types of database models in section 2.9.) SQL (structured query language) is one of
the most used database languages for the relational database model and combines the
data definition language, data control language, data manipulation language and
data query language.

• Data definition language (DDL)

As the name implies, the data definition language is used to define a
database and includes commands to (1) create, modify and delete the database and
database objects, (2) define and describe the data structure of the database
according to the database model used, and (3) create the data dictionary
(UNISA 2022).

Database objects include database tables, views, rules, indexes and so forth. DDL is
usually only available for use by the database administrator and requires detailed
knowledge of the conceptual level of the DBMS.

• Data control language (DCL)

DCL controls the security and user access to the database objects
and data in the database (UNISA 2022).

DCL is usually only available for use by the database administrator.

• Data manipulation language (DML)

DML is used in the routine operation of the database to insert,
delete, modify and maintain the data stored in the database
(UNISA 2022).

The data manipulation language can be used by all the database users, but the level
of use will be determined by their skill level and the access granted. Most
end-users, however, access the DML through application software. DML and DDL should
not be confused: DML is used on the data stored in the database, while DDL is used
on the database objects and structure.

• Data query language

Data query language is used to retrieve data from the database (UNISA 2022).

All database users can use data query language. However, owing to their
programming skill level, most end-users access the data query language through
application software.
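
The short sketch below groups sample SQL statements by language category and
executes them through Python's built-in sqlite3 module. All table and column names
are hypothetical, and note that SQLite does not implement DCL statements such as
GRANT, so that category is shown as a comment only.

import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# DDL: define the database structure
cur.execute("CREATE TABLE debtor (id INTEGER PRIMARY KEY, name TEXT, balance REAL)")

# DML: insert, modify and delete data
cur.execute("INSERT INTO debtor VALUES (1, 'J Smith', 1200.00)")
cur.execute("UPDATE debtor SET balance = 900.00 WHERE id = 1")

# Data query language: retrieve data
print(cur.execute("SELECT name, balance FROM debtor").fetchall())

# DCL (not supported by SQLite; shown for illustration only):
#   GRANT SELECT ON debtor TO clerk_role;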

2.6 Physical database

A database can be defined as an organised collection of related data that is managed
and stored electronically and can provide data to different application software in
the organisation. A database is used to save and retrieve data (UNISA 2022).

Using a DBMS, different application software and users in an organisation can access
the same data and a variety of other data in the database.

2.7 Data models

A data model is a model that describes in an abstract way how data is represented in
an information system or in a DBMS. Choosing the data model has a fundamental effect
on the other aspects of a database system, such as the integrity constraints and
data access. The most used data model is the relational model (RM), which was
developed for classic database applications such as banking systems, airline
reservations and sales/customer relations. It is implemented by major DBMSs such as
Oracle, IBM DB2, MS SQL Server and PostgreSQL, amongst others. In this model, data
is organised in tables (relations) of records (tuples) with columns (attributes). A
table can have a primary key, which is the unique identifier of its rows. A primary
key can be referenced from another table as a foreign key, which enforces integrity
constraints on the data (UNISA 2022). Databases are the backbone of a client's
lifestyle or a business's worth because of the value that each individual has
assigned to them. The core of the functionality that they provide to users lies in
the design of the various types of database models.

2.8. What is a database model?

A database model is a type of data model that defines a database’s logical structure. It
determines how data can be stored, organised, and manipulated. The relational model,
which uses a table-based format, is the most common database model. It demonstrates
how data is organised and the various types of relationships that exist between
them. The facts that can enter the database, or those of interest to potential end-users,
are specified by a database schema, which is based on the database administrator’s
knowledge of possible applications. In predicate calculus, the concept of database
schema is analogous to the concept of theory. A database, which can be seen as a
mathematical object at any point in time, closely resembles a model of this “theory.” As
a result, a schema can contain formulas that represent both application-specific
integrity constraints and database-specific integrity constraints, all expressed in the
same database language. Databases can be classified according to the theoretical
data structure, referred to as a data model, on which they are based. The data model
used will determine the manner in which the data is stored and organised, and the
operations that can be performed on the database. The following types of database
models have distinct appearances and operations and can be used in different ways,
depending on the needs of the user (Mohammad & Schallen 2011; Balasankula 2022).

2.9. Types of database models

A number of database models can be used. However, we will only extensively discuss
some of the main model types, namely hierarchical, network, relational, object-oriented
and multidimensional. Others will be briefly introduced.

2.9.1 Hierarchical model

The hierarchical model was used in early databases and, as the name
indicates, the data is structured in a hierarchical (upside down tree-like)
structure (UNISA 2022).

It was one of IBM's first database models for information management, and nowadays
these types of database models are uncommon. The model has nodes for records and
branches for fields. A hierarchical database is exemplified by the Windows registry
in Windows XP, whose configuration options are saved as node-based tree structures.
The "parent-child" relationship is used to store data in this type of database.

The relationship between the data records is based on a one-to-many relationship,
also known as a parent/child relationship (a child can have only one parent, but a
parent can have many children). This type of data structure is inflexible. Microsoft
Windows Explorer is structured hierarchically: unless we duplicate a file, a file
can only be saved in one directory. See figure 2.3 for a visual representation.
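
To make the one-to-many structure concrete, here is a minimal Python sketch of a
hypothetical hierarchical store in which each node has exactly one parent; the
search function also illustrates why queries must walk the tree from the top down.

# One-to-many parent/child structure: each node has one parent, many children
tree = {
    "name": "root",
    "children": [
        {"name": "customers", "children": [
            {"name": "invoices.dat", "children": []},
        ]},
        {"name": "suppliers", "children": []},
    ],
}

def find(node, target):
    """Top-down search: the DBMS must walk the tree until the record is found."""
    if node["name"] == target:
        return node
    for child in node["children"]:
        hit = find(child, target)
        if hit:
            return hit
    return None

print(find(tree, "invoices.dat")["name"])  # invoices.dat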

FIGURE 2.3: Hierarchical model (UNISA 2022)

2.9.1.1 Advantages

• The model facilitates the addition and deletion of new data.
• Data at the top of the hierarchy can be accessed quickly.
• It was compatible with linear data storage media like tapes. The hierarchical
database was well-suited to the tape storage systems used by mainframes in the
1970s, and it was widely used in organisations with databases based on those
systems.
• It applies to anything that relies on one-to-many relationships. For example, a
CEO may have many managers reporting to them, and each of those managers may have
many employees reporting to them, but each employee has only one manager.

2.9.1.2 Disadvantages

• It necessitates the repeated storage of the same data in multiple entities.
• Linear data storage media, such as tapes, are no longer used today.
• When looking for data, the DBMS must go through the entire model from top to
bottom until the required information is found, which makes queries extremely slow.
• Only one-to-many relationships are supported by this model; many-to-many
relationships are not. (UNISA 2022)

2.9.2 Network model

The network model supports many-to-many relationships, that is, data may be
accessed by following several paths (UNISA 2022).

In many-to-many relationships, a child can have multiple parents and there can be
relationships between children. Nowadays, this model is mostly obsolete. The
Database Task Group formalised the model in the 1960s as a generalisation of the
hierarchical model. It can have multiple parent segments, which are grouped into
levels, with a logical relationship between the segments that belong to each level.
Typically, any two segments can have a many-to-many logical relationship.

Because it resembles the hierarchical database model, it is frequently referred to
as a modified version of a hierarchical database. The network database model
organises data in a graph-like fashion and allows for multiple parent nodes.

Network models are types of database models designed to represent objects and their
relationships flexibly. The network model extends the hierarchical model by allowing
many-to-many relationships between linked records, which implies multiple parent
records.

These types of database models are built using sets of related records and are based on
mathematical set theory. Each set contains one owner or parent record as well as one
or more child or member records. This model can convey complex relationships
because a record can be a member or child in multiple sets.

After being formally defined by the Conference on Data Systems Languages (CODASYL)
in the 1970s, it became extremely popular.

2.9.2.1 Advantages

• The network model is conceptually simple to implement.
• The network model can represent data redundancy better than the hierarchical
model.
• The network model can handle one-to-many and many-to-many relationships,
which is extremely useful in simulating real-world scenarios such as the network
model for a finance department, restaurant chain workflow, etc.
• The network model is better than the hierarchical model at isolating programs
from complex physical storage details. The network model allows each record to
have multiple parent and child records, forming a generalised graph structure,
whereas the hierarchical database model structures data as a tree of records,
with each record having one parent record and many children.

2.9.2.2 Disadvantages

• Because all records are maintained using pointers, the database structure
becomes extremely complicated.
• Any record’s insertion, deletion, and updating operations necessitate numerous
pointer adjustments.
• Changing the database’s structure is extremely difficult. (UNISA 2022)

2.9.3 Relational model

In a relational model, data is stored in two-dimensional
rows and columns (i.e., tables) (UNISA 2022).

A table is also known as a relation, and each database has several tables. Every table
has its own primary key and the database uses this to link (relate) the table to the other
tables in the database (primary key is explained in section 2.9.3.2). A table is similar to
a spreadsheet with rows and columns. (Spreadsheets are discussed in topic 2).
MySQL, Microsoft SQL Server and Oracle are examples of relational model
databases.

A relational database management system (RDBMS) refers to the various software
systems used to maintain relational databases. The data in this type of database
model is organised in two-dimensional tables with rows and columns, and
relationships are maintained by storing a common field.

Three key terms are frequently used in relational models: relations, attributes and
domains.

• Relations: This refers to a table with rows and columns.


• Attributes: The defining characteristics or properties in a relational database
that define all items belonging to a particular category and are applied to all cells
in a column.
• Domain: The set of values that the attributes can take.

2.9.3.1 Parameters in the relational model

• Tuple: A tuple is a single row in a table.
• Cardinality of a relation: The cardinality of a relation is the number of tuples
in it. A relation containing four tuples, for example, has a cardinality of 4.
• Degree of a relation: Each column of a tuple is referred to as an attribute. The
degree of a relation is the number of attributes in it.

2.9.3.2 Keys of a relation

• Primary Key: The attribute (or combination of attributes) that uniquely identifies
each row in a table. It contains no null values.
• Foreign Key: An attribute that refers to another table's primary key. Only values
that appear in the primary key of the table to which it refers are allowed.
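
A minimal sketch of these two keys, using Python's built-in sqlite3 module (note
that SQLite only enforces foreign keys when the pragma shown is switched on); the
department and employee tables are hypothetical:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
cur = conn.cursor()

cur.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT)")
cur.execute("""CREATE TABLE employee (
    emp_id  INTEGER PRIMARY KEY,
    dept_id INTEGER REFERENCES department(dept_id))""")

cur.execute("INSERT INTO department VALUES (10, 'Finance')")
cur.execute("INSERT INTO employee VALUES (1, 10)")      # allowed: 10 exists

try:
    cur.execute("INSERT INTO employee VALUES (2, 99)")  # 99 is not a department
except sqlite3.IntegrityError as e:
    print("Rejected:", e)  # FOREIGN KEY constraint failed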

Examples
• Oracle: The Oracle Database is also known as Oracle RDBMS or simply Oracle.
Oracle Corporation produces and markets a multi-model database management
system. An Oracle database is a logical collection of data. It’s the first database
built specifically for enterprise grid computing, the most flexible and cost-
effective way to manage data and applications.
• MySQL: MySQL is a relational database management system (RDBMS) based on
Structured Query Language (SQL) and is free to use. MySQL is available on
almost every platform, including Linux, UNIX, and Windows.
• Microsoft SQL Server: In corporate IT environments, Microsoft SQL Server is
an RDBMS that supports a wide range of transaction processing, business
intelligence, and analytics applications.
• PostgreSQL: PostgreSQL, or simply Postgres, is an object-relational database
management system (ORDBMS) that focuses on extensibility and compliance
with industry standards.
• DB2: DB2 is an IBM database product. It is a relational database management
system (RDBMS) optimised for data storage, analysis, and retrieval. With XML
support, the DB2 product now handles object-oriented features and
non-relational structures.

Owing to its many advantages, the relational model is the most commonly used
database model for business and financial databases. Relational database
terminology will be discussed in section 2.11.

2.9.3.3 Advantages

Here are a few key advantages of relational database models:

• Data can be accessed, inserted and/or deleted without changing the database
structure.
• The database structure can be easily customised for most types of data storage.
• Data does not need to be duplicated.
• Most users easily understand the structure.
• It is easy to search for and extract data from the database.
• Changes in the database structure have no impact on data access in the
relational model.
• Viewing information as tables with rows and columns makes it much easier
to comprehend.
• Unlike other models, the relational database model supports both data
independence and structure independence, making database design,
maintenance, administration, and usage much easier.
• You can use this to write complex queries to access or modify database data.
• In comparison to other models, it is easier to maintain security.

2.9.3.4 Disadvantages
• A disadvantage of using this model type is that it is slower than the network and
hierarchical models because it uses more processing power to query data.

• It’s difficult to map objects in a relational database.
• The relational model lacks an object-oriented paradigm.
• With relational databases, maintaining data integrity is difficult.
• The relational model is suitable for small databases but not for large databases
because they are not designed for change. Each row represents a unique entry,
and each column describes unique attributes, in relational databases. Data
modelling requires planning ahead of time and, depending on the system, can
take months or even years. After-the-fact changes take time and resources, and
database modelling projects can take years and cost millions of dollars. Because
big data is always changing, a flexible and forgiving database platform is
required.
• Hardware costs are incurred, making it expensive.
• The relational data model is not appropriate for all domains. Schema evolution is
difficult due to the inflexible data model. Poor horizontal scalability results in
low distributed availability. Due to joins, ACID transactions, and strict
consistency constraints, performance suffers (especially in distributed
environments).

The implementation complexities and physical data storage details of a relational
database system are hidden from users (UNISA 2022).

2.9.4 Object-oriented model

In an object-oriented model, the data and the operations to be performed on the
data are both stored in the database. This database model can furthermore store
and process a wider range of data types than only text and numerical data: it also
stores and processes images, audio and video data.

In object-oriented programming, an object database is a system in which data is
represented as objects. Relational databases, which are table-oriented, are not the
same as object-oriented databases. The object-oriented data model is one of the
types of database models that is based on the widely used concepts of
object-oriented programming languages (UNISA 2022).

This model is used for more specialised databases such as multimedia web-based
applications, molecular biology databases and databases for the defence industry.
Object-oriented database models are not as widely used as relational databases
because they are expensive to implement, and many organisations do not need to
process data types other than numerical and text data.

2.9.4.1 Advantages

• Object databases can store a variety of data types, whereas relational databases
store only one type of data. Object-oriented databases, unlike traditional
databases such as hierarchical, network, and relational databases, can handle a
variety of data types, including pictures, voice, video, text, and numbers.
• You can reuse code, model real-world scenarios, and improve reliability and
flexibility with object-oriented databases.
• Because most of the tasks within the system are encapsulated, they can be
reused and incorporated into new tasks.
• Object-oriented databases have lower maintenance costs than other models.

2.9.4.2 Disadvantages

• An OODBMS lacks a theoretical foundation because there is no universally
defined data model.
• OODBMS usage is still limited when compared to RDBMS usage.
• There is a lack of security support, as OODBMSs do not include adequate
security mechanisms.
• The system is more complex than conventional database management systems
(UNISA 2022).

2.9.5 Multidimensional models

These types of database models are relational models that have been adapted to support analytical processing. The multidimensional model is designed for online analytical processing (OLAP), while the relational model is optimised for online transaction processing (OLTP).

A dimensional database’s cells contain information about the dimensions it tracks. Instead of two-dimensional tables, it looks like a collection of cubes.

A multidimensional model is similar to a relational model, but whereas a relational model stores data in a two-dimensional table, a multidimensional model stores data in a three- or more dimensional table, creating a cube-like data structure (UNISA 2022).

Data can be viewed in a spreadsheet-like format, which makes it easier to understand, and data with many interrelationships can be stored and processed. Owing to the spreadsheet-like format, this data structure is also easy to maintain. This model type is used mainly for data warehouses and makes online analytical processing (OLAP) and business intelligence software (BIS) possible. BIS will be discussed further in study unit 19.
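
As a rough illustration of the cube idea, the following Python sketch addresses each cell by three dimensions (item, region and month) and then aggregates along one dimension, much as an OLAP query would; the item codes, regions and quantities are invented for the example.

    from collections import defaultdict

    # Each cell of the "cube" is addressed by three dimensions.
    cube = {
        ("LAP142", "Gauteng",      "2016-06"): 5,
        ("LAP175", "Gauteng",      "2016-06"): 4,
        ("MOU050", "Western Cape", "2016-06"): 9,
    }

    # An OLAP-style aggregation: total quantity per region for June 2016.
    per_region = defaultdict(int)
    for (item, region, month), qty in cube.items():
        if month == "2016-06":
            per_region[region] += qty

    print(dict(per_region))   # {'Gauteng': 9, 'Western Cape': 9}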

2.9.6 Object-relational database model

This hybrid database model is a type of database model that combines the relational
model’s simplicity with some of the object-oriented database models’ advanced
functionality. It allows designers to incorporate objects into the common table structure.

SQL3, vendor languages, ODBC, JDBC and proprietary call interfaces are all extensions of the relational model’s languages and interfaces (UNISA 2022).

2.9.7 Entity relationship database model

The entity relationship database model is similar to the network model in that it captures relationships between real-world entities, but it is not as closely linked to the database’s physical structure. It is more commonly used for the conceptual design of a database.

The people, places and things about which data points are stored are referred to as entities, and each of them has specific attributes that make up its domain. The relationships between entities, together with their cardinality, are also mapped.

The star schema is a common ER diagram that connects multiple dimensional tables
through a central fact table (UNISA 2022).

2.9.8 Semi-structured model

In a semi-structured model, the database schema is embedded with the data, and the line between data and schema is blurry at best. These types of database models are useful for describing systems that are treated as databases but cannot be constrained by a schema, such as certain web-based data sources. They can also be used to describe interactions between databases that have different schemas (UNISA 2022).
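
A minimal Python sketch of the idea, using JSON records: each record carries its own field names, so the schema travels with the data and records need not share the same structure (the records themselves are invented for the example).

    import json

    # Two records with different "schemas" stored side by side.
    records = [
        '{"name": "Thabo", "email": "thabo@example.com"}',
        '{"name": "Anna", "phone": "+27 09 847-9387", "vip": true}',
    ]

    for raw in records:
        rec = json.loads(raw)
        print(sorted(rec.keys()))   # the field names differ per record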

2.9.9 Inverted file model

An inverted file structure database is another type of database model, one that is designed to allow quick full-text searches. In this model, the data content is indexed as a series of keys in a lookup table, with the values pointing to the locations of the associated files. In big data and analytics, for example, this structure can provide near-instantaneous reporting.

Since 1970, this model has been used by Software AG’s ADABAS database management system, and it is still supported (UNISA 2022).
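
To show why lookups are near-instantaneous, here is a minimal Python sketch of an inverted index: each word becomes a key in a lookup table whose value points to the documents containing it, so a search never scans the documents themselves (the documents are invented for the example).

    from collections import defaultdict

    docs = {
        "note1.txt": "repeat prescription issued at branch",
        "note2.txt": "new prescription captured",
    }

    # Build the lookup table: word -> set of locations containing it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)

    # Full-text lookup by key; no document content is scanned here.
    print(sorted(index["prescription"]))   # ['note1.txt', 'note2.txt']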

2.9.10 Flat model

The flat model is the oldest and most basic type of data model. It simply lists all the information in a single table with columns and rows. The computer must read the entire flat file into memory to access or manipulate the data, making this model inefficient for all but the smallest data sets (UNISA 2022).
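
The inefficiency is easy to see in a short Python sketch: before any row can be queried, the whole flat file must be loaded into memory and then scanned row by row (the inventory figures are invented for the example).

    import csv, io

    # A flat file is a single table of rows and columns.
    flat_file = io.StringIO(
        "inventory_no,quantity\n"
        "HD/250,22\n"
        "MON190,7\n"
    )

    rows = list(csv.reader(flat_file))                    # the entire file is read at once
    low_stock = [r for r in rows[1:] if int(r[1]) < 10]   # then every row is scanned
    print(low_stock)                                      # [['MON190', '7']]
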
2.9.11 Context model

As needed, elements from other types of database models can be incorporated into this
model. It combines aspects of the object-oriented, semi-structured, and network
models.

2.9.12 Associative model

The associative model categorises all data points into two categories: entities and associations. In this model, an entity is anything that exists independently, whereas an association is something that exists only because of something else.

The associative model divides the data into two groups, as sketched in the example after this list:

• A collection of items, each with its unique identifier, name, and classification.
• A collection of links, each with its unique identifier and the source, verb, and
target identifiers. Each of the three identifiers may refer to a link or an item, and
the stored fact is about the source.
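
A minimal Python sketch of the two groups, with invented identifiers and an invented fact: items carry an identifier, name and classification, while each link records source, verb and target identifiers.

    # Items: unique identifier -> (name, classification).
    items = {
        1: ("Forever PC", "supplier"),
        2: ("Paris", "city"),
        3: ("is located in", "verb"),
    }

    # Links: unique identifier -> (source, verb, target) identifiers.
    # Each identifier may refer to an item or to another link.
    links = {
        100: (1, 3, 2),   # the stored fact is about the source (item 1)
    }

    source, verb, target = links[100]
    print(items[source][0], items[verb][0], items[target][0])
    # Forever PC is located in Paris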

Other, less common types of database models include the following:

• Semantic model – includes information about how the stored data relates to the real world.
• Named graph.
• Triplestore (UNISA 2022).

2.10 Centralised and distributed databases

The physical location of an organisation’s databases will depend on its specific business needs and requirements. We can classify a database according to its physical storage location as either a centralised or a distributed database.

(a) Centralised database

When using a centralised database, the database is physically stored in one central location (i.e., it is on one server) (UNISA 2022).

All users interact with this single database in the single location through the computer
network. The benefit of using this type of database is that the database is always up to
date with the latest information if online input and real-time processing are used.

(b) Distributed database

When using a distributed database, there are several interlinked
databases stored in several computers in the same (e.g., headquarters)
or different locations (e.g., branches) (UNISA 2022).

When a distributed database is properly managed, users will not know that they may each be interacting with a database in a different location, because they will all have the same view of the database. A distributed database is either partitioned or replicated.

A partitioned database is split into smaller portions (partitions) and the part applicable to the user is made available at the location closest to that user. Partitioned databases are generally used when minimal data sharing is necessary between users at the different locations. For example, an organisation with branches may use a partitioned database when its customers only ever interact with one specific branch and there is thus no need for the branches to view each other’s customer databases.
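
A minimal Python sketch of the routing idea behind partitioning, with invented branch codes and file names: each branch’s customer records live in their own partition, and a simple rule directs each user to the partition for their branch.

    # Invented branch codes mapped to invented partition files.
    PARTITIONS = {
        "JHB": "customers_jhb.db",
        "CPT": "customers_cpt.db",
        "DBN": "customers_dbn.db",
    }

    def partition_for(branch_code: str) -> str:
        """Return the partition holding this branch's customer records."""
        return PARTITIONS[branch_code]

    print(partition_for("CPT"))   # customers_cpt.db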

In a replicated database, the whole original database is copied to the different locations, that is, the database is replicated at each location. For example, a pharmacy with countrywide branches, at any of which customers can obtain new and repeat prescriptions, may use a replicated customer database. This will enable a customer to obtain a repeat prescription at any of the pharmacy’s branches without the branch needing to see the original prescription.

The different replicated databases in one network are updated by means of duplication or synchronisation. In duplication, the master (original) database is copied to the other locations, normally at a specific frequency and time, and will overwrite the database at the distributed locations. The database at the different locations can only be updated by updating the master database. Synchronisation is more complex and time consuming and involves a two-way updating of the master database and the distributed databases (i.e., the master database can update the distributed databases and the distributed databases can update the master database). This synchronisation process normally also happens at a pre-set frequency and time. Data conflicts (i.e., where the same data was updated in both databases and the software needs to determine which database has the latest or correct data) are usually resolved through predetermined rules, but in some instances can also be resolved manually, that is, the user determines which is the latest version of the data (UNISA 2022).
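
The duplication approach can be sketched in a few lines of Python using SQLite’s backup facility, where the master copy simply overwrites each distributed copy; the table, data and file names are invented, and a real installation would run this on a schedule and ship the copies over the network.

    import sqlite3

    # Master database at head office (file name is illustrative).
    master = sqlite3.connect("master.db")
    master.execute("CREATE TABLE IF NOT EXISTS customer (id INTEGER PRIMARY KEY, name TEXT)")
    master.execute("INSERT INTO customer (name) VALUES ('Thabo')")
    master.commit()

    # Duplication: the master copy overwrites every distributed copy.
    for replica_file in ("branch_jhb.db", "branch_cpt.db"):
        replica = sqlite3.connect(replica_file)
        master.backup(replica)   # one-way: replicas never update the master
        replica.close()

    master.close()
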
Activity 2.1

One of the big four audit firms uses replicated databases for its
electronic client audit files. A master database of the client audit files is
created and then replicated on each audit team member’s computer.
Each team member works on his or her own “replicated database” and
synchronises to the master copy at a frequency determined by the audit team leader.

(a) Ask your auditor friends or family members or the auditors at your organisation if they
use databases for their client audit files.
(b) Determine whether they use a centralised or a distributed database.
(c) Is the distributed database updated through duplication or synchronisation?

Go to Discussion Forum 2.1 and discuss your findings with your fellow students.

Guidelines for participating in forums:


• Compile your post offline and keep a record of it.
• Use an academic writing style for referencing and citing the sources you used.
• Post your answer on the forum.
• Reply to the contributions of at least two of your fellow students.

2.11 Relational database terminology

Because a relational database is the most commonly used database model for business and financial databases, we will look at the database terminology applicable to relational databases.

A relational database comprises several database files, each of which consists of several data records. A data record consists of several data fields, each of which contains a data value. See figure 2.4 for a schematic representation, but bear in mind that each database contains many more data files, data records and data fields than those depicted in the figure. A single data record can also be used to update multiple database files.

FIGURE 2.4: Simplistic database overview (UNISA 2022)

Refer to figure 2.5 below. Each of the files shown is only an extract – the real
transaction and master files contain many more data fields and data values.

FIGURE 2.5: Database terminology (UNISA 2022)


(a) Data value

A data value is a character (a single number, letter or special character) or a group of related characters used to populate the data field (UNISA 2022).

The data value entered will vary from data field to data field. For example, a data
value can be a number, say, 5, or a name, say, Thabo.

(b) Data field

A data field contains a data value and is the smallest unit of data that
can be accessed in a database (UNISA 2022).

A data field is like a cell in a spreadsheet. The data value contained in a data field will
differ from data record to data record.

Data fields can be compulsory (data must be entered into this field), optional (the field
may be left blank if no data is entered) or calculated (the data value is not entered but
automatically derived from a formula based on other data fields).

In figure 2.5, in the purchase transaction file, the data value, 4, is entered in the quantity data field for record PN10031. The balance data fields in the supplier master file are examples of calculated data fields.
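
A minimal SQLite sketch (via Python) of the three kinds of data fields, using an invented table: a compulsory field is declared NOT NULL, an optional field may be left NULL, and a calculated field is derived from other fields (generated columns require SQLite 3.31 or later).

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("""
        CREATE TABLE purchase_line (
            invoice_nr TEXT NOT NULL,       -- compulsory: data must be entered
            note       TEXT,                -- optional: may be left blank (NULL)
            quantity   INTEGER NOT NULL,
            price      REAL NOT NULL,
            line_total REAL GENERATED ALWAYS AS (quantity * price)  -- calculated
        )
    """)
    con.execute("INSERT INTO purchase_line (invoice_nr, quantity, price) "
                "VALUES ('PN10031', 4, 15500.0)")
    print(con.execute("SELECT line_total FROM purchase_line").fetchone())  # (62000.0,)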

(c) Attribute

An attribute, commonly known as a column, represents one unique characteristic of a single database file (UNISA 2022).

However, an attribute can appear in more than one database file. Each attribute will
have a specific field length (number of characters that can be entered in the field) and a
specific data type (numbers, characters, dates, etc). The field length and the specific
data type particular to that attribute are described in the data dictionary.

In figure 2.5, in the purchase transaction file, the attribute labelled “VAT” will include
the VAT amount for each record. The “Credit limit” attribute in the supplier master file
will indicate the credit limit of each supplier and the “Inventory category” attribute’s
data type will be alphabetic characters only.

(d) Field name

All attributes have a unique name known as a field name, which labels the data stored in the attribute (UNISA 2022).

Field names are unique and no column (attribute) can therefore have the same name
in a single database file; i.e., an attribute with the field name “supplier code” will only
appear once in the “supplier master file”. A field name can, however, appear in more
than one database file, i.e., an attribute with the field name “supplier code” can appear
in both the “supplier master file” and the “purchase transaction file”.

In figure 2.5, in the purchase transaction file, “Price per unit” is a field name.
“Minimum order qty” is a field name in the inventory master file.

(e) Data record

A data record is a set of logically related data fields about a single member or item (UNISA 2022).

A data record is also referred to as a “tuple” and is like a row in a spreadsheet. All data records of a particular database file will have the same structure – that is, they will consist of the same data fields ordered in the same way. For example, every student record in a student master file will contain a student number, first name, surname, telephone number and identification number. In figure 2.5, the data record for the supplier Forever PC in the supplier master file contains data fields for “supplier code”, “supplier name”, “telephone”, “credit limit” and “balance”. All these data fields together are referred to as a data record. In figure 2.5, Forever PC’s data record in the supplier master file is as follows:

FOR001   Forever PC   +00 33 923-1426   95000   94706.81

(f) Primary data field

Each file has a unique data field (known as the primary data field) that
can be used to uniquely identify each data record in a database file. A
primary data field is also known as a primary key (UNISA 2022).

In figure 2.5, the “supplier code” (e) data field is the primary data field in the “supplier
master file” and the “inventory number” data field (f) is the primary data field in the
“inventory master file”.
The combination of the invoice nr (a) and line nr (b) fields in the purchase transaction file forms a unique data field – that is, PN10029 (invoice nr) and 1 (line nr) together create a composite primary key, namely PN100291.

(g) Foreign key

When a primary data field of a database file is entered into another database
file to create a relation between the two database files, the primary data field
in the other database file is known as a foreign key (UNISA 2022).

A foreign key does not uniquely identify a record and may have duplicates in a
database file. The use of foreign keys prevents the duplication of data.

In figure 2.5, the “purchase transaction file” links (relates) to the “supplier master file”
through the use of the “supplier code”. The “supplier code” (e) data field is the primary
data field in the “supplier master file” as it uniquely identifies the supplier record but in
the “purchase transaction file” the “supplier code” (c) is the foreign key as it links the
two files through a primary data field. Note that there is more than one entry for
supplier code “FOR001” in the purchase transaction file.

The “purchase transaction file” links (relates) to the “inventory master” file through the
use of the “inventory number”. The “inventory number” field (f) is the primary data field
in the “inventory master file”, but in the “purchase transaction file”, the “inventory
number” (d) is known as the foreign key. The master files have been sorted in a
different order, but individual data records can still be found using the unique primary keys
and foreign keys.
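
The primary and foreign keys described above can be declared directly in a DBMS. Here is a minimal sketch using SQLite via Python, borrowing the field names from figure 2.5 (the table and column names are simplified for the example):

    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("PRAGMA foreign_keys = ON")   # SQLite enforces foreign keys on request

    con.execute("""
        CREATE TABLE supplier_master (
            supplier_code TEXT PRIMARY KEY,   -- primary data field (primary key)
            supplier_name TEXT NOT NULL
        )
    """)
    con.execute("""
        CREATE TABLE purchase_transaction (
            invoice_nr    TEXT,
            line_nr       INTEGER,
            supplier_code TEXT REFERENCES supplier_master (supplier_code),  -- foreign key
            PRIMARY KEY (invoice_nr, line_nr)   -- composite primary key, e.g. PN10030 + 1
        )
    """)

    con.execute("INSERT INTO supplier_master VALUES ('FOR001', 'Forever PC')")
    con.execute("INSERT INTO purchase_transaction VALUES ('PN10030', 1, 'FOR001')")
    con.commit()   # the foreign key relates the two files without duplicating supplier data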

(h) Database files

A database file, also known as a database table, is an organised collection of related data records (UNISA 2022).

Each database file contains related records – that is, the records in the database file
have a common theme. There are different types of database files, namely master
files, transaction files, reference files and history files.

• Master file
A master file contains data records of a relatively permanent nature (i.e., they do not change regularly) about the organisation’s resources and subjects (i.e., customers, suppliers, inventory, employees, etc) (UNISA 2022).

Some of the data in the master file is updated periodically by the transaction files. The
master file is the most important file in the database and is the authoritative source of
data. In figure 2.5, the supplier master file contains data records about all the
organisation’s suppliers and the data fields in the records are relatively permanent
(i.e., the name of the supplier and telephone number do not regularly change).

• Transaction file
A transaction file contains data records relating to the daily individual activities
of the organisation (e.g., the organisation’s sales). A transaction file changes
regularly as additional transactions are processed (UNISA 2022).

These transaction data records (i.e., the transaction file) are used to update or change
the master file. In figure 2.5, the purchase transaction file contains data records about
the organisation’s purchase transactions for June 2016, which may be used to update
the balance field in the supplier’s master file.

• Reference file
A reference file is a semi-permanent file containing data records that are referenced by the transaction file in order to complete a transaction (UNISA 2022).

Examples of reference files in accounting transaction processing systems are tax tables needed to calculate pay-as-you-earn (PAYE) and price lists referenced in order to calculate the sales price per item.

• History file
A history file contains data records about transactions completed in the past (UNISA 2022).

The data records in the history files are derived from the transaction file and are used
in future queries and references. For example, the prior year purchase transactions are
moved from the purchase transaction file to the purchase history file at the end of the
financial year, during the year-end process.
Activity 2.2

Microsoft Access and OpenOffice Base are examples of database software. Complete the following on the internet:

(a) For Microsoft Access training programmes:


– Type the following URL: https://support.office.com/en-us/
– Select the “More help and training” option.
– Locate “Access” and select the “Training” option.
– Complete the different training options.

(b) Note: This activity refers to databases and incorporates aspects of what was learnt in
study unit 1.

Campus Computers is a business that sells computers and software to students. The business has developed its own software to record its business transactions in a database.

The business has three overseas suppliers and two local suppliers from which it
purchases inventory. Inventory must be ordered when the quantity on hand
reaches the minimum reorder level.

Data was processed into information in the database of Campus Computers, as can be seen in the extract of the database files below.

Required:
Identify examples of each of the processing methods listed below:

(a) Classifying
(b) Sorting
(c) Calculating
(d) Summarising

The following is an extract from the files of Campus Computers’ database:


Supplier master file

Supplier no.   Supplier name     Telephone         Currency   Balance (a)
PCW003         PC World          +27 09 847-9387   ZAR        9251.18
DIL001         Dille Computers   +00 1 907-8334    USD        54397.90
FOR001         Forever PC        +00 33 923-1426   EURO       94706.81
SAP001         SAPC              +27 09 959-1234   ZAR        229866.02
GIG002         GIGAB Computers   +00 1 213-1177    USD        1528599.85

Inventory master file

Inventory   Item                 Inventory      Quantity   Minimum   Last cost   Order
no. (b)     description          category (c)   on hand    reorder   price       Yes/No (e)
                                                (d)        level
HD/250      360 Gig hard drive   Parts          22         7         729.50      No
LAP142      Laptop – SAPC        Computer       12         5         10000.00    No
LAP175      Laptop – GIGAB I     Computer       35         20        14000.00    No
MON190      Monitor – 19 inch    Parts          7          12        3400.00     Yes
MOU050      Mouse – cordless     Accessories    9          15        45.35       Yes

Purchase transaction file

Invoice   Line   Supplier   Purchase    Inventory   Quantity   Price      VAT (g)   Total (h)
no. (f)   no.    no.        date        no.                    per unit
PN10029   1      SAP001     14-Jun-16   LAP142      5          10000.00   7000.00   57000.00
PN10030   1      FOR001     15-Jun-16   MOU050      9          45.35      57.14     465.29
PN10030   2      FOR001     15-Jun-16   HD/250      1          729.50     102.13    831.63
PN10030   3      FOR001     15-Jun-16   MON190      3          3400.00    1428.00   11628.00
PN10031   1      GIG002     16-Jun-16   LAP175      4          15500.00   8680.00   70680.00

Supplier history file

Supplier no.   Month (i)   Transaction   Amount excl   VAT (l)     Amount incl
                           type (j)      VAT (k)                   VAT (m)
FOR001         April-15    Purchase      58456.50      8183.91     66640.41
FOR001         May-15      Purchase      114580.00     16041.20    130621.20
FOR001         May-15      Payment       -101298.00    -14181.72   -115479.72
FOR001         June-15     Purchase      11337.65      1587.27     12924.92
Balance for the quarter (n)              83076.15      11630.66    94706.81

Go to Discussion Forum 2.2 and complete activity (b).

Guidelines for participating in forums:


• Compile your post offline and keep a record of it.
• Use an academic writing style for referencing and citing the sources you used.
• Post your answer on the forum.
• Reply to the contributions of at least two of your fellow students.

2.12 Factors to consider when choosing a DBMS and database

Organisations should consider several factors when deciding on an appropriate DBMS and database. These factors should not be considered in isolation, because they influence one another, and some, such as cost, may constrain all the others. The following are some of the factors an organisation should consider:

• The database model type used should support the requirements of the organisation –
that is, a financial system might only require a relational database, but an
organisation that requires online analytical processing (OLAP) needs to use a
multidimensional model.
• The acquired DBMS and database should closely match the requirements of
the organisation.
• The DBMS and database should be able to evolve to meet future organisational
needs.
• The performance (i.e., response time) of the DBMS and database. How fast can records be updated or queried in the database?
• The cost of the DBMS and database should be considered. Can the organisation
afford the DBMS and database?
• Different DBMS and databases will require different levels of specialised staff
skills. Are there specialised skills available in the organisation or can the
organisation acquire the skills required?
• The hardware needed to run the DBMS and database should be considered.
The organisation may need to acquire hardware if it is not already available. This
will have further cost implications.
• Can the DBMS and database be integrated with the rest of the organisation’s
information systems?
• The database size (amount of data the database can manage) must be adequate for
the organisation’s future data requirements and the database should easily be
expandable.
• The number of concurrent users (the number of users who can access the database at the same time) the DBMS and database can handle should be taken into account.
• The DBMS and database vendor should be a reputable organisation and
financially stable because this vendor will need to provide future support for the
solution.

Reflect

Pastel uses a relational database to store a huge amount of accounting data.

Make a note that you must return to topic 1 once you have mastered
Pastel (topic 7) and consider the following:

– In which files are the various types of information that are captured during each Pastel lesson stored?
– What are the field names in each file?
– Which files are master files and which are transaction files? What about reference files?
– How are the various files interlinked?
– How are the various files updated with the processing of each type of transaction?

2.13 Summary

In this study unit, we learnt about the database environment, the advantages and
disadvantages of using a database environment and the components of this
environment. We also gained an understanding of different database models,
centralised and distributed databases and the terminology used in a relational
database. We dealt with the factors to consider when choosing appropriate database
management software and a database. In the next study unit, we will investigate the
utilisation of databases in an organisation.

REFERENCES

Dane, K. (2022). The components of database system environment. Owlgen. (Accessed 31/10/2022).

Eisa, I., Salem, R. & Abdelkader, H. (2017). A fragmentation algorithm for storage management in cloud database environment. In 2017 12th International Conference on Computer Engineering and Systems (ICCES) (pp. 141-147). IEEE.

Fakhimuddin, M., Khasanah, U. & Trimiyati, R. (2021). Database management system in accounting: Assessing the role of internet service communication of accounting system information. Research Horizon, 100-105.
Kim, H.I., Kim, H.J. & Chang, J.W. (2016). A range query processing algorithm hiding data access patterns in outsourced database environment. In International Conference on Data Mining and Big Data (pp. 434-446). Springer, Cham.

Mansouri, Y., Toosi, A.N. & Buyya, R. (2017). Data storage management in cloud
environments: Taxonomy, survey, and future directions. ACM Computing Surveys
(CSUR), 50(6), pp.1-51.

Pavlo, A., Angulo, G., Arulraj, J., Lin, H., Lin, J., Ma, L., Menon, P., Mowry, T.C., Perron, M., Quah, I., Santurkar, S., Tomasic, A., Toor, S., Aken, D.V., Wang, Z., Wu, Y., Xian, R. & Zhang, T. (2017). Self-driving database management systems. In CIDR 2017, Conference on Innovative Data Systems Research.

Shrivastava, S. & Pateriya, R.K. (2017). Efficient storage management framework for
software defined cloud. International Journal of Internet Technology and Secured
Transactions, 7(4), pp.317-329.

University of South Africa. (2022). Study guide for Practical Accounting Data
Processing AIN2601. Pretoria.
Verma, Kumar & Dixit (2016). Data synchronization in heterogeneous database environment. In 2016 2nd International Conference on Contemporary Computing and Informatics (IC3I) (pp. 536-541). IEEE.

Wong, W.K., Kao, B., Cheung, D.W.L., Li, R. & Yiu, S.M. (2014). Secure query processing with data interoperability in a cloud database environment. In Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data (pp. 1395-1406).
