The database provides the information to meet the operational needs as well as the future
planning of any organisation, all of which should be done within the framework of
organisational procedures and policies. Procedures are business rules and instructions
that govern the design and use of the database systems, enforce the standards,
monitor and audit the data that resides in the databases and regulate the information
that is generated from the stored data. Finally, the data, which is vital to the health of
the organisation, plays a critical role in the design of the database. The existence of the
database system depends on organisational structure and requirements at each level.
The complexity depends on the size of the organisation and its functions and corporate
culture. Elomari, Maizate and Hassouni (2016) posit that as data volumes to be
processed in all domains – scientific, professional, social, amongst others – are
increasing at high speed, their management and storage raise more and more
challenges. The emergence of highly scalable infrastructure has contributed to the
evolution of storage management technologies. However, numerous problems have
emerged, such as consistency and availability of data, scalability of environments and
concurrent access to data.
2.1 Introduction
In the previous study unit, we learnt that the database environment provided the
solution to the need of users to be able to share and view the same data and
information in an organisation. Databases became the preferred method of storing
data and information because of their numerous advantages. In this study unit, we will
gain an understanding of the database environment, the advantages and
disadvantages of using a database environment and an understanding of the database
environment’s components. We will also explain database models, distributed
databases and the terminology used in a relational database. In conclusion, we will
look at the factors to consider when choosing appropriate database management
software and a database.
• Assess database management systems based on an understanding of the
operating environment by
o naming and describing each of the elements of the database
environment
o listing the functions of a database management system
o defining and describing each of the components of a database
management system
o differentiating between various database models
o defining relational database terminology and identifying each item on a
simple database representation
o listing the advantages and disadvantages of using a database
environment for data storage and processing
o identifying factors to consider when choosing an appropriate database
management system and database
The following icons are included in this study unit:
The database environment and its related components are explained in detail below.
2.2.2.2 Disadvantages
(i) Start-up and operating costs. It can be expensive to acquire the hardware
and software needed to set up a database environment. Furthermore, an
organisation will need to hire additional employees, such as a database
administrator, to manage the database environment.
(ii) Database systems are complex to design and use.
(iii) Because databases are complex, it is very time-consuming to design a
proper database.
(iv) Database or database management software failure will affect all
application software linked to that specific database. This can make recovery from
such a failure more difficult. A failure can shut down a whole organisation or
department(s) in the organisation, making the organisation unable to run its daily
operations and provide adequate customer service (UNISA 2022).
There are various database users in the database environment, including the following:
– Implement, maintain and evaluate database access policies and security controls.
– Monitor data and database security and access.
b. Data administrator
One of the data administrator’s responsibilities is to manage the integrity of the data in
the database by setting and enforcing the data standards and data definitions to be used
in all the organisation’s databases. In many organisations, the functions of the data
administrator and database administrator are combined – hence these functions are
performed by the same person.
c. End-users
End-users capture data in the database and extract information from the
database using database management system software (UNISA 2022).
Owing to their computer skill level, most end-users will interact with the database
management system (DBMS) through application software. DBMS is explained in
section 2.3.
d. Application programmers
(i) design, create and maintain the database structure and the
database
(ii) control the organisation, storage and retrieval of data in the
database
(iii) capture, maintain (delete, insert and amend) and manipulate the
data in the database
(iv) share data between multiple users simultaneously
(v) execute queries and generate output
(vi) control the movement of the data between authorised users and
the database
(vii) control and monitor access to the database
(viii) analyse and monitor database performance (UNISA 2022)
According to Pavlo, Angulo, Arulraj, Lin, Lin, Ma, Menon, Mowry, Perron, Quah and
Santurkar (2017), using existing automated tuning tools is an onerous task, as they
require laborious preparation of workload samples, spare hardware to test proposed
updates, and above all else, intuition into the DBMS’s internals. They argue that if the
DBMS could do these things automatically, then it would remove many of the
complications and costs involved with deploying a database.
In the last two decades, both researchers and vendors have built advisory tools to
assist database administrators (DBAs) in various aspects of system tuning and physical
design. The database landscape, however, has changed significantly in the last decade
and one cannot assume that a DBMS is deployed by an expert who understands the
intricacies of database optimisation. But even if these tools were automated such that
they could deploy the optimisations on their own, existing DBMS architectures are not
designed to support major changes without stressing the system further, nor are they
able to adapt in anticipation of future bottlenecks. Peloton, the first self-driving DBMS
with autonomic capabilities, is now possible due to algorithmic advancements in
deep learning as well as improvements in hardware and adaptive database
architectures. Most of this previous work, however, is incomplete because the tools still
require human intervention to make the final decisions about any changes to the
database. These are reactionary measures that fix problems after they occur. In this
regard, the authors further proposed that what is needed for a truly “self-driving” DBMS is a
new architecture designed for autonomous operation. This differs from earlier
attempts because all aspects of the system are controlled by an integrated planning
component that not only optimises the system for the current workload, but also predicts
future workload trends so that the system can prepare itself accordingly.
With this, the DBMS can support all the previous tuning techniques without requiring
human intervention to determine the right way and proper time to deploy them, and it
also enables new optimisations that are important for modern high-performance
DBMSs but are not possible today, because the complexity of managing these systems
has surpassed the abilities of human experts. The idea of using a
DBMS to remove the burden of data management was one of the original selling points
of the relational model and declarative query languages from the 1970s. With this
approach, a developer only writes a query that specifies what data they want to access.
The DBMS then finds the most efficient way to store and retrieve data, and to safely
interleave operations.
Much of the previous work on self-tuning systems is focused on standalone tools that
target only a single aspect of the database. For example, some tools are able to choose
the best logical or physical design of a database such as indexes, partitioning schemes,
data organisation, or materialised views. Other tools are able to select the tuning
parameters for an application. Most of them operate in the same way: the DBA provides
the tool with a sample database and a workload trace that guides a search process to find an
optimal or near-optimal configuration. All of the major DBMS vendors’ tools, including
Oracle, Microsoft, and IBM, operate in this manner. There is a recent push for
integrated components that support adaptive architectures, but these again only focus
on solving one problem (Pavlo et al 2017).
Pavlo et al (2017) note that, likewise, cloud-based systems employ dynamic resource
allocation at the service level but do not tune individual databases. All of these are
insufficient for a completely autonomous system because they are (1) external to the
DBMS, (2) reactionary, or (3) not able to take a holistic view that considers more than
one problem at a time. That is, they observe the DBMS’s behaviour from outside of the
system and advise the DBA on how to make corrections to fix only one aspect of the
problem after it occurs. The tuning tools assume that the human operating them is
knowledgeable enough to update the DBMS during a time window when it will have the
least impact on applications.
According to Cui, Yang, Wang, Geng and Li (2020), with the rapid development of the
cloud computing paradigm, data owners have the opportunity to outsource their
databases and management tasks to the cloud. Due to privacy concerns, they are
required to encrypt the databases prior to outsourcing. However, there are no existing
techniques for handling range queries over the encrypted data in a fully secure way. To
process secure range queries efficiently, the extraordinarily challenging task is how to
perform fully secure range queries over encrypted data without the cloud ever
decrypting the data.
2.3.2.5. Characteristics of cloud computing
The essential characteristics and features of cloud computing include the following:
a. It must be an on-demand self-service, which means that users of the cloud can
   automatically provision resources with minimal or no human interaction with the
   cloud service provider.
b. Users access the cloud services via networks by deploying suitable techniques and
protocols with the use of thick or thin clients.
c. Resource pooling. It means that services of the cloud are pooled and serve many
   consumers using a multi-tenant model.
The following cloud computing deployment models are distinguished:
(i) Public clouds: This is the most popular deployment model of cloud
computing. In this model, the cloud infrastructure and resources are owned
by an enterprise that provides them to individuals or other enterprises in a
pay-as-you-go model. Cloud resources are shared between many consumers.
Leaders in the market who provide cloud services of this model are Google
and Amazon. They provide many options that allow users to get the
resources with minimal cost and less management effort. Major concerns are
privacy, security, and data control.
(ii) Private clouds: Cloud infrastructure operates to serve one organisation.
Management of the cloud is done by a third party or by the organisation. This
model usually attracts governments and organisations that prefer to keep
data in a private environment.
(iii) Hybrid clouds: The cloud infrastructure is a combination of private and public
clouds. Each of them will still be a single entity connected to another cloud. In
this case, enterprises can choose to store their data on the private part of the
cloud.
(iv) Community clouds: Enterprises with the same needs share the cloud
infrastructure. The cloud is managed by a third party or by the enterprises
that share it (UNISA 2022).
According to Mansouri, Toosi and Buyya (2017), Storage as a Service (StaaS) is a vital
component of cloud computing, offering the vision of a virtually infinite pool of storage
resources. It supports a variety of cloud-based data store classes in terms of
availability, scalability, ACID (Atomicity, Consistency, Isolation, Durability) properties,
data models, and price options. Application providers deploy these storage classes
across different cloud-based data stores not only to tackle the challenges arising from
reliance on a single cloud-based data store, but also to obtain higher availability, lower
response time, and more cost efficiency.
2.3.2.7 Adoption of Cloud
According to Shrivastava and Pateriya (2017), in this era, every person has
experienced major changes because of increased internet connectivity and mobile
phones. Thus, the exponential growth of data is a matter of concern for every
organisation, and storage of a huge data mountain is only possible through adoption of
the cloud. Nowadays, the popularity of software-defined systems is increasing, and
virtualised cloud data centres are also moving towards software-defined data centres.
This change is possible only because of the advancement in software-defined networks
and software-defined storage, amongst others. The day-to-day generation of digital
data during internet usage is increasing exponentially, from petabytes to exabytes. Big
data storage, maintenance and analysis are now hot research areas that need very
innovative ideas. Maintaining data availability, reliability and security on third-party
clouds is a key concern. Storage services are obstructed by various types of hardware
or software failures and by maintenance or upgrade operations on low-cost commodity
servers. To overcome all these faulty conditions and provide maximum uptime for the
data stored in the cloud, redundancy is maintained in the cloud. Replication is a
well-known redundancy technique that creates multiple copies of the data and has
been applied in most distributed storage systems. Its high space consumption,
however, makes it very expensive for big data storage, hence large, distributed storage
systems increasingly use erasure coding to tolerate faults while minimising space
consumption.
2.3.2.8 Cloud cost of ownership and storage management
Shrivastava and Pateriya (2017) developed a framework for data management
interface for software-defined storage using well-known redundancy techniques,
replication, and erasure coding. Cloud service providers have virtualised their data
centres to provide cheaper services, but availability, reliability and fault tolerance still
have a heavy impact on their earnings. This virtualisation work focused on solving the
following two issues:
(i) Reliability and cost of data storage in the cloud by continuous monitoring
(ii) Scanning of the storage system
Erasure coding utilises space efficiently by applying sophisticated algorithms and is
able to recover data in case of failure. It provides space optimality but has high
reconstruction costs. The triplication policy for fault tolerance has been applied for a
long time, but its high storage consumption forces the use of erasure codes to provide
availability and reliability in cloud storage systems. This new SDStorage framework
makes a separation between the software and hardware layers and applies replication
and erasure codes together. SDStorage can serve different demands by applying
various policies and can handle the ever-increasing mountain of data. The storage
controller present in SDStorage is programmable and helps in managing and
provisioning storage resources to provide cost-effective solutions that optimise data
centre performance. Adjusting the erasure codes on the fly and combining them with
replication controls the overall completed requests and helps minimise the access
time of files. This framework added new functionality to the SDStorage controller that
automates resource provisioning, which helps enhance organisational efficiency and
reduce the total cost of ownership. These added features in SDStorage do not degrade
service level agreements, hence it will boost the adoption of a software-defined cloud
by various organisations. This work can be further improved by adding a security
feature.
This new framework decreases the total cost of ownership and provides an efficient
technique for storage management in the cloud, which propels the development of a
software-defined cloud.
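The trade-off between replication and erasure coding discussed above can be sketched with a toy single-parity XOR code. This is an illustrative sketch only: production systems use Reed-Solomon and related codes, and the function and block names here are not from the source.

```python
# Replication vs a toy XOR erasure code: replication stores k full
# copies, while XOR parity stores k data blocks plus one parity block
# and can rebuild any single lost block.

def replicate(block: bytes, copies: int = 3) -> list[bytes]:
    """Triplication: three full copies, i.e. 3x storage overhead."""
    return [block] * copies

def xor_parity(blocks: list[bytes]) -> bytes:
    """One parity block for k equally sized data blocks (k+1 stored)."""
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return bytes(parity)

def recover(surviving: list[bytes], parity: bytes) -> bytes:
    """Rebuild the single lost data block from survivors + parity."""
    return xor_parity(surviving + [parity])

data = [b"blockA", b"blockB", b"blockC"]
p = xor_parity(data)                      # 4 blocks stored instead of 9
rebuilt = recover([data[0], data[2]], p)  # tolerates one lost block
assert rebuilt == data[1]
```

Replication gives fast reads and simple repair at 3x the space; the parity scheme stores far less but must read every surviving block to reconstruct, which mirrors the "high reconstruction costs" noted above.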
a. Characteristics of software-defined systems
Verma, Kumar and Dixit (2016) propose that data synchronisation refers to always
having the updated dataset from every database, wherever it is located, and then
transforming it into meaningful information for users situated at various locations.
Thus, data synchronisation is the process of establishing consistency between data in
a source and a target data store, and vice versa, and the continuous harmonisation of
the data over time. In a heterogeneous database environment, technical heterogeneity,
data model heterogeneity as well as semantic heterogeneity can be encountered. This
means an organisation can have multiple
types of databases and data residing in various databases, like Oracle, MySQL,
PostgreSQL and SQL Server amongst others, located at different locations, namely
zones/states/districts, amongst others, having different structures yet storing
similar/the same information in them. In a heterogeneous database
environment, data synchronisation is a major issue. For example, any government
initiative invariably brings with it challenges related to fetching of data from various
locations like respective zones/states/districts, including its synchronisation,
transformation, and standardisation. The need was felt to have a configurable low-cost
or open-source utility which can fetch data incrementally or completely (according to
the requirement) in regular (configurable) intervals of time with error tracking and
correction mechanisms.
As stated by Verma, Kumar and Dixit (2016), the following provides the solution to data
synchronisation in a heterogeneous database environment:
A trigger is a mechanism supported by most database systems; Oracle, SQL Server
and DB2 all support it. Once changes to the database content occur, the database
server can automatically take the relevant action defined in the trigger. These actions
may include insert, delete and update operations, or the execution of a procedure.
Using triggers also allows SQL logic to be shared across applications. A trigger's
principal advantage is that when the data is revised, it can carry out the action
automatically, as defined by the triggering procedure.
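As an illustration of the trigger mechanism, the following sketch uses Python's built-in sqlite3 module; trigger syntax differs between DBMSs such as Oracle, SQL Server and DB2, and the table and column names here are illustrative, not from the source.

```python
# A trigger that captures changes for synchronisation: every UPDATE on
# the customer table automatically writes a row to a change log.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE change_log (tbl TEXT, op TEXT, row_id INTEGER);

-- Fires automatically after every UPDATE on customer.
CREATE TRIGGER customer_update AFTER UPDATE ON customer
BEGIN
    INSERT INTO change_log VALUES ('customer', 'UPDATE', NEW.id);
END;
""")

con.execute("INSERT INTO customer VALUES (1, 'Alice')")
con.execute("UPDATE customer SET name = 'Alicia' WHERE id = 1")
print(con.execute("SELECT * FROM change_log").fetchall())
# → [('customer', 'UPDATE', 1)]
```

The application never touches change_log directly; the database server maintains it, which is exactly the automatic behaviour described above.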
DB links and triggers were possible solutions but, due to limitations related to the
different kinds of databases in use in various SDCs, this method cannot be used.
A database log is an important tool for recovering data and maintaining the integrity of
the database, as it contains all the operating records of the information that was
submitted successfully. The log analysis method is implemented by analysing the log
information to capture changes in the sequence of synchronisation objects. As most
database log formats are not open, dedicated log analysis tools or interfaces are
needed to parse the logical log of the databases, to restore the operations that
happened as SQL statements and to record them in log files. Log files should contain
the operation time, SQL statements, etc.
All DDL and DML SQL queries from the source database to the target database may be
captured. A process runs continuously in the back end to read each SQL query and
pass it to the target system through HTTP. Another process applies the SQL query in
the target system.
c) Timestamp-Based Approach
This method requires that every table of the application systems involved has a
timestamp field to record the modification time of each record. The method does not
affect the efficiency of the original application, but it needs larger adjustments to the
original system and cannot capture data changes that are not caused by the
application itself.
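The timestamp-based approach can be sketched as follows, assuming an illustrative orders table with a last_modified column (table, column and timestamp values are not from the source).

```python
# Incremental fetch: each sync run retrieves only the rows whose
# last_modified timestamp is later than the previous sync point.
import sqlite3

src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE orders (id INTEGER, total REAL, last_modified TEXT)")
src.executemany("INSERT INTO orders VALUES (?, ?, ?)", [
    (1, 10.0, "2024-01-01T09:00:00"),
    (2, 25.0, "2024-01-02T14:30:00"),
])

def fetch_changes(con, since: str):
    """Rows modified after the last sync point (ISO strings compare correctly)."""
    return con.execute(
        "SELECT id, total, last_modified FROM orders WHERE last_modified > ?",
        (since,),
    ).fetchall()

changed = fetch_changes(src, "2024-01-01T12:00:00")
print(changed)  # → [(2, 25.0, '2024-01-02T14:30:00')]
```

Note the limitation stated above: a change made directly in the database that does not update last_modified would be missed by this query.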
e) Through XML
XML is a simplified standard edition of SGML. XML is not a programming language but
a data description language, which semi-structures the data. XML documents usually
consist of the declaration, elements, attributes and text.
The XML correlation techniques that must be used in such a system mainly contain the
following: XML document structure description, demonstration, and programming
connection technology.
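A minimal illustration of XML as a semi-structured data description language, using Python's standard library; the element and attribute names are illustrative, not from the source.

```python
# Describe a record as XML, then read its elements and attributes back.
import xml.etree.ElementTree as ET

doc = """
<customer id="1">
    <name>Alice</name>
    <city>Pretoria</city>
</customer>
"""

root = ET.fromstring(doc)
print(root.tag, root.attrib["id"])   # → customer 1
print(root.findtext("name"))         # → Alice
```

Because the structure travels with the data, two heterogeneous systems can exchange such documents without sharing a database schema.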
f) Through JMS
Java Message Service (JMS) is a group of Java application programming interfaces
(Java APIs). It provides a foundation for creating, sending and receiving messages.
JMS has two kinds of programming models: the point-to-point (P2P) model and the
publish-and-subscribe (pub/sub) model. The P2P message-passing model transmits
each message sent through a queue to a receiver. The P2P model ensures that there is
only one recipient reading each message. In the pub/sub message model, a message
producer sends each message to one or more registered consumers based on the
theme of the message. Consumers can subscribe to a theme.
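The difference between the two JMS models can be illustrated with a language-neutral in-memory sketch; this is plain Python, not the JMS API, and the class and message names are illustrative.

```python
# Point-to-point vs publish-and-subscribe: a queue delivers each
# message to exactly one receiver, while a topic delivers a copy to
# every registered subscriber.
from collections import deque

class Queue:
    """Point-to-point: each message is consumed by a single receiver."""
    def __init__(self):
        self.messages = deque()
    def send(self, msg):
        self.messages.append(msg)
    def receive(self):
        return self.messages.popleft()  # removed once read

class Topic:
    """Publish-and-subscribe: every subscriber gets every message."""
    def __init__(self):
        self.subscribers = []
    def subscribe(self, callback):
        self.subscribers.append(callback)
    def publish(self, msg):
        for deliver in self.subscribers:
            deliver(msg)

q = Queue()
q.send("job 1")
first = q.receive()              # only one recipient reads this message

inbox_a, inbox_b = [], []
topic = Topic()
topic.subscribe(inbox_a.append)
topic.subscribe(inbox_b.append)
topic.publish("price update")    # both subscribers receive a copy
```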
g) Third Party Software Like SymmetricDS
SymmetricDS is based on triggers. Every time there was any change in the schema at
any SDC, the synchronised data could not be obtained. Hence, in these cases the
desired results are not obtained through these triggers.
h) Governance Issues
2.3.4.1 Features
(i) Cross Platform - It works as a web-service, hence can be used through any
operating system.
(ii) Multi-Threaded – Can work in parallel on all jobs that are running for all states.
(iii) Automatic Recovery – Jobs in which an error occurs are retried until they succeed
or are cancelled manually.
(iv) Initial data load – Data can be fetched incrementally as well as completely as per
requirement.
(vi) Communication Methods – Can pull changes from various states at configurable
time intervals (automatically) through jobs, as well as manually as per requirement.
(vii) Monitoring – Can monitor for errors or pendency and raise alerts accordingly.
(viii) Embeddable – Can be embedded into any application without much effort.
The decentralised nature of scientific communities and healthcare systems has
created a sea of valuable but incompatible electronic databases (Verma et al 2016).
The solution was materialised through a utility for fetching data from different
heterogeneous databases placed at different locations, followed by synchronisation
and transformation of the synchronised data through mapping processes. Through this
utility, data can be successfully synchronised after being fetched to the central location
(Central Data Centre) for any programme that may require its use.
According to Kim, Kim and Chang (2016:443-446), research on secure range query
processing techniques in outsourced databases has increasingly come under the
spotlight with the development of cloud computing. The existing range query processing
schemes can preserve the data privacy and the query privacy of a user. However, they
fail to hide the data access patterns while processing a range query. Kim et al therefore
propose a secure range query processing algorithm that hides data access patterns.
Their method filters unnecessary data using an encrypted index, and their performance
analysis shows that the proposed algorithm can efficiently process a query while hiding
the data access patterns.
DBMSs are a ubiquitous and critical component of modern computing, and the result of
decades of research and development in both academia and industry (Fakhimuddin,
Khasanah & Trimiyati 2021). Historically, DBMSs were among the earliest multi-user
server systems to be developed, and thus pioneered many systems design techniques
for scalability and reliability now in use in many other contexts. While many of the
algorithms and abstractions used by a DBMS are textbook material, there has been
relatively sparse coverage in the literature of the systems design issues that make a
DBMS work.
This is an invaluable reference for database researchers and practitioners and for those
in other areas of computing interested in the systems design techniques for scalability
and reliability that originated in DBMS research and development. It presents an
architectural discussion of DBMS design principles, including process models, parallel
architecture, storage system design, transaction system implementation, query
processor and optimiser architectures, and typical shared components and utilities.
While many of the algorithms and abstractions used by a DBMS are textbook
material, Architecture of a Database System addresses the systems design issues that
make a DBMS work.
The external level, also called the user view, is the individual end-
user’s view of the data and the database (UNISA 2022).
Because users’ information needs differ, the views they require of the database will
also differ – hence there may be an infinite number of external views. For example, the
creditor’s clerk input screen and reports (user view) will look different from the input
screen and reports (user view) of the cashbook clerk. When working on Pastel Partner
(topic 6), we will see in practice how the user views differ, depending on the type of
transaction processed (i.e., creditors, cashbooks, etc).
The database administrator will generally use this view. In comparison with the user
view, which may have infinite variations, there is only one conceptual view.
The internal level, also called the physical view, is the low-level view
of how the data is physically stored on a storage device such as a
magnetic hard drive disk (UNISA 2022).
There is only one physical view. The binary code (1s and 0s, e.g., 01100011) in the
database is one facet of the physical view.
(a) Data dictionary
The data dictionary is a very important tool for all database users as it ensures all users
have the same understanding of the data fields and database files. A data dictionary
will therefore assist in the accurate processing of data and make information and/or
data easier to analyse.
As you have noticed, this section about the data dictionary refers to terminology you
may not be familiar with. These terms are explained in section 2.5 of this study unit.
Therefore, refer back to this section about the data dictionary after you have worked
through section 2.10.
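As an illustration, a data dictionary entry might record the following for each field so that every user shares one definition; the field names, types and rules below are hypothetical.

```python
# A tiny data dictionary: one shared, authoritative description per
# database field, so data capture and analysis use the same meaning.
data_dictionary = {
    "customer_id": {
        "table": "customer",
        "type": "INTEGER",
        "description": "Unique identifier of a customer",
        "constraints": "PRIMARY KEY, NOT NULL",
    },
    "invoice_date": {
        "table": "invoice",
        "type": "DATE",
        "description": "Date the invoice was issued",
        "constraints": "NOT NULL, format YYYY-MM-DD",
    },
}

def describe(field: str) -> str:
    """Look up the shared definition of a field."""
    entry = data_dictionary[field]
    return f"{field} ({entry['type']}): {entry['description']}"

print(describe("customer_id"))
# → customer_id (INTEGER): Unique identifier of a customer
```

In a real DBMS the data dictionary is maintained by the system itself as part of the catalogue, not as an application-level structure like this sketch.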
(b) Database languages
Database objects include database tables, views, rules, indexes and so forth. DDL is
usually only available for use by the database administrator and requires detailed
knowledge of the conceptual level of the DBMS.
DCL controls the security and user access to the database objects
and data in the database (UNISA 2022).
Data manipulation language can be used by all the database users, but the level of use
will be determined by their skill level and access granted. Most end-users, however,
access the DML through application software. DML and DDL should not be confused.
DML is used for the data stored in the database, while DDL is used on the database
objects and structure.
Data query language is used to retrieve data from the database (UNISA 2022).
All database users can use data query language. However, owing to their
programming skill level, most end-users access the data query language through
application software.
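The language categories can be shown side by side with Python's built-in sqlite3. SQLite does not support DCL statements such as GRANT, so these appear only as a comment; the table and column names are illustrative.

```python
# DDL defines objects, DML changes the stored data, DQL retrieves it.
import sqlite3

con = sqlite3.connect(":memory:")

# DDL: defines database objects and structure.
con.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, price REAL)")

# DML: captures and maintains the data stored in those objects.
con.execute("INSERT INTO product VALUES (1, 9.99)")
con.execute("UPDATE product SET price = 12.50 WHERE id = 1")

# DQL: retrieves data from the database.
rows = con.execute("SELECT id, price FROM product").fetchall()
print(rows)  # → [(1, 12.5)]

# DCL (e.g. GRANT SELECT ON product TO clerk) would control access,
# but is handled by server DBMSs such as Oracle or SQL Server.
```

Note how the DDL statement operates on the database structure while the DML and DQL statements operate on the data inside it, which is exactly the distinction drawn above.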
Using a DBMS, different application software and users in an organisation can access
the same data and a variety of other data in the database.
A data model describes in an abstract way how data is represented in an information
system or a DBMS. The choice of data model has a fundamental effect on the other
aspects of a database system, such as the integrity constraints and data access.
The most used data model is the relational model (RM), which was developed for
classic database applications such as banking systems, airline reservations, and
sales/customer relations. It is implemented by major DBMSs such as Oracle, IBM DB2,
MS SQL Server and PostgreSQL, amongst others. In this model, data is organised in
tables (relations) of records (tuples) with columns (attributes). A table can have a
primary key, which is the unique identifier of its rows. A primary key can be referenced
from another table as a foreign key, which enforces integrity constraints on the data
(UNISA 2022). Databases underpin much of clients' daily lives and a business's worth
because of the value that individuals and organisations assign to their data. The core of
the functionality that databases provide to users lies in the design of the various types
of database models.
A database model is a type of data model that defines a database’s logical structure. It
determines how data can be stored, organised, and manipulated. The relational model,
which uses a table-based format, is the most common database model. It demonstrates
how data is organised and the various types of relationships that exist between
them. The facts that can enter the database, or those of interest to potential end-users,
are specified by a database schema, which is based on the database administrator’s
knowledge of possible applications. In predicate calculus, the concept of database
schema is analogous to the concept of theory. A database, which can be seen as a
mathematical object at any point in time, closely resembles a model of this “theory.” As
a result, a schema can contain formulas that represent both application-specific
integrity constraints and database-specific integrity constraints, all expressed in the
same database language. Databases can be classified according to the theoretical
data structure, referred to as a data model, on which they are based. The data model
used will determine the manner in which the data is stored and organised, and the
operations that can be performed on the database. The following types of database
models have distinct appearances and operations and can be used in different ways,
depending on the needs of the user (Mohammad & Schallen 2011; Balasankula 2022).
A number of database models can be used. However, we will only extensively discuss
some of the main model types, namely hierarchical, network, relational, object-oriented
and multidimensional. Others will be briefly introduced.
The hierarchical model was used in early databases and, as the name
indicates, the data is structured in a hierarchical (upside down tree-like)
structure (UNISA 2022).
Nowadays, these types of database models are uncommon. The model has nodes for
records and branches for fields. A hierarchical database is exemplified by the Windows
registry in Windows XP, whose configuration options are saved as node-based tree
structures. The “parent-child” relationship is used to store data in this type of database.
FIGURE 2.3: Hierarchical model (UNISA 2022)
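A hierarchical structure can be sketched as a tree of nested records, where every child has exactly one parent; the node names below are illustrative.

```python
# A hierarchical (tree) record structure built from nested dicts:
# access always starts at the root and follows parent-child branches.
company = {
    "name": "Head Office",               # root (parent) record
    "children": [
        {"name": "Sales", "children": [
            {"name": "Invoicing", "children": []},
        ]},
        {"name": "Finance", "children": []},
    ],
}

def find(node, target):
    """Traverse the tree from the root, as hierarchical DBMSs must."""
    if node["name"] == target:
        return node
    for child in node["children"]:
        found = find(child, target)
        if found:
            return found
    return None

print(find(company, "Invoicing")["name"])  # → Invoicing
```

The traversal shows the model's main limitation: every lookup must follow a single path from the root, and a record cannot have two parents.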
2.9.1.1 Advantages
2.9.1.2 Disadvantages
The network model supports many-to-many relationships, that is, data may be
accessed by following several paths (UNISA 2022).
In many-to-many relationships, a child can have multiple parents and there can be
relationships between children. Nowadays, the use of this model is mostly obsolete.
The Database Task Group formalised this model in the 1960s, and it generalises the
hierarchical model. A record can have multiple parent segments, which are grouped
into levels, with a logical relationship between the segments that belong to each level.
Typically, any two segments have a many-to-many logical relationship.
Network models are designed to represent objects and their relationships flexibly.
The network model extends the hierarchical model by allowing many-to-many
relationships between linked records, which implies multiple parent records.
These database models are built using sets of related records and are based on
mathematical set theory. Each set contains one owner (parent) record as well as one
or more member (child) records. This model can convey complex relationships
because a record can be a member or child in multiple sets.
After being formally defined by the Conference on Data Systems Languages
(CODASYL) in the 1970s, it became extremely popular.
2.9.2.1 Advantages
2.9.2.2 Disadvantages
• Because all records are maintained using pointers, the database structure
becomes extremely complicated.
• Any record’s insertion, deletion, and updating operations necessitate numerous
pointer adjustments.
• Changing the database's structure is extremely difficult (UNISA 2022).
2.9.3 Relational model
A table is also known as a relation, and each database has several tables. Every table
has its own primary key and the database uses this to link (relate) the table to the other
tables in the database (primary key is explained in section 2.9.3.2). A table is similar to
a spreadsheet with rows and columns. (Spreadsheets are discussed in topic 2).
MySQL, Microsoft SQL Server and Oracle are examples of relational model
databases.
Three key terms are frequently used in relational models: relations, attributes
and domains.
• Primary Key: It is the identifier that uniquely identifies each record in a
table. It contains no null values.
• Foreign Key: It refers to another table’s primary key. Only values that appear in
the primary key of the table to which it refers are allowed.
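These two definitions can be illustrated with a minimal sketch using Python's built-in sqlite3 module; the supplier and purchase tables and their columns are invented for illustration and do not come from any specific system:

```python
import sqlite3

# Sketch of a primary key and a foreign key using SQLite (in-memory).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled

conn.execute("""
    CREATE TABLE supplier (
        supplier_code TEXT PRIMARY KEY,  -- uniquely identifies each record, no nulls
        name          TEXT NOT NULL
    )""")
conn.execute("""
    CREATE TABLE purchase (
        invoice_nr    TEXT PRIMARY KEY,
        supplier_code TEXT NOT NULL
                      REFERENCES supplier (supplier_code)  -- foreign key
    )""")

conn.execute("INSERT INTO supplier VALUES ('FOR001', 'Fortune Traders')")
conn.execute("INSERT INTO purchase VALUES ('PN10029', 'FOR001')")  # valid: FOR001 exists

# A foreign key only allows values that appear in the referenced primary key,
# so a purchase for an unknown supplier is rejected.
try:
    conn.execute("INSERT INTO purchase VALUES ('PN10030', 'XXX999')")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True
```

Running the sketch leaves exactly one purchase record: the insert for the unknown supplier is rejected by the foreign-key constraint.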
Examples
• Oracle: The Oracle Database is also known as Oracle RDBMS or simply Oracle.
It is a multi-model database management system produced and marketed by
Oracle Corporation. An Oracle database is a logical collection of data. Oracle
markets it as the first database built specifically for enterprise grid
computing, a flexible and cost-effective way to manage data and applications.
• MySQL: MySQL is a free-to-use Relational Database Management System (RDBMS)
based on Structured Query Language (SQL). MySQL is available on almost every
platform, including Linux, UNIX and Windows.
• Microsoft SQL Server: In corporate IT environments, Microsoft SQL Server is
an RDBMS that supports a wide range of transaction processing, business
intelligence, and analytics applications.
• PostgreSQL: PostgreSQL, or simply Postgres, is an object-relational database
management system (ORDBMS) that focuses on extensibility and compliance with
industry standards.
• DB2: DB2 is an IBM database product. It is a relational database management
system (RDBMS) optimised for data storage, analysis and retrieval. With XML,
the DB2 product now also supports object-oriented features and non-relational
structures.
Owing to its many advantages, the relational model is the most commonly used
database model for business and financial databases. Relational database
terminology will be discussed in section 2.11.
2.9.3.3 Advantages
• Data can be accessed, inserted and/or deleted without changing the database
structure.
• The database structure can be easily customised for most types of data storage.
• Data does not need to be duplicated.
• Most users easily understand the structure.
• It is easy to search for and extract data from the database.
• Changes in the database structure have no impact on data access in the
relational model.
• Viewing information as tables with rows and columns makes it much easier
to comprehend.
• Unlike other models, the relational database model supports both data
independence and structure independence, making database design,
maintenance, administration, and usage much easier.
• You can use this to write complex queries to access or modify database data.
• In comparison to other models, it is easier to maintain security.
2.9.3.4 Disadvantages
• A disadvantage of using this model type is that it is slower than the network and
hierarchical models because it uses more processing power to query data.
• It’s difficult to map objects in a relational database.
• The relational model lacks an object-oriented paradigm.
• With relational databases, maintaining data integrity is difficult.
• The relational model is suitable for small databases but not for large databases
because they are not designed for change. Each row represents a unique entry,
and each column describes unique attributes, in relational databases. Data
modelling requires planning ahead of time and, depending on the system, can
take months or even years. After-the-fact changes take time and resources, and
database modelling projects can take years and cost millions of dollars. Because
big data is always changing, a flexible and forgiving database platform is
required.
• Hardware costs are incurred, making it expensive.
• The relational data model is not appropriate for all domains. Schema evolution is
difficult due to an inflexible data model. Poor horizontal scalability results in low
distributed availability. Performance suffers (especially in distributed
environments) due to joins, ACID transactions and strict consistency constraints.
2.9.4 Object-oriented model
The object-oriented model is used for more specialised databases such as multimedia
web-based applications, molecular biology databases and defence industry
applications. Object-oriented database models are not as widely used as relational
databases because they are expensive to implement, and many organisations do not
need to process data types other than numerical and text data.
2.9.4.1 Advantages
• Object databases can store a variety of data types. Unlike traditional
databases such as hierarchical, network and relational databases, which largely
handle structured numeric and text data, object-oriented databases can handle a
variety of data types, including pictures, voice, video, text and numbers.
• You can reuse code, model real-world scenarios, and improve reliability and
flexibility with object-oriented databases.
• Because most of the tasks within the system are encapsulated, they can be
reused and incorporated into new tasks.
• Object-oriented databases have lower maintenance costs than other models.
2.9.4.2 Disadvantages
2.9.5 Multidimensional model
These database models are relational models that have been adapted to facilitate
analytical processing. This model is designed for online analytical processing
(OLAP), while the relational model is optimised for online transaction processing
(OLTP).
This hybrid database model combines the relational model's simplicity with some of
the object-oriented database models' advanced functionality. It allows designers to
incorporate objects into the common table structure.
SQL3, vendor languages, ODBC, JDBC, and proprietary call interfaces are all
extensions of the Relational Model’s languages and interfaces (UNISA 2022).
The entity relationship database model is similar to the network model in that it
captures relationships between real-world entities, but it is not as closely linked
to the database's physical structure. It is more commonly used to design a database
conceptually.
The people, places and things about which data points are stored are referred to as
entities, and each of them has specific attributes that make up their domain. The
cardinality of the relationships between entities is also mapped.
The star schema is a common ER diagram that connects multiple dimensional tables
through a central fact table (UNISA 2022).
An inverted file structure database is another type of database model that is
designed to allow for quick full-text searches. In this model, the data content is
indexed as a series of keys in a lookup table, with the values pointing to the
location of the associated files. In Big Data and analytics, for example, this
structure can provide near-instantaneous reporting.
Since 1970, this model has been used by Software AG’s ADABAS Database
management system, and it is still supported (UNISA 2022).
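As a rough illustration of the lookup-table idea (not of ADABAS itself), a toy inverted index can be built in a few lines of Python; the sample documents are invented:

```python
# Toy inverted file structure: each word is a key in a lookup table whose
# value points to the documents that contain it. Sample documents invented.
documents = {
    1: "quick full text searches",
    2: "full text indexing for analytics",
}

index = {}
for doc_id, text in documents.items():
    for word in text.split():
        index.setdefault(word, set()).add(doc_id)

def search(word):
    """Return the ids of all documents that contain the word."""
    return sorted(index.get(word, set()))
```

A search never scans the documents themselves; it only reads the lookup table, which is what makes full-text queries fast in this model.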
The flat model is the oldest and most basic type of data model. It simply lists all of
the information in a single table with columns and rows. The computer must read the
entire flat file into memory to access or manipulate the data, making this model
inefficient for all but the smallest data sets (UNISA 2022).
2.9.11 Context Model
As needed, elements from other types of database models can be incorporated into this
model. It combines aspects of the object-oriented, semi-structured, and network
models.
The associative models are database models that categorise all data points into two
categories: entities and associations. An entity is anything that exists independently in
this model, whereas an association is something that exists only because of something
else.
• A collection of items, each with its unique identifier, name, and classification.
• A collection of links, each with its unique identifier and the source, verb, and
target identifiers. Each of the three identifiers may refer to a link or an item, and
the stored fact is about the source.
• Semantic model: includes information about how the stored data relates to the
real world.
• Named graph
• Triplestore (UNISA 2022)
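The associative model's split into items and links described above can be sketched in Python; all identifiers and data below are invented:

```python
# Toy associative model: items exist independently; links exist only because
# of other things, stored as (source, verb, target) triples.
items = {1: "Thabo", 2: "Pretoria"}  # id -> item name
links = {3: (1, "lives in", 2)}      # id -> (source, verb, target)

def facts():
    """Render each link as a stored fact about its source."""
    return [f"{items[s]} {verb} {items[t]}" for s, verb, t in links.values()]
```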
In a centralised database, all users interact with a single database in a single
location through the computer network. The benefit of using this type of database is
that the database is always up to date with the latest information if online input
and real-time processing are used.
When using a distributed database, there are several interlinked
databases stored in several computers in the same (e.g., headquarters)
or different locations (e.g., branches) (UNISA 2022).
When a distributed database is properly managed, all users will have the same view
of the database and will not know that each person may be interacting with a
database in a different location. Distributed databases are either partitioned or
replicated.
A partitioned database is split into smaller portions (partitions) and the part applicable
to the user is made available on the location closest to the user. Partitioned databases
are generally used when minimal data sharing is necessary between users at the
different locations. For example, an organisation with branches may use a partitioned
database when its customers always only interact with that specific branch and there is
thus no need for the branches to view each other’s customer databases.
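The routing idea behind a partitioned database can be sketched in Python; the branch names and customer records are invented for illustration:

```python
# Sketch of a partitioned database: each customer record lives only in the
# partition for the branch that serves that customer.
partitions = {"Pretoria": {}, "Durban": {}}

def store_customer(branch, customer_id, record):
    """Store the record only in the branch partition closest to its users."""
    partitions[branch][customer_id] = record

store_customer("Pretoria", "C001", {"name": "Thabo"})
store_customer("Durban", "C002", {"name": "Anika"})
```

Because each record exists in exactly one partition, a branch never sees another branch's customers, matching the minimal-data-sharing scenario above.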
In a replicated database, the whole original database is copied to the different locations,
that is, the database is replicated at each location. For example, a pharmacy with
countrywide branches at which customers can obtain new and repeat prescriptions at any
of the branches may use a replicated database for customers. This will enable the
customer to obtain a repeat prescription at any of the pharmacy’s branches without the
branch needing to see the original prescription.
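A replicated database's copy-and-synchronise cycle can be sketched in Python; the prescription data and branch names are invented, and the sketch assumes a simple "master wins" synchronisation:

```python
import copy

# Sketch of a replicated database: each branch holds a full copy of the
# master database and periodically re-synchronises with it.
master = {"RX001": "repeat prescription", "RX002": "new prescription"}

# Each branch starts with a complete copy of the master database.
replicas = {branch: copy.deepcopy(master) for branch in ("Cape Town", "Durban")}

def synchronise(replica):
    """Pull the latest master data into a replica (master wins)."""
    replica.clear()
    replica.update(copy.deepcopy(master))

master["RX003"] = "repeat prescription"  # new record captured at head office
synchronise(replicas["Durban"])          # Durban now sees the new record
```

Until a branch synchronises, it continues to work on its own (possibly stale) full copy, which is why synchronisation frequency matters in practice.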
One of the big four audit firms uses replicated databases for its
electronic client audit files. A master database of the client audit files is
created and then replicated on each audit team member’s computer.
Each team member works on his or her own “replicated database” and
synchronises to the master copy at a frequency determined by the audit team leader.
(a) Ask your auditor friends or family members or the auditors at your organisation if they
use databases for their client audit files.
(b) Determine whether they use a centralised or a distributed database.
(c) Is the distributed database updated through duplication or synchronisation?
Go to Discussion Forum 2.1 and discuss your findings with your fellow students.
FIGURE 2.4: Simplistic database overview (UNISA 2022)
Refer to figure 2.5 below. Each of the files shown is only an extract – the real
transaction and master files contain many more data fields and data values.
The data value entered will vary from data field to data field. For example, a data
value can be a number, say, 5, or a name, say, Thabo.
A data field contains a data value and is the smallest unit of data that
can be accessed in a database (UNISA 2022).
A data field is like a cell in a spreadsheet. The data value contained in a data field will
differ from data record to data record.
Data fields can be compulsory (data must be entered into this field), optional (the field
may be left blank if no data is entered) or calculated (the data value is not entered but
automatically derived from a formula based on other data fields).
In figure 2.5, in the purchase transaction file, the data value, 4, is entered in the
quantity data field for record PN10031. The balance data fields in the supplier master
file are an example of a calculated data field.
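The three kinds of data fields can be sketched in Python; the field names and values are invented, and the balance formula is an assumed example:

```python
# Sketch of compulsory, optional and calculated data fields in one record.
record = {
    "supplier_code": "FOR001",  # compulsory: data must be entered
    "contact_person": None,     # optional: may be left blank
    "opening_balance": 1000.0,
    "purchases": 400.0,
    "payments": 250.0,
}

# Calculated: not entered, but automatically derived from a formula based
# on other data fields.
record["balance"] = (record["opening_balance"]
                     + record["purchases"]
                     - record["payments"])
```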
(c) Attribute
An attribute is a column in a database file and appears only once per file.
However, an attribute can appear in more than one database file. Each attribute will
have a specific field length (the number of characters that can be entered in the
field) and a specific data type (numbers, characters, dates, etc). The field length
and the specific data type particular to that attribute are described in the data
dictionary.
In figure 2.5, in the purchase transaction file, the attribute labelled “VAT” will include
the VAT amount for each record. The “Credit limit” attribute in the supplier master file
will indicate the credit limit of each supplier and the “Inventory category” attribute’s
data type will be alphabetic characters only.
(d) Field name
Field names are unique and no column (attribute) can therefore have the same name
in a single database file; i.e., an attribute with the field name “supplier code” will only
appear once in the “supplier master file”. A field name can, however, appear in more
than one database file, i.e., an attribute with the field name “supplier code” can appear
in both the “supplier master file” and the “purchase transaction file”.
In figure 2.5, in the purchase transaction file, “Price per unit” is a field name.
“Minimum order qty” is a field name in the inventory master file.
Each file has a unique data field (known as the primary data field) that
can be used to uniquely identify each data record in a database file. A
primary data field is also known as a primary key (UNISA 2022).
In figure 2.5, the “supplier code” (e) data field is the primary data field in the “supplier
master file” and the “inventory number” data field (f) is the primary data field in the
“inventory master file”.
The combination of the invoice nr (a) and the line nr (b) fields in the purchase
transaction file together makes a unique data field – that is, PN10029 (invoice nr)
and 1 (line nr) together create a primary key, namely PN100291.
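A composite primary key such as PN10029 + 1 can be sketched with Python's built-in sqlite3 module; the table and column names are illustrative only:

```python
import sqlite3

# Sketch of a composite primary key: invoice_nr and line_nr together
# uniquely identify each record in the file.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE purchase_line (
        invoice_nr TEXT,
        line_nr    INTEGER,
        quantity   INTEGER,
        PRIMARY KEY (invoice_nr, line_nr)  -- composite key
    )""")

conn.execute("INSERT INTO purchase_line VALUES ('PN10029', 1, 4)")
conn.execute("INSERT INTO purchase_line VALUES ('PN10029', 2, 7)")  # same invoice, new line

# Re-using an existing invoice/line combination violates the key.
try:
    conn.execute("INSERT INTO purchase_line VALUES ('PN10029', 1, 9)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

Neither field is unique on its own (PN10029 appears twice), but the combination is, which is exactly what makes it usable as a primary key.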
When a primary data field of a database file is entered into another database
file to create a relation between the two database files, the primary data field
in the other database file is known as a foreign key (UNISA 2022).
A foreign key does not uniquely identify a record and may have duplicates in a
database file. The use of foreign keys prevents the duplication of data.
In figure 2.5, the “purchase transaction file” links (relates) to the “supplier master file”
through the use of the “supplier code”. The “supplier code” (e) data field is the primary
data field in the “supplier master file” as it uniquely identifies the supplier record but in
the “purchase transaction file” the “supplier code” (c) is the foreign key as it links the
two files through a primary data field. Note that there is more than one entry for
supplier code “FOR001” in the purchase transaction file.
The “purchase transaction file” links (relates) to the “inventory master” file through the
use of the “inventory number”. The “inventory number” field (f) is the primary data field
in the “inventory master file”, but in the “purchase transaction file”, the “inventory
number” (d) is known as the foreign key. The master files have been sorted in a
different order, but individual data records can still be found using the unique primary keys
and foreign keys.
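The way a foreign key links the two files in a query can be sketched with Python's built-in sqlite3 module; the table contents are simplified stand-ins for figure 2.5, not the actual figure data:

```python
import sqlite3

# Sketch of linking a transaction file back to a master file via a foreign key.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE supplier_master (
        supplier_code TEXT PRIMARY KEY,
        name          TEXT
    );
    CREATE TABLE purchase_transaction (
        invoice_nr    TEXT PRIMARY KEY,
        supplier_code TEXT REFERENCES supplier_master (supplier_code)
    );
    INSERT INTO supplier_master VALUES ('FOR001', 'Fortune Traders');
    INSERT INTO purchase_transaction VALUES ('PN10029', 'FOR001');
    INSERT INTO purchase_transaction VALUES ('PN10031', 'FOR001');
""")

# The foreign key lets a query relate each purchase to its supplier, even
# though FOR001 appears more than once in the transaction file.
rows = conn.execute("""
    SELECT p.invoice_nr, s.name
    FROM purchase_transaction AS p
    JOIN supplier_master      AS s ON s.supplier_code = p.supplier_code
    ORDER BY p.invoice_nr
""").fetchall()
```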
Each database file contains related records – that is, the records in the database file
have a common theme. There are different types of database files, namely master
files, transaction files, reference files and history files.
• Master file
A master file contains data records of a relative permanent nature (i.e., they
do not change regularly) about the organisation’s resources and subjects (i.e.,
customers, suppliers, inventory, employees, etc) (UNISA 2022).
Some of the data in the master file is updated periodically by the transaction files. The
master file is the most important file in the database and is the authoritative source of
data. In figure 2.5, the supplier master file contains data records about all the
organisation’s suppliers and the data fields in the records are relatively permanent
(i.e., the name of the supplier and telephone number do not regularly change).
• Transaction file
A transaction file contains data records relating to the daily individual activities
of the organisation (e.g., the organisation’s sales). A transaction file changes
regularly as additional transactions are processed (UNISA 2022).
These transaction data records (i.e., the transaction file) are used to update or change
the master file. In figure 2.5, the purchase transaction file contains data records about
the organisation’s purchase transactions for June 2016, which may be used to update
the balance field in the supplier’s master file.
• Reference file
A reference file is a semi-permanent file containing data records referred to
by the transaction file in order to complete a transaction (UNISA 2022).
• History file
A history file contains data records about transactions completed in the past (UNISA 2022).
The data records in the history files are derived from the transaction file and are used
in future queries and references. For example, the prior year purchase transactions are
moved from the purchase transaction file to the purchase history file at the end of the
financial year, during the year-end process.
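The year-end process described above can be sketched with Python's built-in sqlite3 module; the table layouts and years are invented for illustration:

```python
import sqlite3

# Sketch of the year-end process: prior-year records move from the
# transaction file to the history file.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE purchase_transaction (invoice_nr TEXT, year INTEGER);
    CREATE TABLE purchase_history     (invoice_nr TEXT, year INTEGER);
    INSERT INTO purchase_transaction VALUES ('PN09001', 2015);
    INSERT INTO purchase_transaction VALUES ('PN10029', 2016);
""")

def archive(prior_year):
    """Copy prior-year records to the history file, then delete them
    from the transaction file."""
    conn.execute(
        "INSERT INTO purchase_history "
        "SELECT * FROM purchase_transaction WHERE year = ?", (prior_year,))
    conn.execute(
        "DELETE FROM purchase_transaction WHERE year = ?", (prior_year,))

archive(2015)
```

After the archive step, the transaction file holds only current-year records, while the history file keeps the prior-year records for future queries.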
Activity 2.2
Note: This activity refers to databases and incorporates aspects of what was learnt in
study unit 1.
The business has three overseas suppliers and two local suppliers from which it
purchases inventory. Inventory must be ordered when the quantity on hand
reaches the minimum reorder level.
Required:
Identify examples of each of the processing methods listed below:
(a) Classifying
(b) Sorting
(c) Calculating
(d) Summarising
Extract from the supplier master file (supplier code, supplier name, telephone, currency, balance):
SAP001   SAPC              +27 09 959-1234   ZAR   229866.02
GIG002   GIGAB Computers   +00 1 213-1177    USD   1528599.85
• The database model type used should support the requirements of the organisation –
that is, a financial system might only require a relational database, but an
organisation that requires online analytical processing (OLAP) needs to use a
multidimensional model.
• The acquired DBMS and database should closely match the requirements of
the organisation.
• The DBMS and database should be able to evolve to meet future organisational
needs.
• The performance (i.e., reaction time) of the DBMS and database should be
considered: how fast can records be updated or queried in the database?
• The cost of the DBMS and database should be considered. Can the organisation
afford the DBMS and database?
• Different DBMS and databases will require different levels of specialised staff
skills. Are there specialised skills available in the organisation or can the
organisation acquire the skills required?
• The hardware needed to run the DBMS and database should be considered.
The organisation may need to acquire hardware if it is not already available. This
will have further cost implications.
• Can the DBMS and database be integrated with the rest of the organisation’s
information systems?
• The database size (amount of data the database can manage) must be adequate for
the organisation’s future data requirements and the database should easily be
expandable.
• The number of concurrent users (the number of users who can access the
database at the same time) the DBMS and database can handle should be taken
into account.
• The DBMS and database vendor should be a reputable organisation and
financially stable because this vendor will need to provide future support for the
solution.
Reflect
Make a note that you must return to topic 1 once you have mastered
Pastel (topic 7) and consider the following:
2.13 Summary
In this study unit, we learnt about the database environment, the advantages and
disadvantages of using a database environment and the components of this
environment. We also gained an understanding of different database models,
centralised and distributed databases and the terminology used in a relational
database. We dealt with the factors to consider when choosing appropriate database
management software and a database. In the next study unit, we will investigate the
utilisation of databases in an organisation.
REFERENCES
Eisa, I., Salem, R. & Abdelkader, H. (2017, December). A fragmentation algorithm for
storage management in cloud database environment. In 2017 12th International
Conference on Computer Engineering and Systems (ICCES) (pp. 141-147). IEEE.
Mansouri, Y., Toosi, A.N. & Buyya, R. (2017). Data storage management in cloud
environments: Taxonomy, survey, and future directions. ACM Computing Surveys
(CSUR), 50(6), pp.1-51.
Pavlo, A., Angulo, G., Arulraj, J., Lin, H., Lin, J., Ma, L., Menon, P., Mowry, T.C.,
Perron, M., Quah, I., Santurkar, S., Tomasic, A., Toor, S., Van Aken, D., Wang, Z.,
Wu, Y., Xian, R. & Zhang, T. (2017). Self-driving database management systems. In
CIDR 2017, Conference on Innovative Data Systems Research.
Shrivastava, S. & Pateriya, R.K. (2017). Efficient storage management framework for
software defined cloud. International Journal of Internet Technology and Secured
Transactions, 7(4), pp.317-329.
University of South Africa. (2022). Study guide for Practical Accounting Data
Processing AIN2601. Pretoria.
Verma, Kumar & Dixit. (2016). Data synchronization in heterogeneous database
environment. In 2016 2nd International Conference on Contemporary Computing and
Informatics (IC3I) (pp. 536-541). IEEE.
Wong, W.K., Kao, B., Cheung, D.W.L., Li, R. & Yiu, S.M. (2014, June). Secure query
processing with data interoperability in a cloud database environment. In Proceedings
of the 2014 ACM SIGMOD International Conference on Management of Data (pp. 1395-
1406).