1.8 Database and data modeling

Show understanding of the limitations of using a file‐based approach for the storage and
retrieval of large volumes of data

Information is vital to organizations. Often one of the most valuable resources in a business is
its accumulated information. The problem is the storage, retrieval and manipulation of all this
information.

The way in which computers manage information has come a long way over the last few
decades. Today’s users take for granted the many benefits found in a database system.
However, it wasn’t that long ago that computers relied on a much less elegant and more costly
approach to data management called the file‐based system.

• File‐based approach

One way to keep information on a computer is to store it in permanent files. A company system
has a number of application programs, each designed to manipulate its own data files. As a
result, an organization’s data is duplicated in separate files for the use of individual
departments. For example, the personnel department holds details of the name, address,
qualifications etc. of each employee, while the payroll department holds details of the name,
address and salary of each employee. Each department has its own set of application programs
to process the data in these files. The system just described is called the file‐based system.

Consider a traditional banking system that uses the file‐based system to manage the
organization’s data shown in Figure 1.1. As we can see, there are different departments in the
bank. Each has its own applications that manage and manipulate different data files. For
banking systems, the programs may be used to debit or credit an account, find the balance of
an account, add a new mortgage loan and generate monthly statements.

Figure 1.1. Example of a file‐based system used by banks to manage data.

By: Bikash Agrawal


• Disadvantages of the file‐based approach

Using the file‐based system to keep organizational information has a number of disadvantages.
Listed below are five examples.

Data redundancy and inconsistency


Often, within an organization, files and applications are created by different programmers from
various departments over long periods of time. This can lead to data redundancy and
inconsistency.

For example, a customer can have a savings account as well as a mortgage loan. Here, the
customer details may be duplicated since the programs for the two functions store their
corresponding data in two different data files. This gives rise to redundancy in the customer's
data. Since the same data is stored in two files, inconsistency arises if a change made in the
data of one file is not reflected in the other.
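
The redundancy and inconsistency described above can be sketched in a few lines of Python. This is only an illustration: the two dictionaries stand in for the savings and mortgage data files, and the customer ID, name and addresses are made-up values, not data from any real system.

```python
# Hypothetical stand-ins for two departmental data files.
# Each "file" holds its own copy of the same customer's details.
savings_file = {"C001": {"name": "A. Customer", "address": "1 Old Road"}}
mortgage_file = {"C001": {"name": "A. Customer", "address": "1 Old Road"}}


def update_savings_address(customer_id, new_address):
    """Savings application: it only knows about its own file."""
    savings_file[customer_id]["address"] = new_address


def find_inconsistencies():
    """Return the IDs of customers whose duplicated details no longer match."""
    return [
        cid
        for cid in savings_file
        if cid in mortgage_file and savings_file[cid] != mortgage_file[cid]
    ]


# The customer moves house, but only the savings file is updated.
update_savings_address("C001", "2 New Street")
print(find_inconsistencies())
```

After the update, the mortgage file still holds the old address, so the same customer now has two different addresses in the system.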

Data isolation
Data isolation means that the data are scattered across various files, and those files may be
in different formats. This is a problem because:

• It is difficult for new applications to retrieve the appropriate data, which might be
stored in various files.

Integrity problems
Problems with data integrity are another disadvantage of using a file‐based system. Data
integrity refers to maintaining and assuring that the stored data are correct and consistent.
Factors to consider when addressing this issue are:

• Data values must satisfy certain consistency constraints that are specified in the
application programs.
• It is difficult to make changes to the application programs in order to enforce new
constraints.

For example, in the savings bank application, one such integrity rule could be 'Customer ID,
which is the unique identifier for a customer record, should not be empty'. There can be several
such integrity rules. In a file‐based system, all these rules need to be explicitly programmed in
the application programs.
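
As a rough sketch of the difference, the Python below first checks the 'Customer ID should not be empty' rule by hand, as a file‐based program would have to, and then, for contrast, declares the same rule once in a database table. It assumes Python's built‐in sqlite3 module as the database; the table, column and function names are invented for illustration.

```python
import sqlite3


def save_customer_to_file(records, customer_id, name):
    """File-based style: every program that writes customers must repeat this check."""
    if not customer_id:
        raise ValueError("Customer ID must not be empty")
    records.append({"customer_id": customer_id, "name": name})


# Database style: the rule is declared once, in the table definition.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE customer (
        customer_id TEXT PRIMARY KEY NOT NULL CHECK (customer_id <> ''),
        name        TEXT NOT NULL
    )
    """
)


def insert_rejected(customer_id, name):
    """Return True if the database itself refuses the row."""
    try:
        conn.execute("INSERT INTO customer VALUES (?, ?)", (customer_id, name))
        return False
    except sqlite3.IntegrityError:
        return True


print(insert_rejected("", "No ID"))          # empty ID
print(insert_rejected(None, "Missing ID"))   # missing ID
print(insert_rejected("C001", "A. Customer"))
```

With the rule held in the table definition, a new application cannot forget to enforce it, and changing the rule means changing one definition rather than every program.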

Security problems
Security can be a problem with a file‐based approach because:

• There are constraints on access privileges: not every user should be able to access all of
the data.
• Application requirements are added to the system in an ad‐hoc manner so it is difficult
to enforce constraints.

For example, in a banking system, payroll personnel need to view only that part of the database
that has information about the various bank employees. They do not need access to
information about customer accounts. Since application programs are added to the system in
an ad‐hoc manner, it is difficult to enforce such security constraints.

Concurrent access
Concurrency is the ability of a system to allow multiple users access to the same record
without adversely affecting transaction processing. In a file‐based system, concurrency must be
managed, or prevented, by the application programs themselves. Typically, in a file‐based
system, when an application opens a file, that file is locked. This means that no one else has
access to the file at the same time.

In database systems, concurrency is managed by the DBMS, thus allowing multiple users to
access the same record. This is an important difference between database and file‐based systems.

Describe the features of a relational database which address the limitations of a file‐based
approach

The solution to many of the problems of using flat files was the arrival of the relational
database system.

The data are stored in tables, and relationships link the various tables. Each table
stores data about an entity, i.e. some “thing” about which data are stored, for example, a
customer or a product. Each table has a primary key field, by which each record in that table
is uniquely identified. The table can be viewed just like a spreadsheet grid, so one row in the
table is one record.

The practical design of relational databases is based on the theory developed around 1970
by Ted Codd. In this theory, entities are called relations, and they are implemented as tables.
Each record in the table is called a “tuple” (also known as a row). A data item is known as an
attribute (or a column).

Records in one table can be related to records in other tables by having common fields
in both tables.

So, the problem of duplication of customer details between the savings account and the
mortgage loan can be solved by having the relevant field in the mortgage loan table simply
contain the key of the savings account. The likely data design here would be:
• The savings account table has a primary key, AccHolderID.
• The mortgage loan table also has an AccHolderID field.
• The AccHolderID field in the mortgage loan table is a foreign key.
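
The design listed above could be sketched as follows, assuming Python's built‐in sqlite3 module. Only AccHolderID comes from the notes; the table names, the other columns (name, address, LoanID, amount) and the sample values are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this setting is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE savings_account (
        AccHolderID INTEGER PRIMARY KEY,   -- primary key
        name        TEXT NOT NULL,
        address     TEXT NOT NULL
    );
    CREATE TABLE mortgage_loan (
        LoanID      INTEGER PRIMARY KEY,
        AccHolderID INTEGER NOT NULL
                    REFERENCES savings_account(AccHolderID),  -- foreign key
        amount      REAL NOT NULL
    );
    INSERT INTO savings_account VALUES (1, 'A. Customer', '1 Old Road');
    INSERT INTO mortgage_loan VALUES (10, 1, 150000.0);
    """
)

# The address is stored once, so one update is enough.
conn.execute(
    "UPDATE savings_account SET address = '2 New Street' WHERE AccHolderID = 1"
)

# The mortgage application reaches the customer details through the foreign key.
row = conn.execute(
    """
    SELECT m.LoanID, s.name, s.address
    FROM mortgage_loan AS m
    JOIN savings_account AS s ON s.AccHolderID = m.AccHolderID
    """
).fetchone()
print(row)


def loan_for_unknown_holder_is_rejected():
    """A loan that points at a non-existent account holder should be refused."""
    try:
        conn.execute("INSERT INTO mortgage_loan VALUES (11, 999, 5000.0)")
        return False
    except sqlite3.IntegrityError:
        return True
```

Because the mortgage loan table holds only the key, the customer's name and address exist in one place, and the foreign key stops a loan from referring to an account holder who does not exist.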

Here, the differing needs of the departments are met by the software that is used to control
the data. As all the data are stored somewhere in the system, a department only needs software
that can search for them. In this way, each department does not need its own set of data, simply
its own view of the centralized database to which all users have access.
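
One possible way to give each department its own view of shared data is an SQL view. The sketch below reuses the personnel and payroll example from earlier in these notes and assumes Python's built‐in sqlite3 module; the table, view and column names are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- One central table holds every employee detail exactly once.
    CREATE TABLE employee (
        emp_id         INTEGER PRIMARY KEY,
        name           TEXT,
        address        TEXT,
        qualifications TEXT,
        salary         REAL
    );
    INSERT INTO employee VALUES (1, 'A. Clerk', '3 High Street', 'BSc', 2500.0);

    -- Each department sees only the columns it needs.
    CREATE VIEW personnel_view AS
        SELECT emp_id, name, address, qualifications FROM employee;
    CREATE VIEW payroll_view AS
        SELECT emp_id, name, address, salary FROM employee;
    """
)

# cursor.description lists the column names returned by a query.
personnel_columns = [
    d[0] for d in conn.execute("SELECT * FROM personnel_view").description
]
payroll_columns = [
    d[0] for d in conn.execute("SELECT * FROM payroll_view").description
]
print(personnel_columns)
print(payroll_columns)
```

The name and address are stored once in the employee table, yet both departments can read them, and the personnel view does not expose the salary column.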

Advantages of RDBMS over flat file approach

• Data are contained in a single software application: the relational database
software.
• Duplication of data is minimized and so the chance of data inconsistency is reduced.
• As long as there is a link to the table storing the data, they can always be accessed
via the link rather than repeating the data.
• Because data duplication is minimized, the volume of data is reduced, leading to
faster searching and sorting of data.

Show understanding of the features provided by a DBMS to address the issues of:

• data management, including maintaining a data dictionary


The data dictionary contains information about the actual database itself. This data enables the
DBA to keep a tight control over all aspects of the database and facilitates maintenance. The
dictionary could contain:
• The detailed description of each data item
• The relationships between data items
• Access rights for users and groups
• Validation rules
• The map between the logical and the physical view for storage purposes.
• Data recovery procedures.
• A transaction log to monitor the users, programs and data.
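
As a small illustration of data about the database itself, the sketch below asks SQLite for its own catalogue information. SQLite's catalogue is far simpler than the full dictionary described above (it holds no access rights or recovery procedures, for example), so this only shows the first item in the list: the description of each data item. It assumes Python's built‐in sqlite3 module, and the customer table is a made-up example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE customer (
        customer_id TEXT PRIMARY KEY NOT NULL,
        name        TEXT NOT NULL,
        address     TEXT
    )
    """
)

# sqlite_master is SQLite's built-in catalogue of tables, indexes and views.
tables = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk)
columns = [
    (name, col_type, bool(notnull), bool(pk))
    for _cid, name, col_type, notnull, _default, pk in conn.execute(
        "PRAGMA table_info(customer)"
    )
]

print(tables)
for column in columns:
    print(column)  # (column name, data type, NOT NULL?, part of primary key?)
```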

• data modeling
When large databases are designed, it has become common practice to use diagrams. These
are often referred to as data models. Each model has a number of key elements.
• Entities
• Attributes
• Relationships: these provide the links between the entities. For example:

Country has Cities

Students take Courses

Country has Capital

Entity relationship diagrams (ER diagrams)
These diagrams are graphical representations of the structure of data. ER diagrams allow the
analyst to think about and model general relationships. There are three possible types of
relationship linking the entities.

One‐to‐one relationship

Country has Capital

Each country has one capital city

One‐to‐many relationship

Country has Cities

Each country has many cities

Many‐to‐Many relationship

Students take Courses

Each student takes many courses, and the same course is taken by many students
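
A many‐to‐many relationship such as Students take Courses is commonly implemented in a relational database with a third, linking table that holds one row per student and course pair. The sketch below shows that approach, assuming Python's built‐in sqlite3 module; the enrolment table, the column names and the sample students and courses are all invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE student (student_id INTEGER PRIMARY KEY, name  TEXT);
    CREATE TABLE course  (course_id  INTEGER PRIMARY KEY, title TEXT);

    -- Linking table: one row for each (student, course) pair.
    CREATE TABLE enrolment (
        student_id INTEGER REFERENCES student(student_id),
        course_id  INTEGER REFERENCES course(course_id),
        PRIMARY KEY (student_id, course_id)
    );

    INSERT INTO student VALUES (1, 'Asha'), (2, 'Ben');
    INSERT INTO course  VALUES (1, 'Maths'), (2, 'Physics');
    INSERT INTO enrolment VALUES (1, 1), (1, 2), (2, 1);
    """
)


def courses_for(student_name):
    """Titles of all courses taken by one student, in alphabetical order."""
    rows = conn.execute(
        """
        SELECT c.title
        FROM course AS c
        JOIN enrolment AS e ON e.course_id = c.course_id
        JOIN student   AS s ON s.student_id = e.student_id
        WHERE s.name = ?
        ORDER BY c.title
        """,
        (student_name,),
    )
    return [title for (title,) in rows]


def students_on(course_title):
    """Names of all students taking one course, in alphabetical order."""
    rows = conn.execute(
        """
        SELECT s.name
        FROM student AS s
        JOIN enrolment AS e ON e.student_id = s.student_id
        JOIN course    AS c ON c.course_id = e.course_id
        WHERE c.title = ?
        ORDER BY s.name
        """,
        (course_title,),
    )
    return [name for (name,) in rows]


print(courses_for("Asha"))
print(students_on("Maths"))
```

The linking table turns the single many‐to‐many relationship into two one‐to‐many relationships, one from student and one from course.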

• logical schema
Databases are characterized by a three‐schema architecture because there are three different
ways to look at them. Each schema is important to different groups in an organization. The
graphic below illustrates this architecture and the groups most involved with each schema.

Logical Schema
A database’s logical schema is its overall logical plan. This schema is developed with diagrams
that define the content of database tables and describe how the tables are linked together for
data access. Database designers are responsible for creating the logical schema. Application
developers and database administrators may find the logical schema useful for performing
certain tasks.
