
Database and Information Systems
What is a File system?
• A file system is a technique for arranging files on a
storage medium such as a hard disk, pen drive, or DVD.
It helps you organize data and allows easy retrieval of
files when they are required.
• A file system controls how data is read from and
written to the storage medium. It is installed on the
computer as part of the operating system, such as
Windows or Linux.
What is DBMS?
A Database Management System (DBMS) is software for
storing and retrieving users' data while applying
appropriate security measures. It consists of a group
of programs that manipulate the database.
The DBMS accepts a request for data from an
application and instructs the DBMS engine to
provide the specific data.
In large systems, a DBMS helps users and
third-party software to store and retrieve data.
KEY DIFFERENCES:

• A file system is software that manages and organizes
the files in a storage medium, whereas a DBMS is a
software application used for accessing, creating,
and managing databases.
• A file system doesn't have a crash recovery
mechanism; a DBMS, on the other hand, provides one.
• Data inconsistency is higher in a file system and
lower in a database management system.
• A file system does not support complicated
transactions, whereas in a DBMS it is easy to
implement complicated transactions using SQL.
• A file system does not offer concurrency control,
whereas a DBMS does.
Features of a File system

• It helps you to store data in a group of files.
• The data in the files is interdependent.
• Languages such as C/C++ and COBOL were used to
design the files.
• Shared file system support.
• Fast file system recovery.
Features of DBMS
• A user-accessible catalog of data
• Transaction support
• Concurrency control with Recovery services
• Authorization services
• The value of data is the same at all places.
• Offers support for data communication
• Independent utility services
• Allows multiple users to share a file at the same
time
File System | DBMS
A file system is software that manages and organizes the files in a storage medium. It controls how data is stored and retrieved. | A DBMS (Database Management System) is a software application used for accessing, creating, and managing databases.
The file system provides the details of data representation and storage of data. | A DBMS gives an abstract view of data that hides the details.
Storing and retrieving data can't be done efficiently in a file system. | A DBMS is efficient to use, as there is a wide variety of methods to store and retrieve data.
It does not offer data recovery processes. | There is backup and recovery for data in a DBMS.
The file system doesn't have a crash recovery mechanism. | A DBMS provides a crash recovery mechanism.
Protecting a file system is very difficult. | A DBMS offers a good protection mechanism.
In a file management system, the redundancy of data is greater. | The redundancy of data is low in a DBMS.
Data inconsistency is higher in the file system. | Data inconsistency is low in a database management system.
The file system offers lesser security. | A DBMS offers high security.
A file system stores data as isolated data files and entities. | A DBMS stores data as well as defined constraints and interrelations.
It does not support complicated transactions. | It is easy to implement complicated transactions.
Centralization is hard in a file management system. | Centralization is easy to achieve in a DBMS.
It doesn't offer backup and recovery of data if it is lost. | A DBMS provides backup and recovery of data if it is lost.
There is no efficient query processing in the file system. | You can easily query data in a database using the SQL language.
A file system doesn't offer concurrency. | A DBMS provides a concurrency facility.
Terminologies
• A database is a collection of related data.
• A database management system (DBMS) is a collection of
programs that enables users to create and maintain a
database.
(Or)
• It is essentially a collection of interrelated data and set of
programs to access this data.
• RDBMS: a DBMS in which the data is organized as a
collection of tables.
• Application program: accesses the database by sending
queries or requests for data to the DBMS.
• Query: a request that causes some data to be retrieved.
Database Schema

• A database schema is the skeleton structure that
represents the logical view of the entire database.
• It defines how the data is organized and how the
relations among the data are associated.
• It formulates all the constraints that are to be applied on
the data.
DBMS Environment
• Hardware
– Client-server architecture
• Software
– DBMS, OS, network, application
• Data
– Schema, subschema, table, attribute
• People
– Data administrator & database administrator
– Database designer: logical & physical
– Application programmer
– End-user: naive & sophisticated
• Procedure
– Start, stop, log on, log off, back up, recovery
Advantages of DBMS
• Control redundancy
• Consistency
• Integrity
• Security
• Concurrency control
• Backup & recovery
• Data standard
• Data sharing & conflict control
• Productivity & accessibility
• Maintenance
Limitations of DBMS

• Complexity
• Size
• Cost
– Software
– Hardware
– Conversion
• Performance
• Vulnerability
DDL
• DDL is short for Data Definition Language, which
deals with database schemas and descriptions of how the
data should reside in the database.

CREATE - creates a database and its objects (tables,
indexes, views, stored procedures, functions, and triggers)

ALTER - alters the structure of an existing database object

DROP - deletes objects from the database

TRUNCATE - removes all records from a table and releases
the space allocated for those records
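The DDL statements above can be tried out with a minimal sketch like the one below. It assumes Python's built-in sqlite3 module and a throwaway in-memory database; the Staff table and its column names are illustrative only. TRUNCATE is left out because SQLite does not provide it (a DELETE without a WHERE clause is the closest equivalent there).

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# CREATE: define a new table (table and column names are illustrative).
conn.execute("CREATE TABLE Staff (staffNo VARCHAR(5), lName VARCHAR(15))")

# ALTER: change the structure of the existing table by adding a column.
conn.execute("ALTER TABLE Staff ADD COLUMN salary DECIMAL(7,2)")

# Inspect the resulting structure through SQLite's table_info pragma;
# the second field of each returned row is the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(Staff)")]
print(columns)

# DROP: remove the table and its definition from the database.
conn.execute("DROP TABLE Staff")
tables = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
print(tables)
conn.close()
```

Other DBMSs accept the same CREATE, ALTER, and DROP statements with minor dialect differences; only the catalog queries used here for inspection are SQLite-specific.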
DML
• DML is short for Data Manipulation Language, which
deals with data manipulation and includes the most common
SQL statements, such as SELECT, INSERT, UPDATE, and DELETE.
It is used to store, modify, retrieve, delete, and update
data in a database.
• SELECT - retrieves data from a database
• INSERT - inserts data into a table
• UPDATE - updates existing data within a table
• DELETE - deletes records from a database table (all
records, or only those that match a WHERE condition)
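As a rough illustration of the four DML statements, the sketch below runs them in sequence, assuming Python's built-in sqlite3 module and an in-memory database. The Staff table, its columns, and the sample rows are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Staff (staffNo TEXT, lName TEXT, salary INTEGER)")

# INSERT: add rows to the table.
conn.executemany(
    "INSERT INTO Staff VALUES (?, ?, ?)",
    [("SG16", "Brown", 8300), ("SL21", "White", 30000), ("SG37", "Beech", 12000)],
)

# UPDATE: modify only the rows that match the WHERE condition.
conn.execute("UPDATE Staff SET salary = salary + 500 WHERE staffNo = 'SG16'")

# DELETE with WHERE: remove only the matching rows.
conn.execute("DELETE FROM Staff WHERE staffNo = 'SG37'")

# SELECT: retrieve what is left.
rows = conn.execute(
    "SELECT staffNo, lName, salary FROM Staff ORDER BY staffNo"
).fetchall()
print(rows)

# DELETE without WHERE: remove every row but keep the table itself.
conn.execute("DELETE FROM Staff")
remaining = conn.execute("SELECT COUNT(*) FROM Staff").fetchone()[0]
print(remaining)
conn.close()
```

The contrast between the two DELETE statements is the point to notice: the WHERE clause decides whether one row or the whole table is emptied.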
Types of Data Models

There are mainly three different types of data
models: conceptual data models, logical data
models, and physical data models, and each one
has a specific purpose.
Conceptual Data Model: This Data Model
defines WHAT the system contains. This model
is typically created by Business stakeholders and
Data Architects.
Logical Data Model: This Data Model defines HOW the
system should be implemented regardless of the DBMS.
This model is typically created by Data
Architects and Business Analysts. The purpose
is to develop a technical map of rules and data
structures.
Physical Data Model: This Data Model describes
HOW the system will be implemented using a
specific DBMS system. This model is typically
created by DBA and developers. The purpose is
actual implementation of the database.
Roles in the Database Environment

• Data and Database Administrators
• Database Designers
• Application Developers
• End-users
Data and Database Administrators

• Data and database administration are the roles
generally associated with the management and
control of a DBMS and its data.
• Data Administrator (DA) is responsible for
the management of the data resource,
including database planning; development and
maintenance of standards, policies and
procedures; and conceptual/logical database
design.
Database Administrator (DBA)

The Database Administrator (DBA) is
responsible for the physical realization of the
database, including physical database design and
implementation, security and integrity control,
maintenance of the operational system, and
ensuring satisfactory performance of the
applications for users.
Database Designers

• Logical database designers
• Physical database designers
The logical database designer is concerned with
identifying the data (that is, the entities and
attributes), the relationships between the data,
and the constraints on the data that is to be
stored in the database.
The work of the logical database designer is split
into two stages:
• Conceptual Database Design, which is
independent of implementation details, such as
the target DBMS, application programs,
programming languages, or any other physical
considerations.
• Logical Database Design, which targets a
specific data model, such as relational, network,
hierarchical, or object-oriented.
Physical database designer
The physical database designer decides how the
logical database design is to be physically
realized. This involves:
• mapping the logical database design into a set
of tables and integrity constraints;
• selecting specific storage structures and access
methods for the data to achieve good
performance;
• designing any security measures required on
the data.
Application Developers

• Once the database has been implemented, the
application programs that provide the required
functionality for the end-users must be
implemented. This is the responsibility of the
application developers.
• Each program contains statements that request
the DBMS to perform some operation on the
database, which includes retrieving data,
inserting, updating, and deleting data.
End-Users

The end-users are the “clients” of the database,
which has been designed and implemented and
is being maintained to serve their information
needs. End-users can be classified according to
the way they use the system:
1. Naïve users
2. Sophisticated users
Multi-user DBMS Architectures
• The common architectures used to implement
multi-user database management systems are:
1. Teleprocessing
2. File-server
3. Client–server
Teleprocessing
• The traditional architecture for multi-user systems
was teleprocessing, where there is one computer with
a single central processing unit (CPU) and a number
of terminals.
• All processing is performed within the boundaries of
the same physical computer.
• User terminals are typically “dumb” ones, incapable
of functioning on their own, and cabled to the central
computer.
• The terminals send messages via the communications
control subsystem of the operating system to the
user’s application program, which in turn uses the
services of the DBMS.
• In the same way, messages are routed back to the
user’s terminal.
• Unfortunately, this architecture placed a tremendous
burden on the central computer, which had to not
only run the application programs and the DBMS, but
also carry out a significant amount of work on behalf
of the terminals.
File-Server Architecture
A file server is a computer attached to a network with the
primary purpose of providing shared storage for computer
files such as documents, spreadsheets, images, and databases.
• In a file-server environment, the processing is
distributed across the network, typically a local area
network (LAN). The file server holds the files
required by the applications and the DBMS.
However, the applications and the DBMS run on each
workstation, requesting files from the file server
when necessary.
• This approach can generate a significant amount of
network traffic, which can lead to performance
problems.
The file-server architecture, therefore, has three main
disadvantages:
(1) There is a large amount of network traffic.
(2) A full copy of the DBMS is required on each workstation.
(3) Concurrency, recovery, and integrity control are more
complex, because there can be multiple DBMSs accessing
the same files.
Two-Tier Client–Server Architecture
• To overcome the disadvantages of the first two
approaches, the client–server architecture was
developed.
• Client–server refers to the way in which software
components interact to form a system.
• As the name suggests, there is a client process,
which requires some resource, and a server, which
provides the resource.
• There is no requirement that the client and server
must reside on the same machine.
Three-Tier Client–Server Architecture
N-Tier Architectures
Middleware
• Computer software that connects software
components or applications.
• Hurwitz (1998) defines six main types of
middleware:
Asynchronous Remote Procedure Call (RPC):
• An interprocess communication technology
that allows a client to request a service in
another address space without waiting for a
response.
• An RPC is initiated by the client sending a
request message to a known remote
server in order to execute a specified procedure
using supplied parameters.
• Advantage: scalability
• Disadvantage: low recoverability
Synchronous RPC:
• While the server is processing the call, the
client is blocked (it has to wait until the server
has finished processing before resuming
execution).
• Advantage: high recoverability
• Disadvantage: low scalability
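The difference between the two calling styles can be sketched as follows. This is only an analogy under stated assumptions: a local function (remote_procedure, a hypothetical name) stands in for the procedure on the remote server, and a thread pool from Python's concurrent.futures stands in for the server. No network or real RPC library is involved.

```python
from concurrent.futures import ThreadPoolExecutor


def remote_procedure(x):
    # Stand-in for a procedure that a remote server would execute.
    return x * x


# Synchronous style: the caller blocks until the result comes back.
sync_result = remote_procedure(7)

# Asynchronous style: the caller receives a Future at once and carries on.
with ThreadPoolExecutor(max_workers=1) as server:
    future = server.submit(remote_procedure, 7)
    other_work = sum(range(10))     # the client keeps working meanwhile
    async_result = future.result()  # collect the response when it is needed

print(sync_result, async_result, other_work)
```

In the asynchronous case the client must be prepared for the response to arrive later or not at all, which is one way to read the recoverability trade-off listed above.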
Publish/subscribe
• An asynchronous messaging protocol where
subscribers subscribe to messages produced
by publishers.
• Messages can be categorized into classes, and
subscribers express interest in one or
more classes and receive only messages
that are of interest.
• Advantages: greater scalability, dynamic
network topology.
• Examples: TIBCO Rendezvous from TIBCO Software
Inc., Ice (Internet Communications Engine)
from ZeroC Inc.
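The routing idea behind publish/subscribe can be sketched with a tiny in-process broker. Everything here (the Broker class, the message classes "stock", "news", and "weather") is hypothetical and chosen for illustration. Delivery in this sketch is a direct function call, whereas real middleware delivers messages asynchronously over a network.

```python
from collections import defaultdict


class Broker:
    """Minimal in-process broker: routes a message to subscribers of its class."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # message class -> callbacks

    def subscribe(self, message_class, callback):
        self._subscribers[message_class].append(callback)

    def publish(self, message_class, payload):
        # The publisher does not know who, if anyone, is listening.
        for callback in self._subscribers[message_class]:
            callback(payload)


broker = Broker()
stock_inbox, news_inbox = [], []
broker.subscribe("stock", stock_inbox.append)
broker.subscribe("news", news_inbox.append)

broker.publish("stock", "ACME 101.5")
broker.publish("news", "Markets open")
broker.publish("weather", "Sunny")  # nobody subscribed: the message is dropped

print(stock_inbox, news_inbox)
```

Because publishers and subscribers only share the broker and a message class, either side can be added or removed without changing the other, which is where the scalability and dynamic topology come from.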
Message-oriented middleware (MOM):
• Software that resides on both the client and
server and typically supports asynchronous calls
between the client and server applications.
• Message queues provide temporary storage
when the destination application is busy or not
connected.
• Examples: WebSphere MQ from IBM, MSMQ
(Microsoft Message Queuing), JMS (Java
Messaging Service)
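The role of the message queue as temporary storage can be sketched with Python's standard queue and threading modules. This is an in-process analogy, not a real MOM product: the server function and the sample messages are assumptions, and the "server" is deliberately started only after the client has finished sending.

```python
import queue
import threading

message_queue = queue.Queue()  # temporary storage between client and server
processed = []


def server():
    # The server drains the queue once it becomes available.
    while True:
        message = message_queue.get()
        if message is None:  # sentinel value: no more messages
            break
        processed.append(message.upper())


# The client sends without waiting; the server is not running yet,
# so the queue holds the messages in the meantime.
for text in ["order 1", "order 2", "order 3"]:
    message_queue.put(text)
message_queue.put(None)

worker = threading.Thread(target=server)
worker.start()
worker.join()
print(processed)
```

The client never blocks on the server, and no message is lost while the server is unavailable, which is the behaviour the bullet on message queues describes.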
Object-request broker (ORB):
• Manages communication and data exchange
between objects.
• ORBs promote interoperability of distributed
object systems by allowing developers to build
systems by integrating together objects,
possibly from different vendors, that
communicate with each other via the ORB.
• The Common Object Request Broker
Architecture (CORBA) is a standard that enables
software components written in multiple
computer languages and running on multiple
computers to work together.
• An example of a commercial ORB
middleware product is Orbix from Progress
Software.
SQL-oriented data access:
• Connects applications with databases across
the network and translates SQL requests into
the database’s native SQL or other database
language.
• More generally, database-oriented
middleware connects applications to any type
of database.
• Examples: Microsoft’s ODBC (Open Database
Connectivity) API, JDBC API
Transaction Processing Monitors

A program (middleware component) that
controls data transfer between clients and
servers in order to provide a consistent
environment, particularly for online transaction
processing (OLTP).
Fig. The Transaction Processing Monitor as the middle
tier of a three-tier client–server architecture.
• TP Monitors provide significant advantages,
including:
• Transaction routing: The TP Monitor can
increase scalability by directing transactions to
specific DBMSs.
• Managing distributed transactions: The TP
Monitor can manage transactions that require
access to data held in multiple, possibly
heterogeneous, DBMSs.
• Load balancing: The TP Monitor can balance
client requests across multiple DBMSs on one
or more computers by directing client service
calls to the least loaded server.
• In addition, it can dynamically bring in
additional DBMSs as required to provide the
necessary performance.
• Funneling: In environments with a large
number of users, it may sometimes be difficult
for all users to be logged on simultaneously to
the DBMS.
• Instead of each user connecting to the DBMS,
the TP Monitor can establish connections with
the DBMSs as and when required, and can
funnel user requests through these
connections.
• This allows a larger number of users to access
the available DBMSs with a potentially much
smaller number of connections, which in turn
would mean less resource usage.
• Increased reliability: The TP Monitor acts as a
transaction manager, performing the
necessary actions to maintain the consistency
of the database, with the DBMS acting as a
resource manager.
• If the DBMS fails, the TP Monitor may be able
to resubmit the transaction to another DBMS
or can hold the transaction until the DBMS
becomes available again.
Software Components of a DBMS
• The major software components in a DBMS
environment are:
• Query processor. This is a major DBMS component that transforms
queries into a series of low-level instructions directed to the database
manager.
• Database manager (DM). The DM interfaces with user-submitted
application programs and queries. The DM accepts queries and
examines the external and conceptual schemas to determine what
conceptual records are required to satisfy the request. The DM then
places a call to the file manager to perform the request.
• File manager. The file manager manipulates the underlying storage
files and manages the allocation of storage space on disk. It establishes
and maintains the list of structures and indexes defined in the internal
schema. If hashed files are used, it calls on the hashing functions to
generate record addresses. However, the file manager does not
directly manage the physical input and output of data. Rather, it
passes the requests on to the appropriate access methods, which
either read data from or write data into the system buffer (or cache).
• DML preprocessor. This module converts DML statements
embedded in an application program into standard function calls
in the host language. The DML preprocessor must interact with
the query processor to generate the appropriate code.
• DDL compiler. The DDL compiler converts DDL statements into
a set of tables containing metadata. These tables are then stored
in the system catalog while control information is stored in data
file headers.
• Catalog manager. The catalog manager manages access to and
maintains the system catalog. The system catalog is accessed by
most DBMS components.
Software components for the database manager
The major software components for the database manager
are as follows:
• Authorization control. This module confirms whether the
user has the necessary authorization to carry out the
required operation.
• Command processor. Once the system has confirmed
that the user has authority to carry out the operation,
control is passed to the command processor.
• Integrity checker. For an operation that changes the
database, the integrity checker checks whether the
requested operation satisfies all necessary integrity
constraints (such as key constraints).
• Query optimizer. This module determines an optimal strategy for
the query execution.
• Transaction manager. This module performs the required
processing of operations that it receives from transactions.
• Scheduler. This module is responsible for ensuring that concurrent
operations on the database proceed without conflicting with one
another. It controls the relative order in which transaction operations
are executed.
• Recovery manager. This module ensures that the database remains
in a consistent state in the presence of failures. It is responsible for
transaction commit and abort.
• Buffer manager. This module is responsible for the transfer of data
between main memory and secondary storage, such as disk and tape.
The recovery manager and the buffer manager are sometimes
referred to collectively as the data manager. The buffer manager is
sometimes known as the cache manager.
Structured Query Language, or SQL
History
Over the last few years, SQL has become the standard relational
database language. In 1986, a standard for SQL was defined by
the American National Standards Institute (ANSI) and was
subsequently adopted in 1987 as an international standard by
the International Organization for Standardization (ISO, 1987).
More than one hundred DBMSs now support SQL, running on
various hardware platforms from PCs to mainframes.
Introduction to SQL
Objectives of SQL
A database language should allow a user to:
• create the database and relation structures;
• perform basic data management tasks, such as
the insertion, modification, and deletion of data
from the relations;
• perform both simple and complex queries.
SQL is an example of a transform-oriented
language, or a language designed to use
relations to transform inputs into required
outputs. As a language, SQL has two major components:
• a Data Definition Language (DDL) for defining
the database structure and controlling access to
the data;
• a Data Manipulation Language (DML) for
retrieving and updating data.
• The command structure consists of standard English
words such as CREATE TABLE, INSERT, SELECT.
• For example:
– CREATE TABLE Staff (staffNo VARCHAR(5), lName
VARCHAR(15), salary DECIMAL(7,2));
– INSERT INTO Staff VALUES ('SG16', 'Brown', 8300);
– SELECT staffNo, lName, salary FROM Staff WHERE
salary > 10000;
• SQL can be used by a range of users including database
administrators (DBA), management personnel, application
developers, and many other types of end-user.
Writing SQL Commands
This section describes the structure of an SQL
statement and the notation used to define the format
of the various SQL constructs.
An SQL statement consists of reserved words
and user-defined words.
Although SQL is free-format, an SQL statement or set
of statements is more readable if indentation and
lineation are used.
For example:
• each clause in a statement should begin on a new
line;
• the beginning of each clause should line up with the
beginning of other clauses;
• if a clause has several parts, they should each appear
on a separate line and be indented under the start of
the clause to show the relationship.
The following extended form of the Backus Naur Form
(BNF) notation is used to define SQL statements:
• uppercase letters are used to represent reserved words and
must be spelled exactly as shown;
• lowercase letters are used to represent user-defined words;
• a vertical bar ( | ) indicates a choice among alternatives; for
example, a | b | c;
• curly braces indicate a required element; for example, {a};
• square brackets indicate an optional element; for example, [a];
• an ellipsis (. . .) is used to indicate optional repetition of an item
zero or more times. For example: {a|b} (, c . . .) means either a or
b followed by zero or more repetitions of c separated by commas.
Data Manipulation
The following SQL DML statements are covered:
• SELECT – to query data in the database
• INSERT – to insert data into a table
• UPDATE – to update data in a table
• DELETE – to delete data from a table
SELECT
• The purpose of the SELECT statement is to
retrieve and display data from one or more
database tables.
• SELECT is the most frequently used SQL command.
• The SELECT statement fetches data from a database table
and returns it in the form of a result table. These result
tables are called result sets.

Syntax
The basic syntax of the SELECT statement is as follows:
SELECT column1, column2, columnN FROM table_name;

Here, column1, column2, ... are the fields of a table whose values you
want to fetch. If you want to fetch all the fields available in the
table, you can use the following syntax:
SELECT * FROM table_name;
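Both forms of SELECT can be compared side by side in a small sketch. It assumes Python's built-in sqlite3 module with an in-memory database; the Staff table and its rows are sample data invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Staff (staffNo TEXT, lName TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO Staff VALUES (?, ?, ?)",
    [("SG16", "Brown", 8300), ("SL21", "White", 30000)],
)

# Named columns: the result set contains only those columns.
names = conn.execute("SELECT staffNo, lName FROM Staff").fetchall()

# Asterisk: the result set contains every column of the table.
cursor = conn.execute("SELECT * FROM Staff")
all_columns = [d[0] for d in cursor.description]  # column names of the result
everything = cursor.fetchall()

print(names)
print(all_columns, everything)
conn.close()
```

Each row of a result set comes back as a tuple, one element per selected column, so the two queries differ only in how wide those tuples are.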
SQL - WHERE Clause
• The SQL WHERE clause specifies a condition while
fetching data from a single table or from multiple
joined tables. Only the rows that satisfy the condition
are returned. Use the WHERE clause to filter the
records and fetch only the necessary ones.
• The WHERE clause is used not only in the SELECT
statement but also in the UPDATE and DELETE
statements, among others.
Syntax
The basic syntax of the SELECT statement with the WHERE clause is
shown below.

SELECT column1, column2, columnN FROM table_name WHERE [condition];

You can specify a condition using comparison or logical operators such
as >, <, =, LIKE, and NOT.
AND Operator
The AND operator allows the existence of multiple conditions in an SQL
statement's WHERE clause.
Syntax
SELECT column1, column2, columnN FROM table_name
WHERE [condition1] AND [condition2] ... AND [conditionN];
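The WHERE clause and the AND operator can be seen at work in the following sketch, which assumes Python's built-in sqlite3 module and an in-memory database. The Staff table, the branchNo column, and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Staff (staffNo TEXT, lName TEXT, branchNo TEXT, salary INTEGER)"
)
conn.executemany("INSERT INTO Staff VALUES (?, ?, ?, ?)", [
    ("SG16", "Brown", "B003", 8300),
    ("SL21", "White", "B005", 30000),
    ("SG37", "Beech", "B003", 12000),
])

# Single comparison condition.
high_paid = conn.execute(
    "SELECT staffNo FROM Staff WHERE salary > 10000 ORDER BY staffNo"
).fetchall()

# Pattern match with LIKE: last names that start with 'B'.
b_names = conn.execute(
    "SELECT lName FROM Staff WHERE lName LIKE 'B%' ORDER BY lName"
).fetchall()

# AND: a row is returned only if both conditions hold.
both = conn.execute(
    "SELECT staffNo FROM Staff WHERE branchNo = 'B003' AND salary > 10000"
).fetchall()

print(high_paid, b_names, both)
conn.close()
```

Each added AND condition can only narrow the result, which is why the last query returns fewer rows than either condition would on its own.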
Sorting Results Using the ORDER BY Clause
The SQL ORDER BY clause is used to sort the data in ascending
or descending order, based on one or more columns. Ascending
order is the default when neither ASC nor DESC is given.
Syntax
The basic syntax of the ORDER BY clause, which is used
to sort the result in ascending or descending order, is as follows:

SELECT column-list FROM table_name [WHERE condition]
[ORDER BY column1 [ASC | DESC], column2 [ASC | DESC], ... columnN [ASC | DESC]];
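A short sketch of single-column and multi-column sorting follows, again assuming Python's built-in sqlite3 module, an in-memory database, and an invented Staff table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Staff (staffNo TEXT, lName TEXT, branchNo TEXT, salary INTEGER)"
)
conn.executemany("INSERT INTO Staff VALUES (?, ?, ?, ?)", [
    ("SG16", "Brown", "B003", 8300),
    ("SL21", "White", "B005", 30000),
    ("SG37", "Beech", "B003", 12000),
])

# One sort key, descending: highest salary first.
by_salary_desc = conn.execute(
    "SELECT lName, salary FROM Staff ORDER BY salary DESC"
).fetchall()

# Two sort keys: branch ascending, then salary descending within each branch.
by_branch_then_salary = conn.execute(
    "SELECT branchNo, lName FROM Staff ORDER BY branchNo ASC, salary DESC"
).fetchall()

print(by_salary_desc)
print(by_branch_then_salary)
conn.close()
```

The second query shows that each sort key carries its own direction, and that a column used for sorting (salary) does not have to appear in the select list.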
Aggregation and Grouping Operations
Using the SQL Aggregate Functions
As well as retrieving rows and columns from the database, we often
want to perform some form of summation or aggregation of data,
similar to the totals at the bottom of a report. The ISO standard
defines five aggregate functions:
• COUNT – returns the number of values in a specified column
• SUM – returns the sum of the values in a specified column
• AVG – returns the average of the values in a specified column
• MIN – returns the smallest value in a specified column
• MAX – returns the largest value in a specified column
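The five aggregate functions can all be applied in one query, as in the sketch below. It assumes Python's built-in sqlite3 module and an in-memory database; the Staff table and its three salaries are sample values chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Staff (staffNo TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO Staff VALUES (?, ?)",
    [("SG16", 8300), ("SL21", 30000), ("SG37", 12000)],
)

# Each aggregate function collapses the whole column into a single value,
# so the query returns exactly one row.
count, total, average, lowest, highest = conn.execute(
    "SELECT COUNT(staffNo), SUM(salary), AVG(salary), MIN(salary), MAX(salary) "
    "FROM Staff"
).fetchone()

print(count, total, round(average, 2), lowest, highest)
conn.close()
```

Note that AVG returns a fractional value even for an integer column, so it is rounded here only for display.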
Assignment
Write queries for the above functions.
Grouping Results Using the GROUP BY and HAVING Clauses
The SQL GROUP BY clause is used with the SELECT statement
to arrange identical data into groups. The GROUP BY clause
follows the WHERE clause in a SELECT statement and
precedes the ORDER BY clause.
Syntax
SELECT column1, column2 FROM table_name WHERE
[ conditions ] GROUP BY column1, column2 ORDER BY
column1, column2;
Refer website:
• https://www.w3schools.com/sql/sql_groupby.asp
• https://www.geeksforgeeks.org/sql-group-by/
• Do exercise
Find the number of staff working in each branch
and the sum of their salaries.
SELECT branchNo, COUNT(staffNo) AS
myCount, SUM(salary) AS mySum
FROM Staff
GROUP BY branchNo
ORDER BY branchNo;
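The exercise query above can be run against a small sample table to see what grouping produces. The sketch assumes Python's built-in sqlite3 module and an in-memory database; the three Staff rows are invented sample data, not part of the exercise.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Staff (staffNo TEXT, branchNo TEXT, salary INTEGER)")
conn.executemany("INSERT INTO Staff VALUES (?, ?, ?)", [
    ("SG16", "B003", 8300),
    ("SL21", "B005", 30000),
    ("SG37", "B003", 12000),
])

# One output row per branch: the aggregates are computed within each group.
branch_totals = conn.execute(
    "SELECT branchNo, COUNT(staffNo) AS myCount, SUM(salary) AS mySum "
    "FROM Staff "
    "GROUP BY branchNo "
    "ORDER BY branchNo"
).fetchall()

print(branch_totals)
conn.close()
```

With this sample data, the two B003 rows collapse into a single result row, while B005 forms a group of one.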
SQL - Having Clause
• The HAVING Clause enables you to specify conditions that filter
which group results appear in the results.
• The WHERE clause places conditions on the selected columns,
whereas the HAVING clause places conditions on groups created by
the GROUP BY clause.
Syntax

The HAVING clause must follow the GROUP BY clause and precede the
ORDER BY clause, as the following ordering of clauses shows:

SELECT FROM WHERE GROUP BY HAVING ORDER BY

The following code block has the syntax of the SELECT statement
including the HAVING clause:

SELECT column1, column2 FROM table1, table2 WHERE
[ conditions ] GROUP BY column1, column2 HAVING
[ conditions ] ORDER BY column1, column2
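How HAVING differs from WHERE is easiest to see by running both on the same data. The sketch below assumes Python's built-in sqlite3 module and an in-memory database; the Staff table and its rows are sample data made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Staff (staffNo TEXT, branchNo TEXT, salary INTEGER)")
conn.executemany("INSERT INTO Staff VALUES (?, ?, ?)", [
    ("SG16", "B003", 8300),
    ("SL21", "B005", 30000),
    ("SG37", "B003", 12000),
])

# HAVING filters groups: keep only branches with more than one member.
large_branches = conn.execute(
    "SELECT branchNo, COUNT(staffNo), SUM(salary) FROM Staff "
    "GROUP BY branchNo HAVING COUNT(staffNo) > 1 ORDER BY branchNo"
).fetchall()

# WHERE filters rows before grouping: after removing salaries of 10000 or
# less, every branch has a single member, so HAVING rejects all groups.
filtered_first = conn.execute(
    "SELECT branchNo, COUNT(staffNo) FROM Staff WHERE salary > 10000 "
    "GROUP BY branchNo HAVING COUNT(staffNo) > 1 ORDER BY branchNo"
).fetchall()

print(large_branches)
print(filtered_first)
conn.close()
```

The second query returns no rows because the WHERE condition is applied first and changes the group sizes that HAVING then tests.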
Refer website:
• https://www.w3schools.com/sql/sql_having.asp
• https://www.guru99.com/group-by.html
• https://www.studytonight.com/dbms/having-clause.php
Difference between WHERE and HAVING Clause in SQL
Sr. No. | WHERE Clause | HAVING Clause
1. | The WHERE clause filters records from the table based on the specified condition. | The HAVING clause filters records from the groups based on the specified condition.
2. | The WHERE clause can be used without the GROUP BY clause. | The HAVING clause is normally used together with the GROUP BY clause; without it, the whole result is treated as a single group.
3. | The WHERE clause operates on individual rows. | The HAVING clause operates on groups of rows.
4. | The WHERE clause cannot contain aggregate functions. | The HAVING clause can contain aggregate functions.
5. | The WHERE clause can be used with SELECT, UPDATE, and DELETE statements. | The HAVING clause can only be used with the SELECT statement.
6. | The WHERE clause is used before the GROUP BY clause. | The HAVING clause is used after the GROUP BY clause.
7. | The WHERE clause is used with single-row functions such as UPPER and LOWER. | The HAVING clause is used with multiple-row functions such as SUM and COUNT.
